Figure 3 - uploaded by Gilles Caporossi
Example of a complete set (left), a working set (right) with a reversed optimal hyperplane. 


Source publication
Abstract

L1 norm discrimination consists of finding the hyperplane that minimizes the sum of L1-norm distances between the hyperplane and the points that lie on its wrong side. This problem is difficult for datasets containing more than 100,000 points. Since few points are needed to obtain the optimal hyperplane, we propose a poin...

Context in source publication

Context 1
... the optimal hyperplane h* is found. To describe the algorithm, we define S as the working set, i.e. the set of points used to construct the current linear program. The set of omitted points is noted S̄. Solving the problem restricted to S yields a separating hyperplane h(w, γ). We note H the set of points that are misclassified or lie on h(w, γ). The set E of omitted misclassified points is defined by E = H ∩ S̄. The algorithm is described in Figure 2. As no point is ever removed from S, the convergence proof of the algorithm is obvious. The rate of convergence, however, depends on the choice of the points in the initial set S. The convergence property is not affected by the choice of the points added, as long as some points are added to S while E is not empty. It is therefore theoretically not necessary to add all misclassified points, and one may also add points that are correctly classified. However, our experiments with various strategies at Step 4 led us to the simple rule described above. To prove the optimality of a solution h*(w*, γ*), all points not satisfying condition O for this solution must be in the working set. In support vector machine terminology, this set of points (vectors) is dubbed the support vectors. To reduce the number of iterations (and the computing time), a good initial working set should at least contain all the support vectors of the optimal hyperplane h*(w*, γ*). For the MSD problem, points that do not respect condition O are associated with active constraints. Since 2n linear programs must be considered simultaneously, some points that do respect condition O for the optimal hyperplane must still be considered, even though in linear programming the constraints that are inactive at the optimal solution can be removed without changing the solution.
In fact, if we were to include only the misclassified points and the points that lie on the surface of the optimal hyperplane, we would obtain what we call a "reversed" hyperplane (see Figure 3), with optimal value 0 for the restricted set of points. If only points that do not respect condition O were used as the initial working set, almost if not all the omitted points would be added at the ...
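The working-set loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the MSD linear program minimizes the total deviation Σ eᵢ subject to yᵢ(w·xᵢ − γ) + eᵢ ≥ 0, eᵢ ≥ 0, and handles the 2n normalized linear programs by fixing one coefficient wⱼ = ±1 in turn to rule out the trivial solution w = 0. The names `msd_lp`, `best_hyperplane`, and `working_set_msd` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def msd_lp(X, y, j, s):
    """Solve one normalized MSD LP (variables: w in R^n, gamma, e in R^m):
       minimize sum(e)  s.t.  y_i*(w.x_i - gamma) + e_i >= 0, e >= 0, w_j = s."""
    m, n = X.shape
    c = np.concatenate([np.zeros(n + 1), np.ones(m)])
    # Inequality rows: -y_i*(w.x_i) + y_i*gamma - e_i <= 0
    A_ub = np.hstack([-y[:, None] * X, y[:, None], -np.eye(m)])
    b_ub = np.zeros(m)
    # Normalization constraint w_j = s (s = +1 or -1)
    A_eq = np.zeros((1, n + 1 + m))
    A_eq[0, j] = 1.0
    bounds = [(None, None)] * (n + 1) + [(0, None)] * m
    return linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[s],
                   bounds=bounds, method="highs")

def best_hyperplane(X, y):
    """Pick the best of the 2n normalized LPs (w_j = +1 or -1, j = 1..n)."""
    m, n = X.shape
    best = None
    for j in range(n):
        for s in (1.0, -1.0):
            res = msd_lp(X, y, j, s)
            if res.status == 0 and (best is None or res.fun < best.fun):
                best = res
    w, gamma = best.x[:n], best.x[n]
    return w, gamma, best.fun

def working_set_msd(X, y, init_size=10, tol=1e-9):
    """Working-set loop: solve on S, then add the omitted misclassified
       points E = H ∩ S̄; stop when E is empty (no point ever leaves S)."""
    m = X.shape[0]
    S = list(range(min(init_size, m)))
    while True:
        w, gamma, _ = best_hyperplane(X[S], y[S])
        margins = y * (X @ w - gamma)
        in_S = set(S)
        E = [i for i in range(m) if i not in in_S and margins[i] <= tol]
        if not E:
            return w, gamma
        S.extend(E)  # monotone growth of S gives the convergence proof
```

Because no point is ever removed from S and each iteration with nonempty E strictly grows it, the loop terminates in at most m iterations, matching the convergence argument in the text.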
