IMPROVEMENT OF ACCURACY OF PARAMETRIC CLASSIFICATION IN THE SPACE OF N×2 FACTORS-ATTRIBUTES ON THE BASIS OF PRELIMINARY OBTAINED LINEAR DISCRIMINANT FUNCTION

A procedure is proposed for classifying objects in the space of N×2 factors-attributes that are incorrectly classified as a result of constructing a linear discriminant function. The classification accuracy is defined as the proportion of correctly classified objects among those incorrectly classified at the first stage, the construction of a linear discriminant function. It is shown that, for improperly classified objects, the transition from using their initial values as the factors-attributes to using the centers of gravity (COGs) of local clusters makes it possible to improve the classification accuracy by 14 %. A procedure for constructing local clusters and a principle of forming the classifying rule are proposed, the latter being based on converting the equation of the dividing line to the normal form and determining the sign of the deviation of the COGs of local clusters from the dividing line.


Introduction
The purpose of applying image recognition methods is to obtain classification rules that ensure the maximum accuracy of classification. As a rule, this accuracy is estimated by the ratio of incorrectly classified objects to the total number of classified objects. The wide range of applications of these methods shows that they are indispensable in solving problems from completely different subject areas: image recognition [1-5], medical diagnostics [6, 7], and technical diagnostics [8-12]. Principal differences between approaches to specific problems are related to the choice of methods and the feasibility of the chosen alternative. Nevertheless, whichever alternative is chosen, the problem is reduced to obtaining a decision rule that allows an object to be assigned to one of the classes. A probable classification error can lead to negative consequences:
- an error in identifying a criminal leads to a newly committed crime because of the untimely detention of the criminal, or to the punishment of the innocent;
- an error in a medical diagnosis leads to an incorrect treatment strategy, progression of the disease, and severe complications;
- an error in identifying a fault and its causes in technical devices leads to their failure.
This list can be continued; therefore, the relevance of research aimed at finding ways to improve the accuracy of object classification is beyond doubt. In this context, the key problem is creating new or modifying existing recognition methods.
It should be noted that the use of parametric methods based on Bayesian statistics is favored by researchers who solve specific problems in industrial production [10-12]. As a classifying rule, linear discriminant functions are used; they have the disadvantage, however, that a linear hyperplane cannot always qualitatively separate arbitrarily located clusters of classified objects.
Unlike the approach based on constructing an analytical description of the classifying rule, some alternative variants do not involve constructing any mathematical description at all. Examples are the linear learning machine and nearest-neighbor classification [13].
Among the most common in recent years are classification methods based on neural networks [14-16]. The key to using these methods is the quality of neural network training. Thus, paper [15] describes the results of applying one of the classical algorithms to the development of a decision-support subsystem in a neural-network pattern recognition system. As effectiveness criteria, the authors suggest using the level of subjectivism of expert decisions and the quality of expert assessments in the construction of training samples from the existing statistical data describing the objects. The first of these criteria should be minimized, and the second maximized. As a result of applying the proposed algorithm, the authors report the following data: the relative share of correct expert assessments increases by an average of 20 %, and the relative share of false ones decreases by 50 %.
It should be noted that the limitations imposed on the use of the proposed method are not presented in that article. Nevertheless, it is stated that the existing basic "classical" training models, for example those based on error correction, memory, competitive training, and the Boltzmann method, have a number of shortcomings. In particular, it is noted that when creating universal image recognition systems, one cannot manage with a single training model. This is especially evident in specific subject areas, where the construction of a universal high-quality recognition system faces the almost unsolvable task of taking into account all the specific features inherent in these areas.
It is possible to use the Levenberg-Marquardt method [16], which, unlike the classical learning algorithms, uses Z-training followed by averaging of the network error. The efficiency criterion is the running time of the algorithm, which should be minimized. However, the presentation of the results is limited to a theoretical description of the algorithm.
All the mentioned studies assume work with random variables with a known law of data distribution. Obvious complications in their use are due to the fact that this condition is not always satisfied. Moreover, if the components of the image vectors cannot be measured with sufficient accuracy, the possibility of constructing effective recognition systems becomes questionable. Ways of solving the arising problems are considered, for example, in [17-19].
It is shown in [17] that the use of principal component analysis (PCA) in combination with hierarchical clustering is an effective practical tool for the automated identification of the studied objects.
The approach described in [18] is based on a point estimate for fuzzy pattern recognition conditions. In this approach, the space is considered to consist of two fuzzy sets ("Truthful" and "Deceiver"). At the initial stage of the procedure, each separate classified element is modeled as a fuzzy set based on the generation method. The purpose of this step is to eliminate the uncertainty in the assessment of the corresponding scores. At a later stage, the new fuzzy conformity assessments are integrated with a fuzzy aggregation operator, after which the final recognition decision is taken. The authors of [17, 18] note that the proposed method has high stability, but the limitations imposed on these methods by the special conditions of the subject areas are not named.
The approach to data processing for classification using the methods of computational intelligence can be found in study [19], which proposed an adaptive algorithm for fuzzy clustering that minimizes an objective function of the form

J = Σ_k Σ_j w_j^β(k) d²(x(k), c_j)

under the constraints 0 ≤ w_j(k) ≤ 1 and Σ_j w_j(k) = 1, where w_j(k) is the level of belonging of the vector x(k) to the j-th cluster, c_j is the centroid of the j-th cluster, d(x(k), c_j) is the distance between x(k) and c_j in the accepted metric, and β is a nonnegative parameter called the "fuzzifier" (if d(x(k), c_j) is the Euclidean distance, β is assumed to be 2). The result of this algorithm is the formation of a fuzzy partition matrix, in which objects are divided into clusters (diagnoses), while the shape of the clusters can vary from a hypersphere to a hyperellipsoid depending on the form of the original data. The deciding factor for assigning an object to one of the classes is the choice of the distance between x(k) and c_j:

d²(x(k), c_j) = (x(k) − c_j)^T A_j (x(k) − c_j),

where A_j is the inverse fuzzy covariance matrix of each cluster. The peculiarity of this approach is its insensitivity to the ratio of the number of objects to the number of indicators characterizing these objects, as well as to the law of data distribution. However, nothing is said about the exact requirements that the domain of the initial data should satisfy.
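The membership update of classical fuzzy c-means, of which the algorithm in [19] is an adaptive variant, can be sketched as follows. This is a minimal illustration assuming the Euclidean metric and fuzzifier β = 2; the function name is ours, not taken from [19]:

```python
import numpy as np

def fcm_memberships(X, centroids, beta=2.0):
    """Membership level of each sample in each cluster (standard FCM update)."""
    # squared Euclidean distances between every sample and every centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)               # guard against division by zero
    inv = d2 ** (-1.0 / (beta - 1.0))        # inverse-distance weighting
    # rows of the fuzzy partition matrix sum to 1, as the constraint requires
    return inv / inv.sum(axis=1, keepdims=True)
```

A sample lying near a centroid receives a membership close to 1 in that cluster and near 0 in the others, which is exactly the "fuzzy partition matrix" behavior described above.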
The principles of clustering objects under fuzziness of the data describing them are described in [20]. Their essence is as follows.
Let n experiments be performed, and let the result of the j-th of them be determined by the set of coordinate values F_jp, j = 1, 2, ..., n, p = 1, 2, ..., m, where F_jp is the measured value of the p-th coordinate in the j-th experiment, which is modal for a fuzzy number, and α_jp, β_jp are the left and right fuzzy coefficients in the description (4).
For each point, the fuzzy distance to each of the attraction centers of the clusters and the corresponding membership function are calculated. Then the obtained membership functions are used to find the cluster that has the highest degree of preference relative to the point under consideration. A formal description of this procedure is as follows.
For the pair (the k-th cluster, the j-th point), the fuzzy value ρ²_kj of the square of the distance from the attraction center of the cluster to the point is introduced. To obtain its membership function, the known relations for the results of operations on fuzzy numbers of (L-R) type are used [22, 23]: the parameters a, α, β of the (L-R) number C = A·B are calculated by the corresponding formulas, and the parameters of the fuzzy numbers entering ρ²_kj are obtained accordingly. This yields the membership function of the fuzzy value of the square of the distance from the k-th center to the j-th point. In this problem, the general relations are simplified, since the coordinates of the attraction centers of the clusters are crisp numbers.

As a result of the described procedure, for each point the membership functions of the fuzzy numbers representing the "distances" to the centers of the corresponding clusters are obtained. These numbers must then be compared among themselves, choosing the one whose degree of preference over the rest is the least; this number determines the cluster "closest" to the considered point. The degree of preference of a fuzzy number z_k over a fuzzy number z_l is estimated by formula (6). Using (6), the choice of a fuzzy number with the least degree of preference relative to the other numbers of the set causes no difficulties. The cluster number k* to which the next point is attached is determined by relation (7). Relations (5)-(7) thus provide a solution to the problem of fuzzy clustering.

Thus, the procedure described in [20] is based on calculating the distance between the classified point and the attraction center of the cluster: the point is assigned to the cluster whose COG is closer. This idea can also be used to improve the classification accuracy when the linear discriminant function constructed by the parametric classification method does not provide high recognition rates. In this regard, the aim of this article is to develop a classification method for objects incorrectly classified after constructing a linear discriminant function.
To achieve this aim, it is necessary to substantiate the principles of selecting informative attributes of the factor-attribute space and to develop an algorithm that allows a classifying rule to be formed.

Materials and methods of research
As the research material, a test data sample including objects of two classes, A and B, presented in [24] is selected (Fig. 1). The following assumptions are accepted in the paper:
- as a result of normalization, the values of the variables x_i characterizing the position of objects in the feature space of dimension (N×2) lie within the range [-1; +1];
- the class membership of each of the N objects of the sample is known.
The problem consists in constructing, by means of parametric classification methods, a straight line dividing both classes in the attribute space.

Fig. 1. Space of features for the test problem
The classifying rule using the likelihood ratio l(x_j) is represented in the form [25]

l(X) = p_A(X)/p_B(X) ≥ P(B)/P(A) → X ∈ A,

where P(A), P(B) are the a priori class probabilities, and the probability densities p_A(X) and p_B(X) have the form

p_A(X) = d·exp(−0.5(X − m_A)^T S^(−1)(X − m_A)),
p_B(X) = d·exp(−0.5(X − m_B)^T S^(−1)(X − m_B)),    (9)

where d is a constant factor, m_A and m_B are the mathematical expectations of classes A and B respectively, and S^(−1) is the inverse of the covariance matrix. In this case the discriminant function has the form

g(X) = (m_A − m_B)^T S^(−1) X.    (10)

The threshold value of the discriminant function, which makes it possible to decide whether an object belongs to a specific class, is calculated on the basis of the equation

g0 = 0.5(m_A − m_B)^T S^(−1)(m_A + m_B) + ln(P(B)/P(A)).    (11)

Let the a priori probabilities of the classes be the same: P(A) = P(B) = 0.5. The parameters of the distributions p_A(x) and p_B(x), calculated on the basis of the initial data of the test problem (Fig. 1), include m_A = (0.16143; 0.135714). The classifying rule is constructed in the form of a linear discriminant function described by equation (10), in which the threshold value is determined analytically on the basis of (11). This discriminant function specifies the position of the dividing line in the space (N×2) and the associated classification accuracy.
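Under the stated assumptions (Gaussian classes with a shared covariance matrix and equal priors), the coefficients of the discriminant function (10) and its threshold (11) can be computed as in the following sketch; the function and variable names are illustrative, not from [25]:

```python
import numpy as np

def linear_discriminant(XA, XB):
    """Coefficients w and threshold c of the rule: w @ x >= c -> class A
    (pooled covariance matrix, equal priors P(A) = P(B) = 0.5)."""
    mA, mB = XA.mean(axis=0), XB.mean(axis=0)
    # pooled within-class covariance matrix S
    S = (np.cov(XA, rowvar=False) * (len(XA) - 1)
         + np.cov(XB, rowvar=False) * (len(XB) - 1)) / (len(XA) + len(XB) - 2)
    w = np.linalg.solve(S, mA - mB)     # w = S^{-1}(m_A - m_B), eq. (10)
    c = w @ (mA + mB) / 2.0             # threshold, eq. (11) with ln(1) = 0
    return w, c
```

With equal priors the log-prior term vanishes, so the threshold is simply the discriminant evaluated at the midpoint of the class means.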
After implementation of the described procedures, a classification rule is constructed, the coefficients in the analytical description of which, calculated for class A, give:

x ∈ A if 5.8225x1 + 1.1292x2 ≤ 0.5547,
x ∈ B if 5.8225x1 + 1.1292x2 > 0.5547.    (12)
The classifying rule, the coefficients in the analytical description of which are calculated for class B, has the form:

x ∈ A if 3.6993x1 + 0.2082x2 ≤ 0.2786,
x ∈ B if 3.6993x1 + 0.2082x2 > 0.2786.    (13)
Fig. 2 shows the results of constructing the dividing lines described by functions (12) and (13). The result shown in Fig. 2 is obtained after eliminating the inaccuracy in calculating the coefficients of the linear discriminant function, whose presence can be interpreted as the error of the dividing line at the position y = y0, proportional to the area of the shaded figure ABCD (Fig. 3) [24].
Fig. 2 shows that both dividing lines are close to each other, and an overlap of classes A and B is observed, containing a fairly large number of objects from both classes. The classification accuracy, estimated as the proportion of objects falling into the appropriate class using the expressions

P_A = n_A / N_A,    (14)
P_B = n_B / N_B,    (15)

is 82.9 % for class A and 71.4 % for class B when the classifying rule is used in the form (12). Here n_A, n_B are the numbers of correctly classified elements of classes A and B respectively, and N_A, N_B are the total numbers of elements of classes A and B respectively. The results obtained using the classifying rule in the form (13) are approximately the same, owing to the considerable proximity of the dividing lines.
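Expressions (14) and (15) reduce to simple proportions; for the test-problem figures reported above:

```python
def class_accuracy(n_correct, n_total):
    """Proportion of correctly classified objects, eqs. (14)-(15)."""
    return n_correct / n_total

# test problem: 29 of 35 class-A and 25 of 35 class-B objects correct
acc_A = class_accuracy(29, 35)   # ~ 0.829, i.e. 82.9 %
acc_B = class_accuracy(25, 35)   # ~ 0.714, i.e. 71.4 %
```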
The obtained result means that 6 objects out of 35 are incorrectly classified for class A, and 10 objects out of 35 for class B. It is necessary to substantiate the principles of selecting informative attributes of the factor-attribute space in such a way as to maximize the proportion of correctly classified objects.

Research results
To select the informative attributes of the space of characteristic factors that allow the total fraction of properly classified objects to be maximized by assigning to the corresponding classes the objects incorrectly classified at the first stage (the construction of a linear discriminant function), the following assumption can be used. An incorrectly classified object has a connection with its own cluster but, for a number of reasons of a random character, found itself in another cluster. If such a connection is established, it becomes possible to formalize the assignment of this object to its cluster. As such a connection, one can use the idea of "gravitation" of the object towards its COG [20]. In this case, the original object space must be "freed" from correctly classified objects, leaving only the incorrectly classified objects in the test sample. The variant of data distribution after such a transformation of the initial sample is shown in Fig. 4.

Fig. 4. Incorrectly classified objects of classes A (n_A = 6) and B (n_B = 10) after constructing a linear discriminant function in the form (12)

From this it follows that the position of an incorrectly classified object must be associated with two elements: the mathematical expectation of the corresponding class, which is its COG, and the position of the dividing line. Such a connection is represented by straight-line segments forming the sides of a triangle with vertices: the classified point, the COG, and the intersection point of the normal drawn from the classified point to the dividing line. The COG of such a "local" cluster, made up of the vertices of this triangle, is chosen as a new variable of the feature space. Consequently, it is from such COGs of local clusters that a new sample is formed to construct the classification rule. A demonstration of this idea is shown in Fig. 5.
From Fig. 5 it can be seen that the implementation of the proposed idea makes it possible to correctly classify previously incorrectly classified objects. Indeed, the COG of the local cluster formed for the object with coordinates (0.3; −0.25), which belongs to class A but was incorrectly assigned to class B (blue markers), lies to the left of the dividing line. This means that the object with coordinates (0.3; −0.25) belongs to class A. An analogous conclusion can be made for the object with coordinates (−0.05; 0.3), which belongs to class B but was incorrectly assigned to class A (red markers): the COG of its local cluster lies to the right of the dividing line, so the object belongs to class B. Methods of analytical geometry are used to calculate the vertices of local clusters. Consider the i-th local cluster DEF, whose vertices have the coordinates D(x_1D; x_2D), E(x_1E; x_2E), F(x_1F; x_2F).
The coordinates of the vertex D(x_1D; x_2D) are known: they are the coordinates of the incorrectly classified point of the example, (x_1D; x_2D) = (0.3; −0.25).
The coordinates of the vertex E(x_1E; x_2E) are unknown, but they can be found as the coordinates of the intersection point of the dividing line and the normal drawn to this line from D. For this it is necessary to obtain the equation of the normal DE, knowing the equation of the dividing line and presenting it in the form

−5.8225x1 − 1.1292x2 + 0.5547 = 0.    (16)
The transformation of the general form of the line a1·x1 + a2·x2 + a0 = 0 to the form x2 = k·x1 + b allows the coefficients k and b to be determined:

k = −a1/a2,  b = −a0/a2.

The equation of the normal passing through the point D(x_1D; x_2D) has the form

x2 − x_2D = −(1/k)(x1 − x_1D),    (17)

where −1/k is the slope of the normal. The coordinates of the point E(x_1E; x_2E) are determined by solving the system of linear equations

−5.8225x1 − 1.1292x2 + 0.5547 = 0,
x2 − x_2D = −(1/k)(x1 − x_1D).    (18)

It is easy to verify this by carrying out simple transformations of equations (16) and (17).
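Writing the dividing line as a1·x1 + a2·x2 + a0 = 0 with the coefficients of equation (16) (the signs are an assumption here), vertex E and the COG of the local cluster D-E-F can be sketched as follows; the vector form p − t·(a1, a2) is equivalent to solving the system (16)-(17):

```python
import numpy as np

# dividing line a1*x1 + a2*x2 + a0 = 0; coefficient signs assumed for eq. (16)
a1, a2, a0 = -5.8225, -1.1292, 0.5547

def foot_of_normal(p):
    """Vertex E: intersection of the dividing line with the normal through p."""
    p = np.asarray(p, dtype=float)
    t = (a1 * p[0] + a2 * p[1] + a0) / (a1 ** 2 + a2 ** 2)
    return p - t * np.array([a1, a2])   # project p onto the line

def local_cluster_cog(point_d, class_cog_f):
    """COG of the local cluster: centroid of the triangle D, E, F."""
    d = np.asarray(point_d, dtype=float)
    e = foot_of_normal(d)
    f = np.asarray(class_cog_f, dtype=float)
    return (d + e + f) / 3.0
```

For the example point D = (0.3; −0.25), `local_cluster_cog` would be called with the class-A mathematical expectation m_A as the vertex F.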
Similarly, using Microsoft Excel tools, the vertices of local clusters are calculated for all incorrectly classified points of class A. Fig. 6 shows the incorrectly classified objects of class A and the calculated coordinates of the COGs of the corresponding local clusters. As follows from Fig. 7, five of the six objects can be assigned to their class A. One object with coordinates (0.3; 0.2) is in an undefined state: the COG of the corresponding local cluster has coordinates (0.067958; 0.163399) and falls practically on the dividing line. Taking into account the non-strict inequality in the classifying rule (12), it can be assigned to class A and is thus correctly classified. However, for a more reliable decision, an accurate calculation of the distance from the COG of the corresponding local cluster to the dividing line is necessary, together with the sign of the corresponding segment located along the normal to the dividing surface.
The results of similar calculations of the COGs of local clusters constructed for incorrectly classified objects of class B are shown in Fig. 8, 9.
To determine the distance from the i-th COG to the dividing line, it is necessary to reduce its equation to the normal form by introducing the normalizing factor

μ = ∓1/√(a1² + a2²).    (19)

Taking into account that the free term of equation (16) has a positive sign, the "−" sign should be taken on the right-hand side of (19). Multiplying (16) by (19) and substituting the coordinates of the i-th COG, the distance d_i from the i-th COG to the dividing line is calculated. The sign of d_i determines the location of the i-th COG relative to the dividing line, thus forming the classifying rule:

x_i ∈ A if d_i ≤ 0,
x_i ∈ B if d_i > 0.    (20)

The results of the calculations of the distances d_i are given in Table 1.
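The normalization (19) and the sign-based deviation can be sketched as follows (the coefficient signs of the dividing line are an assumption, as noted above for equation (16)):

```python
import math

a1, a2, a0 = -5.8225, -1.1292, 0.5547     # dividing line; signs assumed
mu = -1.0 / math.sqrt(a1 ** 2 + a2 ** 2)  # "-" sign because the free term a0 > 0

def deviation(x1, x2):
    """Signed deviation d_i of a point (e.g. a local-cluster COG) from the line."""
    return mu * (a1 * x1 + a2 * x2 + a0)
```

The magnitude of `deviation` is the geometric distance to the line; its sign tells on which side of the line the COG lies, which is exactly what rule (20) exploits.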

Discussion of research results
As can be seen from the results obtained for the incorrectly classified objects of class B (Fig. 9), the proposed procedure solves the problem of correctly classifying the objects misclassified after constructing the linear discriminant function only by 50 %. This is confirmed by the results of applying the proposed classification rule (20): Table 1 shows that 5 of the 10 previously incorrectly classified objects are again classified incorrectly. The result for class A is much better: only 1 of the 6 objects is again classified incorrectly.
The effectiveness of the proposed classification procedure as a two-stage process, in which a linear discriminant function is constructed at the first stage and a classifying rule of the form (20) is applied at the second stage, is obvious. Indeed, for class A the classification accuracy calculated using formula (14) increases from 82.9 % to 97.1 %, and for class B the accuracy calculated using formula (15) increases from 71.4 % to 85.7 %. Such a result can be considered good, confirming the effectiveness of the proposed procedure.
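The reported gains follow directly from the counts: 5 of the 6 misclassified class-A objects and 5 of the 10 misclassified class-B objects are recovered at the second stage:

```python
N = 35  # objects per class in the test problem

# stage one: 29 of 35 (class A) and 25 of 35 (class B) classified correctly;
# stage two recovers 5 of 6 (class A) and 5 of 10 (class B) misclassified objects
acc_A = (29 + 5) / N   # -> 97.1 %
acc_B = (25 + 5) / N   # -> 85.7 %
```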
Despite the obvious disadvantages associated with the assumptions inherent in all parametric classification methods [24], the proposed procedure can be used in practical recognition problems for factor-attribute spaces of dimension N×2. Moreover, it has the potential for a further increase in accuracy due to a more accurate location of the dividing line. The latter can be achieved by using experiment-planning methods to construct the linear discriminant function, in particular by selecting objects of both classes with coordinates corresponding to the vertices of a full factorial design inscribed in the actual region of the factor-attribute space of dimension N×2.

Conclusions
1. A procedure is proposed for selecting informative attributes of the space of characteristic factors that makes it possible to maximize the total fraction of correctly classified objects by assigning to the corresponding classes the objects misclassified after the first stage of parametric classification, the construction of a linear discriminant function. Its essence consists in establishing the connection of an incorrectly classified object with precisely its own class. As such a connection, it is suggested to consider the position of the incorrectly classified object with respect to the mathematical expectation of the corresponding class, which is its COG, and the position of the dividing line given by the linear discriminant function. To realize this idea, local clusters are formed as triangles with vertices: the classified point, the COG, and the intersection point of the normal drawn from the classified point to the dividing line. The COGs of such local clusters are chosen as the new variables of the feature space, and it is from these COGs that a new sample is formed to construct the classification rule.
2. A classifying rule is proposed that allows an object incorrectly classified after constructing a linear discriminant function to be assigned to its own class. It is based on determining the position of the COGs of local clusters relative to the dividing line (the signed distance d_i): the object is assigned to class A or class B depending on the sign of d_i.

Fig. 2. Results of constructing the dividing straight lines in the space of dimensionality (N×2) for the test problem

Fig. 3.

Fig. 5. Demonstration of the idea of constructing local clusters for the formation of an informative data sample: Δ – COG of the local cluster DEF; Δ – COG of the local cluster constructed for the incorrectly classified point of class B with coordinates (−0.05; 0.3)

Fig. 6. Incorrectly classified objects of class A and the calculated coordinates of the COGs of the corresponding local clusters

Fig. 7. Location of the COGs of local clusters for all six incorrectly classified objects of class A relative to the dividing line

Fig. 8. Incorrectly classified objects of class B and the calculated coordinates of the COGs of the corresponding local clusters

Fig. 9.

3. It is shown that for incorrectly classified objects the transition from using their initial values as the factors-attributes to using the COGs of local clusters makes it possible to increase the classification accuracy by 14 %.