### Contents

- I. Introduction to K-NN
- II. K-NN classification
- III. Examples of K-NN classification
- IV. K-NN classification accuracy assessment method
- V. Holdout method
- VI. $k$-fold cross-validation
- VII. K-NN classification result evaluation metrics
- VIII. Classification judgment two-dimensional table
- IX. Precision
- X. Recall
- XI. Relationship between precision and recall
- XII. Combining precision and recall

#### I. Introduction to K-NN

Introduction to K-NN:

① Full name: K-NN stands for K-Nearest Neighbors, i.e. the K-nearest-neighbors algorithm;

② Definition: Given a query point $p$, find the $K$ points closest to $p$; that is, find every point $q_k$ whose distance to $p$ is less than or equal to the distance from $p$ to its $K$-th nearest neighbor;

③ Way to picture it: Draw a circle centered on $p$ and count the points inside the circle and on its edge. If there are fewer than $K$ of them, enlarge the radius until the number of points inside and on the circle is greater than or equal to $K$;

④ Illustration: The red point is $p$, the green points are the $9$ nearest neighbors of $p$, and the green point lying on the circle is the $9$th nearest neighbor;
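The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance and a brute-force scan over all points; the function name `k_nearest` and the sample coordinates are made up for the example.

```python
import math

def k_nearest(points, p, k):
    """Return the k points closest to p (Euclidean distance), nearest first."""
    # sorted() is stable, so points at equal distance keep their input order.
    return sorted(points, key=lambda q: math.dist(p, q))[:k]

points = [(0, 0), (1, 0), (0, 2), (3, 3), (5, 5)]
p = (0, 0.5)

neighbors = k_nearest(points, p, k=3)
# Radius of the circle from the "way to picture it" step:
# the distance from p to its k-th nearest neighbor.
radius = math.dist(p, neighbors[-1])

print(neighbors)
print(radius)
```

Sorting every point is fine for small data sets; larger ones usually call for a spatial index, which is outside the scope of this sketch.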

#### II. K-NN classification

K-NN classification:

① Known conditions: Assume a query point $p$ is given and its $K$ nearest neighbors are known;

② Goal: K-NN classification assigns a class to the query point $p$;

③ Samples as points: Each data sample in the training set is treated as a point in $n$-dimensional space;

④ Predicting the class: Given an unknown sample $p$ to classify, take it as the query point and find its $K$ nearest neighbors among the training samples in $n$-dimensional space. Group these $K$ neighbors by the value of the class attribute, and assign $p$ to the group that contains the most samples;
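The prediction step can be sketched as a majority vote over the $K$ nearest training samples. This is a minimal sketch, assuming Euclidean distance; `knn_classify` and the toy training set are illustrative names and data, not part of the original article.

```python
import math
from collections import Counter

def knn_classify(train, p, k):
    """train: list of (point, label) pairs. Return the majority label among
    the k training points nearest to p."""
    nearest = sorted(train, key=lambda item: math.dist(p, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common(1) returns the label with the most votes; on a tie it
    # returns the label that was counted first (the nearer neighbor's label).
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]

label_near_a = knn_classify(train, (2, 2), k=3)
label_near_b = knn_classify(train, (6, 5), k=3)
print(label_near_a, label_near_b)
```

Choosing an odd $K$ avoids ties when there are only two classes.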

#### III. Examples of K-NN classification

Classify the red point below: there are two classes; the green points belong to class $A$ and the purple points belong to class $B$. The task is to classify the red point;

1-NN classification: $A$ has $1$ neighbor and $B$ has $0$, so the red point is assigned to class $A$;

3-NN classification: $A$ has $1$ neighbor and $B$ has $2$, so the red point is assigned to class $B$;

9-NN classification: $A$ has $5$ neighbors and $B$ has $2$, so the red point is assigned to class $A$;

15-NN classification: $A$ has $5$ neighbors and $B$ has $9$, so the red point is assigned to class $B$;

K-NN classification accuracy: The more data there is, the higher the accuracy tends to be; the idea behind K-NN is to agree with the majority of the surrounding samples;

#### IV. K-NN classification accuracy assessment method

K-NN classification accuracy evaluation methods: the holdout method and $k$-fold cross-validation are two commonly used ways to evaluate K-NN classification accuracy;

#### V. Holdout method

1. Holdout method:

① Splitting into training and test sets: Randomly divide the data set into two independent sets, a training set used for learning and a test set used for verification;

② Training/test ratio: typically $\dfrac{2}{3}$ of the samples go to the training set and $\dfrac{1}{3}$ to the test set;

③ Random split: The split must be random, with no bias toward any samples;

2. Random subsampling: Run the holdout method $K$ times to obtain $K$ accuracy values; the overall accuracy is the average of these $K$ values;

3. Nature of random subsampling: It is a variant of the holdout method, equivalent to applying the holdout method several times;
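A single holdout run and its repeated form can be sketched as follows. This is an illustrative sketch only: the classifier `knn_classify`, the parameter name `repeats` (the number of holdout runs, called $K$ in the text and renamed here to avoid clashing with the number of neighbors), the fixed seed, and the two well-separated clusters are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def knn_classify(train, p, k):
    """Majority label among the k training points nearest to p."""
    nearest = sorted(train, key=lambda item: math.dist(p, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def holdout_accuracy(data, k, rng, train_fraction=2 / 3):
    """One holdout run: shuffle, split 2/3 vs 1/3, score on the held-out part."""
    shuffled = data[:]
    rng.shuffle(shuffled)  # random split with no preference
    cut = round(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    correct = sum(knn_classify(train, x, k) == y for x, y in test)
    return correct / len(test)

def random_subsampling_accuracy(data, k, repeats, seed=0):
    """Average the accuracy of `repeats` independent holdout runs."""
    rng = random.Random(seed)
    scores = [holdout_accuracy(data, k, rng) for _ in range(repeats)]
    return sum(scores) / len(scores)

# Toy data: two well-separated clusters of 15 points each.
data = ([((i % 4, i // 4), "A") for i in range(15)]
        + [((10 + i % 4, 10 + i // 4), "B") for i in range(15)])

mean_accuracy = random_subsampling_accuracy(data, k=3, repeats=5)
print(mean_accuracy)
```

Because the clusters are far apart, every run classifies the test third perfectly; on real data the individual runs differ, which is why the average is reported.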

#### VI. $k$-fold cross-validation

1. $k$-fold cross-validation: First partition the data set, then run $k$ rounds of training and testing, and finally compute the accuracy;

2. Partitioning the data set: Divide the samples into $k$ independent subsets $\{ S_1, S_2, \cdots, S_k \}$, keeping the number of samples in each subset as equal as possible;

3. Training and testing:

① Number of rounds: Train $k$ times and test $k$ times; every training run is paired with one test;

② Procedure: In round $i$, use $S_i$ as the test set and the remaining $(k-1)$ subsets as the training set;

4. Example: $k$ rounds of training and testing;

Round $1$: use $S_1$ as the test set and the remaining $(k-1)$ subsets as the training set;

Round $2$: use $S_2$ as the test set and the remaining $(k-1)$ subsets as the training set;

$\vdots$

Round $k$: use $S_k$ as the test set and the remaining $(k-1)$ subsets as the training set;

5. Accuracy results:

① Result of a single round: In each of the $k$ rounds, $S_i$ serves as the test set; some of its samples are classified correctly and some incorrectly;

② Overall accuracy: After $k$ rounds, every subset in $\{ S_1, S_2, \cdots, S_k \}$ has served as the test set once, so every sample in the data set has been tested exactly once. Dividing the number of correctly classified samples $Y$ by the total number of samples $T$ gives the $k$-fold cross-validation accuracy $\dfrac{Y}{T}$;
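The procedure above can be sketched as a loop over the $k$ subsets that pools the correct predictions into $Y$ and divides by $T$. This is a minimal sketch under stated assumptions: the names `k_fold_accuracy`, `n_neighbors` and `n_folds`, the stride-based partition, and the toy data are all illustrative choices rather than anything prescribed by the article.

```python
import math
import random
from collections import Counter

def knn_classify(train, p, k):
    """Majority label among the k training points nearest to p."""
    nearest = sorted(train, key=lambda item: math.dist(p, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_fold_accuracy(data, n_neighbors, n_folds, seed=0):
    """Pooled accuracy Y / T over n_folds rounds of training and testing."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    # Striding gives n_folds disjoint subsets whose sizes differ by at most one.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    correct = 0  # Y: correctly classified samples across all rounds
    for i, test in enumerate(folds):
        # Round i: S_i is the test set, the other subsets form the training set.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        correct += sum(knn_classify(train, x, n_neighbors) == y for x, y in test)
    return correct / len(data)  # Y / T

# Toy data: two well-separated clusters of 15 points each.
data = ([((i % 4, i // 4), "A") for i in range(15)]
        + [((10 + i % 4, 10 + i // 4), "B") for i in range(15)])

cv_accuracy = k_fold_accuracy(data, n_neighbors=3, n_folds=5)
print(cv_accuracy)
```

Unlike repeated holdout, each sample is used for testing exactly once, so the result is a single ratio rather than an average of per-round ratios.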

#### VII. K-NN classification result evaluation metrics

K-NN classification result evaluation metrics: ① precision ② recall;

#### VIII. Classification judgment two-dimensional table

1. Classification judgment two-dimensional table: A two-dimensional table is introduced here. It records how the human and the machine each judged the samples;

| | Human judges correct | Human judges wrong |
| --- | --- | --- |
| Machine judges correct | $a$ | $b$ |
| Machine judges wrong | $c$ | $d$ |

2. Analysis of the correctness of sample classification:

① Three views of a sample's class: the actual class of the sample, the class assigned by the human, and the class assigned by the machine;

② Actual class: Suppose the actual class of the sample is $A$;

③ Human judgment: If the human assigns the sample to class $A$, the human judged correctly; if the human assigns it to class $B$, the human judged wrongly;

④ Machine judgment: If the machine assigns the sample to class $A$, the machine judged correctly; if the machine assigns it to class $B$, the machine judged wrongly;

3. Meaning of the data in the table: the values $a, b, c, d$ are sample counts;

① Meaning of $a$: human judges correct, machine judges correct; the number of samples in the data set that both the human and the machine judged correct;

② Meaning of $b$: human judges wrong, machine judges correct; the number of samples in the data set that the machine judged correct but the human judged wrong;

③ Meaning of $c$: human judges correct, machine judges wrong; the number of samples in the data set that the human judged correct but the machine judged wrong;

④ Meaning of $d$: human judges wrong, machine judges wrong; the number of samples in the data set that both the human and the machine judged wrong;
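The four cells can be counted directly from two parallel lists of judgments. This is a small sketch in which `True` stands for "judged correct" in the sense of the table; the function name `judgment_table` and the eight sample judgments are hypothetical.

```python
def judgment_table(human, machine):
    """Count the four cells of the table from parallel lists of booleans
    (True = 'judged correct')."""
    pairs = list(zip(human, machine))
    a = sum(h and m for h, m in pairs)              # both correct
    b = sum((not h) and m for h, m in pairs)        # machine correct, human wrong
    c = sum(h and (not m) for h, m in pairs)        # human correct, machine wrong
    d = sum((not h) and (not m) for h, m in pairs)  # both wrong
    return a, b, c, d

# Hypothetical judgments for eight samples.
human   = [True, True, True,  False, False, True, False, False]
machine = [True, True, False, True,  False, True, False, True]

a, b, c, d = judgment_table(human, machine)
print(a, b, c, d)
```

The four counts always add up to the number of samples, which is a quick sanity check on any such table.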

#### IX. Precision

1. Precision formula:

$$P = \frac{a}{a + b}$$

$(a + b)$ is the total number of samples that the machine judged correct;

$a$ is the number of samples that both the human and the machine judged correct;

2. Understanding precision: Of the samples the machine judged correct, what fraction are really correct. The machine judged $(a + b)$ samples correct, of which only $a$ are really correct;

#### X. Recall

1. Recall formula:

$$R = \frac{a}{a + c}$$

$(a + c)$ is the total number of samples that the human judged correct;

$a$ is the number of samples that both the human and the machine judged correct;

2. Understanding recall: Of the samples the human judged correct, what fraction did the machine also judge correct. The human judged $(a + c)$ samples correct, and the machine judged $a$ of them correct;
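As a worked example of the two ratios $P = \frac{a}{a+b}$ (precision) and $R = \frac{a}{a+c}$ (recall), the sketch below plugs in hypothetical counts. The counts and the zero-denominator guard are assumptions added for the illustration.

```python
def precision(a, b):
    """P = a / (a + b): share of machine-approved samples that are really correct."""
    return a / (a + b) if a + b else 0.0

def recall(a, c):
    """R = a / (a + c): share of human-approved samples the machine also approved."""
    return a / (a + c) if a + c else 0.0

# Hypothetical table counts.
a, b, c, d = 3, 2, 1, 2

P = precision(a, b)  # 3 / 5
R = recall(a, c)     # 3 / 4
print(P, R)
```

Note that $d$ appears in neither formula: samples that both sides reject do not affect either ratio.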

#### XI. Relationship between precision and recall

The relationship between precision and recall: These two metrics pull against each other;

Precision and recall influence each other: when one is pushed high, the other tends to fall;

When precision is pushed to 100%, recall is usually very low; when recall is pushed to 100%, precision is usually very low;
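The tension between the two ratios $P = \frac{a}{a+b}$ and $R = \frac{a}{a+c}$ can be seen by making the machine more or less strict. The sketch below assumes each sample carries a confidence score (for K-NN this could be the share of neighbor votes) and that the machine judges a sample correct when the score reaches a threshold; the scores, the judgments, and the thresholds are all hypothetical.

```python
def precision_recall(scores, human, threshold):
    """The machine judges a sample 'correct' when its score reaches the threshold."""
    machine = [s >= threshold for s in scores]
    pairs = list(zip(human, machine))
    a = sum(h and m for h, m in pairs)
    b = sum((not h) and m for h, m in pairs)
    c = sum(h and (not m) for h, m in pairs)
    p = a / (a + b) if a + b else 0.0
    r = a / (a + c) if a + c else 0.0
    return p, r

# Hypothetical confidence scores and human judgments for six samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
human = [True, True, False, True, False, False]

strict = precision_recall(scores, human, threshold=0.85)  # approves one sample
lenient = precision_recall(scores, human, threshold=0.0)  # approves everything
print(strict)
print(lenient)
```

The strict setting approves only the single most confident sample, so everything it approves is right but most good samples are missed; the lenient setting misses nothing but half of what it approves is wrong.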

#### XII. Combining precision and recall

1. Combining precision and recall:

$$F = \frac{1}{\alpha \dfrac{1}{P} + (1 - \alpha) \dfrac{1}{R}}$$

Substitute precision and recall into the formula above, where $P$ is precision and $R$ is recall;

$\alpha$ is a weighting coefficient, usually set to $0.5$;

2. Formula when $\alpha = 0.5$: The metric is then called the $F_1$ value. It is often used to evaluate K-NN classification results because it takes both precision and recall into account;

$$F_1 = \frac{2PR}{P + R}$$
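A short numerical check of the combined metric, assuming the weighted harmonic mean form $F = 1 / \left( \alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R} \right)$, which is the form that reduces to $\frac{2PR}{P+R}$ at $\alpha = 0.5$. The function names and the sample values of $P$ and $R$ are hypothetical.

```python
def f_measure(p, r, alpha=0.5):
    """Weighted harmonic mean of precision p and recall r."""
    if p == 0 or r == 0:
        return 0.0  # the harmonic mean is taken as 0 when either ratio is 0
    return 1.0 / (alpha / p + (1.0 - alpha) / r)

def f1(p, r):
    """Closed form for alpha = 0.5."""
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical precision and recall values.
P, R = 0.6, 0.75

general = f_measure(P, R, alpha=0.5)
closed_form = f1(P, R)
print(general, closed_form)
```

Both calls give the same value, about $0.667$, which lies between $P$ and $R$ but closer to the smaller of the two, as a harmonic mean should.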

Article Url:https://www.liaochihuo.com/info/535339.html
