Article Directory
I. Introduction to K-NN
II. K-NN classification
III. Examples of K-NN classification
IV. K-NN classification accuracy assessment method
V. Holdout method
VI. k-fold cross-validation
VII. K-NN classification result evaluation indexes
VIII. Classification judgment two-dimensional table
IX. Accuracy rate
X. Recall rate
XI. Accuracy rate and recall rate are related
XII. Comprehensive consideration of accuracy rate and recall rate
I. Introduction to K-NN
Introduction to K-NN:
① Full name: K-NN stands for K-Nearest Neighbors, i.e., the K-nearest-neighbor algorithm;
② Definition: Given a query point $p$, find the $K$ points closest to $p$; that is, find every point $q_k$ whose distance to $p$ does not exceed the distance from $p$ to its $k$-th nearest neighbor;
③ Way of understanding: Draw a circle centered at $p$ and count the points inside the circle and on its edge. If there are fewer than $K$ of them, enlarge the radius until the number of points inside the circle and on its edge is greater than or equal to $K$;
④ Illustration: The red dot is the point $p$, the green points are the $9$ nearest neighbors of $p$, and the green point on the circle is the $9$-th nearest neighbor;
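The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `k_nearest`, the use of Euclidean distance, and the sample points are all hypothetical and not part of the original article. Points tied with the $k$-th neighbor are kept, which matches the "inside the circle or on its edge" description.

```python
import math

def k_nearest(points, p, k):
    """Return every point whose distance to p does not exceed the
    distance from p to its k-th nearest neighbor (ties are kept)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Pair each point with its Euclidean distance to p, nearest first.
    dists = sorted((math.dist(q, p), q) for q in points)
    radius = dists[k - 1][0]  # radius of the circle through the k-th neighbor
    return [q for d, q in dists if d <= radius]

# Hypothetical sample points around the query point (0, 0).
points = [(1, 0), (0, 2), (-2, 0), (0, -2), (3, 3)]
print(k_nearest(points, (0, 0), 1))       # [(1, 0)]
print(len(k_nearest(points, (0, 0), 2)))  # 4, because three points tie at distance 2
```

The second call shows why the result can contain more than $K$ points: the circle has to grow until it holds at least $K$ of them, and every point on its edge is included.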
II. K-NN classification
K-NN classification:
① Known conditions: Assume a query point $p$ is given and its $K$ nearest neighbors are known;
② Classification content: The purpose of K-NN classification is to classify the query point $p$;
③ Data set samples abstracted as points: Each data sample in the training set is regarded as a point in $n$-dimensional space;
④ Predictive classification: Given an unknown sample $p$, take the unknown sample as the query point and, with $p$ as the center, find its $K$ nearest neighbors among the sample points in $n$-dimensional space. Group these $K$ neighbors by the value of a certain attribute (the class label), and assign the unknown sample $p$ to the group that contains the most neighbors;
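The prediction step described above amounts to a majority vote among the $K$ nearest training samples. A minimal sketch follows, assuming Euclidean distance and a training set stored as `(point, label)` pairs; the name `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, p, k):
    """train: list of (point, label) pairs; p: query point; k: number of neighbors.
    Returns the label held by the most neighbors among the k nearest samples."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], p))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common(1) picks the largest group; a tie is resolved arbitrarily
    # (first label encountered), which is an assumption of this sketch.
    return votes.most_common(1)[0][0]

# Hypothetical training samples at distances 1, 2, 2.5, 3 and 3.5 from the origin.
train = [((1.0, 0.0), "A"), ((2.0, 0.0), "B"), ((0.0, 2.5), "B"),
         ((3.0, 0.0), "A"), ((0.0, 3.5), "A")]
print(knn_classify(train, (0.0, 0.0), 1))  # A
print(knn_classify(train, (0.0, 0.0), 3))  # B
print(knn_classify(train, (0.0, 0.0), 5))  # A
```

The same query point receives different labels for different values of $K$, so the choice of $K$ is part of the model.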
III. Examples of K-NN classification
Classify the red dot below: There are two classes; the green dots belong to class $A$ and the purple dots belong to class $B$. The task is to classify the red dot;
1-NN classification: At this point $A$ has $1$ and $B$ has $0$, so the red dot is assigned to class $A$;
3-NN classification: At this point $A$ has $1$ and $B$ has $2$, so the red dot is assigned to class $B$;
9-NN classification: At this point $A$ has $5$ and $B$ has $2$, so the red dot is assigned to class $A$;
15-NN classification: At this point $A$ has $5$ and $B$ has $9$, so the red dot is assigned to class $B$;
K-NN classification accuracy: In general, the larger the amount of data, the higher the accuracy tends to be; the idea of K-NN is to agree with the majority of the surrounding samples;
IV. K-NN classification accuracy assessment method
K-NN classification accuracy evaluation methods: The holdout method and $k$-fold cross-validation are two commonly used methods for evaluating K-NN classification accuracy;
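To make the two methods concrete, here is a minimal sketch of both, assuming a 2/3 to 1/3 random split for the holdout method and equally sized subsets for cross-validation. All function names, the `seed` parameter, and the toy data are hypothetical. Note that `k` below is the number of neighbors, while `folds` is the number of cross-validation subsets.

```python
import math
import random
from collections import Counter

def knn_classify(train, p, k):
    """Majority vote among the k nearest (point, label) training samples."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], p))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def holdout_accuracy(data, k, seed=0):
    """Random 2/3 training, 1/3 test split; returns the test accuracy."""
    rows = data[:]
    random.Random(seed).shuffle(rows)           # random, unbiased division
    cut = (2 * len(rows)) // 3
    train, test = rows[:cut], rows[cut:]
    correct = sum(knn_classify(train, x, k) == y for x, y in test)
    return correct / len(test)

def cross_validation_accuracy(data, k, folds, seed=0):
    """Each subset S_i is the test set once; returns Y / T over all samples."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    subsets = [rows[i::folds] for i in range(folds)]   # sizes differ by at most 1
    correct = 0
    for i, test in enumerate(subsets):
        # Training set: the remaining (folds - 1) subsets.
        train = [row for j, s in enumerate(subsets) if j != i for row in s]
        correct += sum(knn_classify(train, x, k) == y for x, y in test)
    return correct / len(rows)

# Hypothetical, well separated toy data: class A on y = 0, class B on y = 10.
data = ([((float(i), 0.0), "A") for i in range(6)]
        + [((float(i), 10.0), "B") for i in range(6)])
print(holdout_accuracy(data, k=3))                    # 1.0 on this toy data
print(cross_validation_accuracy(data, k=3, folds=4))  # 1.0 on this toy data
```

On real data the two estimates usually differ. Cross-validation tests every sample exactly once, at the cost of training `folds` times.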
V. Holdout method
1. Holdout method:
① Division of training set and test set: Randomly divide the data set samples into two independent data sets: a training set used for training and learning, and a test set used for verification and testing;
② Training set to test set sample ratio: Usually the training set takes $\dfrac{2}{3}$ of the data set and the test set takes $\dfrac{1}{3}$;
③ Random division: The division must be random, with no preference toward any samples;
2. Random subsampling method: Run the holdout method $K$ times to obtain $K$ accuracy values; the overall accuracy is the average of these $K$ values;
3. Nature of the random subsampling method: It is another form of the holdout method, equivalent to applying the holdout method multiple times;
VI. k-fold cross-validation
1. $k$-fold cross-validation: First divide the data set, then carry out $k$ rounds of training and testing, and finally calculate the accuracy;
2. Divide the data set: Divide the data set samples into $k$ independent subsets $\{ S_1, S_2, \cdots, S_k \}$; the number of samples in each subset should be as equal as possible;
3. Training and testing:
① Number of rounds: Train $k$ times and test $k$ times; each training corresponds to one test;
② Training and testing process: In the $i$-th round, use $S_i$ as the test set and the remaining $(k-1)$ subsets as the training set;
4. Training and testing example: Train $k$ times;
Round $1$: use $S_1$ as the test set and the remaining $(k-1)$ subsets as the training set;
Round $2$: use $S_2$ as the test set and the remaining $(k-1)$ subsets as the training set;
$\vdots$
Round $k$: use $S_k$ as the test set and the remaining $(k-1)$ subsets as the training set;
5. Accuracy results:
① Single round results: In each of the $k$ rounds, $S_i$ is used as the test set; some samples in the tested subset are classified correctly and some incorrectly;
② Overall accuracy: After $k$ rounds, every subset in $\{ S_1, S_2, \cdots, S_k \}$ has been used as the test set once, so every sample of the data set has been tested. Dividing the number $Y$ of samples in the whole data set that were classified correctly by the total number of samples $T$ gives the $k$-fold cross-validation accuracy $\dfrac{Y}{T}$;
VII. K-NN classification result evaluation indexes
K-NN classification result evaluation indexes: ① accuracy rate; ② recall rate;
VIII. Classification judgment two-dimensional table
1. Classification judgment two-dimensional table: A two-dimensional table is introduced here; it records the judgments made by a person and by the machine on each sample;
2. Analysis of the correctness of sample classification:
① Three views of a sample's classification: the actual classification of the sample, the classification the person believes, and the classification the machine believes;
② Actual classification of the sample: Suppose the actual classification of the sample is $A$;
③ Human judgment: If the person thinks the sample belongs to class $A$, the person judged correctly; if the person thinks the sample belongs to class $B$, the person judged wrongly;
④ Machine judgment: If the machine thinks the sample belongs to class $A$, the machine judged correctly; if the machine thinks the sample belongs to class $B$, the machine judged wrongly;
3. Meaning of the data in the table: The values $a, b, c, d$ in the table are numbers of samples;
① Meaning of $a$: the number of samples that the person judges correct and the machine judges correct, i.e., the samples in the data set judged correct by both;
② Meaning of $b$: the number of samples that the person judges wrong and the machine judges correct, i.e., the samples in the data set judged correct only by the machine;
③ Meaning of $c$: the number of samples that the person judges correct and the machine judges wrong, i.e., the samples in the data set judged correct only by the person;
④ Meaning of $d$: the number of samples that the person judges wrong and the machine judges wrong, i.e., the samples in the data set judged wrong by both;
IX. Accuracy rate
1. Accuracy rate calculation formula:
$$P = \frac{a}{a + b}$$
$(a + b)$ is the total number of samples that the machine judges correct;
$a$ is the number of samples that both the person and the machine judge correct;
2. Understanding the accuracy rate: Of the samples that the machine judges correct, how many are really correct. $(a + b)$ is the number of samples the machine thinks are correct, of which only $a$ samples are really correct, with the person's judgment taken as the reference. This measure is commonly called precision;
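As a small worked example, the counts $a, b, c, d$ and the accuracy rate $P$ can be computed from two parallel lists of judgments. This is a sketch under stated assumptions: the function name `judgment_table` and the sample judgments are hypothetical, and each list entry records whether the person (or the machine) judged that sample correct. If the person's judgment is treated as ground truth, $a, b, c, d$ correspond to what are usually called true positives, false positives, false negatives, and true negatives.

```python
def judgment_table(human_ok, machine_ok):
    """Count a, b, c, d from two parallel lists of booleans.
    a: both correct, b: machine only, c: person only, d: neither."""
    a = b = c = d = 0
    for h, m in zip(human_ok, machine_ok):
        if h and m:
            a += 1
        elif m:
            b += 1
        elif h:
            c += 1
        else:
            d += 1
    return a, b, c, d

# Hypothetical judgments for eight samples.
human_ok   = [True, True, True, False, False, False, True, False]
machine_ok = [True, True, False, True, False, False, True, False]

a, b, c, d = judgment_table(human_ok, machine_ok)
accuracy_rate = a / (a + b)   # P = a / (a + b); undefined when a + b == 0
print(a, b, c, d)             # 3 1 1 3
print(accuracy_rate)          # 0.75
```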
X. Recall rate
1. Recall rate calculation formula:
$$R = \frac{a}{a + c}$$
$(a + c)$ is the total number of samples that the person judges correct;
$a$ is the number of samples that both the person and the machine judge correct;
2. Understanding the recall rate: Of the samples that the person judges correct, how many does the machine also judge correct. $(a + c)$ is the number of samples the person thinks are correct, and the machine judges $a$ of them correct;
XI. Accuracy rate and recall rate are related
The relationship between accuracy rate and recall rate: These two indicators pull against each other;
The accuracy rate and the recall rate influence each other: when the accuracy rate is high, the recall rate tends to be low, and vice versa;
When the accuracy rate is pushed to 100%, the recall rate is usually very low; when the recall rate is pushed to 100%, the accuracy rate is usually very low;
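Two extreme cases illustrate this tension. The sketch below uses hypothetical data (ten samples, four of which the person judges correct) and a hypothetical helper `accuracy_and_recall`: a machine that accepts every sample reaches 100% recall at a low accuracy rate, while a machine that accepts only a single sure sample reaches a 100% accuracy rate at a low recall.

```python
def accuracy_and_recall(human_ok, machine_ok):
    """Return (P, R) with P = a / (a + b) and R = a / (a + c).
    Assumes a + b > 0 and a + c > 0."""
    pairs = list(zip(human_ok, machine_ok))
    a = sum(1 for h, m in pairs if h and m)
    b = sum(1 for h, m in pairs if not h and m)
    c = sum(1 for h, m in pairs if h and not m)
    return a / (a + b), a / (a + c)

# Hypothetical data: the person judges 4 of 10 samples correct.
human_ok = [True] * 4 + [False] * 6

accept_all = [True] * 10            # machine accepts everything
accept_one = [True] + [False] * 9   # machine accepts a single sure sample

print(accuracy_and_recall(human_ok, accept_all))  # (0.4, 1.0)
print(accuracy_and_recall(human_ok, accept_one))  # (1.0, 0.25)
```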
XII. Comprehensive consideration of accuracy rate and recall rate
1. Comprehensive consideration of accuracy rate and recall rate:
$$F = \frac{1}{\alpha \dfrac{1}{P} + (1 - \alpha) \dfrac{1}{R}}$$
Substitute the accuracy rate and the recall rate into the formula above, where $P$ is the accuracy rate and $R$ is the recall rate;
$\alpha$ is a coefficient, usually taken as $0.5$;
2. Formula when $\alpha$ is $0.5$: The resulting metric is called the $F_1$ value. It is often used as a metric for K-NN classification results because it takes both the accuracy rate and the recall rate into account;
$$F_1 = \frac{2PR}{P + R}$$
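A quick numerical check of the two formulas, as a sketch with hypothetical function names and input values. The weighted form here gives the accuracy rate the weight $\alpha$ and the recall rate the weight $1 - \alpha$, which is the weighting under which $\alpha = 0.5$ reduces to $F_1 = \frac{2PR}{P + R}$.

```python
import math

def f_measure(p, r, alpha=0.5):
    """Weighted harmonic mean of accuracy rate p and recall rate r.
    Assumes p > 0 and r > 0."""
    return 1.0 / (alpha / p + (1.0 - alpha) / r)

def f1(p, r):
    """F_1 = 2PR / (P + R)."""
    return 2.0 * p * r / (p + r)

# Hypothetical values for the accuracy rate and the recall rate.
p, r = 0.75, 0.6
print(f_measure(p, r, alpha=0.5))
print(f1(p, r))
print(math.isclose(f_measure(p, r, alpha=0.5), f1(p, r)))  # True
```

Raising `alpha` above 0.5 moves the score closer to the accuracy rate, and lowering it moves the score closer to the recall rate.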
Article URL: https://www.liaochihuo.com/info/535339.html