
# Cluster analysis of the iris dataset with sklearn

Release Time: 2021-03-12

# Import libraries

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.datasets import load_iris
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import MinMaxScaler

    # Jupyter/IPython magic: render plots inline in the notebook
    %matplotlib inline
    sns.set(style="white")
    pd.set_option("display.max_rows", 1000)

# The iris dataset bundled with sklearn (nrow=150)

The dataset has 4 predictor variables and a 3-class outcome.

    iris = load_iris()
    X = iris["data"]
    Y = iris["target"]
    display(X[:5])
    display(pd.Series(Y).value_counts())
    Y = Y.reshape(-1, 1)  # reshape Y to [150, 1]

Output:

    array([[5.1, 3.5, 1.4, 0.2],
           [4.9, 3. , 1.4, 0.2],
           [4.7, 3.2, 1.3, 0.2],
           [4.6, 3.1, 1.5, 0.2],
           [5. , 3.6, 1.4, 0.2]])

    2    50
    1    50
    0    50
    dtype: int64

Combine the predictors and the outcome into one DataFrame:

    data = pd.DataFrame(np.concatenate((X, Y), axis=1),
                        columns=["x1", "x2", "x3", "x4", "y"])
    data["y"] = data["y"].astype("int64")
    data.head()

Output:

        x1   x2   x3   x4  y
    0  5.1  3.5  1.4  0.2  0
    1  4.9  3.0  1.4  0.2  0
    2  4.7  3.2  1.3  0.2  0
    3  4.6  3.1  1.5  0.2  0
    4  5.0  3.6  1.4  0.2  0

# Inspect the data distribution

Pairwise scatter plots of the 4 predictor variables, colored by class:

    sns.pairplot(data, hue="y")

# Data standardization

The data should be standardized before K-means clustering. Here `MinMaxScaler` rescales each predictor to the [0, 1] range.

    scaler = MinMaxScaler()
    data.iloc[:, :4] = scaler.fit_transform(data.iloc[:, :4])
    data.head()

Output:

             x1        x2        x3        x4  y
    0  0.222222  0.625000  0.067797  0.041667  0
    1  0.166667  0.416667  0.067797  0.041667  0
    2  0.111111  0.500000  0.050847  0.041667  0
    3  0.083333  0.458333  0.084746  0.041667  0
    4  0.194444  0.666667  0.067797  0.041667  0
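To make the scaling step concrete, the sketch below recomputes the min-max transformation by hand and compares it with `MinMaxScaler`. The variable names `X_manual` and `X_scaled` are illustrative and not part of the original article.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X = load_iris()["data"]

# Min-max scaling by hand: (x - column min) / (column max - column min)
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

# The same transformation through sklearn
X_scaled = MinMaxScaler().fit_transform(X)

print(np.allclose(X_manual, X_scaled))
print(X_manual[0].round(6))  # first row, comparable with data.head()
```

Because every column ends up in the same [0, 1] range, no single predictor dominates the Euclidean distances that K-means relies on.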

# Set the number of clusters to 3 and run K-means

    clus = KMeans(n_clusters=3)
    clus = clus.fit(data.iloc[:, 1:4])  # columns 1:4 are x2, x3, x4

Cluster labels of the 150 samples after clustering:

    clus.labels_

    array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)

The three cluster centers after clustering:

    clus.cluster_centers_

    array([[0.595     , 0.07830508, 0.06083333],
           [0.2975    , 0.55661017, 0.50583333],
           [0.42916667, 0.76745763, 0.8075    ]])

The clustering criterion (inertia, the within-cluster sum of squared distances):

    clus.inertia_

    4.481991774793322
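As a rough illustration of what `inertia_` measures, the sketch below recomputes it as the sum of squared distances from each sample to its assigned cluster center. The `random_state=0` and `n_init=10` arguments are assumptions added here for reproducibility; the article uses the defaults, so the exact labels and values may differ from the output above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# Scale all four predictors, then keep x2, x3, x4 as in the article's iloc[:, 1:4]
X = MinMaxScaler().fit_transform(load_iris()["data"])[:, 1:4]

# random_state and n_init are assumptions, not settings from the article
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Look up the center assigned to each sample, then sum the squared distances
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = ((X - assigned_centers) ** 2).sum()

print(manual_inertia, km.inertia_)
```

Fixing `random_state` also keeps the cluster numbering stable between runs, which matters when cluster IDs are later matched to class labels.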

# Determine the optimal number of clusters

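Try different numbers of clusters, record the criterion value (inertia; smaller means tighter clusters), and draw the "elbow diagram". Note that this loop fits on columns `1:3` (x2 and x3 only), whereas the models elsewhere in the article are fitted on columns `1:4`; the values below come from the code as written.

    L = []
    for i in range(1, 9):
        clus = KMeans(n_clusters=i)
        clus.fit(data.iloc[:, 1:3])
        L.append([i, clus.inertia_])
    L = pd.DataFrame(L, columns=["k", "criterion"])
    L

Output:

       k  criterion
    0  1  18.253249
    1  2   5.106290
    2  3   3.312646
    3  4   2.585065
    4  5   1.946648
    5  6   1.637264
    6  7   1.387541
    7  8   1.175937

Plot the criterion against k:

    sns.pointplot(x="k", y="criterion", data=L)
    sns.despine()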
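One way to read the elbow numerically is to look at how much the criterion falls with each additional cluster. The sketch below does this arithmetic on the criterion values listed in the table above; the names `drops` and `pcts` are illustrative.

```python
# Criterion (inertia) values copied from the article's table, keyed by k
criterion = {1: 18.253249, 2: 5.106290, 3: 3.312646, 4: 2.585065,
             5: 1.946648, 6: 1.637264, 7: 1.387541, 8: 1.175937}

ks = sorted(criterion)
drops = []  # absolute fall in inertia when going from k-1 to k
pcts = []   # the same fall as a percentage of the previous value
for prev, k in zip(ks, ks[1:]):
    drop = criterion[prev] - criterion[k]
    pct = 100 * drop / criterion[prev]
    drops.append(drop)
    pcts.append(pct)
    print(f"k={prev} -> k={k}: inertia falls by {drop:.3f} ({pct:.1f}%)")
```

Inertia always decreases as k grows, so the point of interest is where the falls become small. The largest fall is from k=1 to k=2, and choosing among the later candidates remains a judgment call.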

# Predict the samples with the selected clustering model

The "elbow diagram" suggests that the best number of clusters is 3 or 4; 3 is used here.

    clus = KMeans(n_clusters=3)
    clus = clus.fit(data.iloc[:, 1:4])
    data["pred"] = clus.predict(data.iloc[:, 1:4])

    # Cluster IDs are arbitrary, so relabel them to match the class codes in y.
    # This mapping (0 -> 1, 1 -> 0, 2 -> 2) is specific to this run.
    data["Pred"] = data["pred"].map({0: 1, 1: 0, 2: 2})
    data["Pred"] = data["Pred"].astype("int64")
    data.head()

Output:

             x1        x2        x3        x4  y  pred  Pred
    0  0.222222  0.625000  0.067797  0.041667  0     1     0
    1  0.166667  0.416667  0.067797  0.041667  0     1     0
    2  0.111111  0.500000  0.050847  0.041667  0     1     0
    3  0.083333  0.458333  0.084746  0.041667  0     1     0
    4  0.194444  0.666667  0.067797  0.041667  0     1     0
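The hand-written relabeling above depends on which cluster IDs K-means happened to assign in that run. A possible alternative, sketched here under the assumption that each cluster should take the most frequent true class among its members, derives the mapping from the data. The names `counts` and `mapping`, and the `random_state` and `n_init` settings, are not from the article.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
X = MinMaxScaler().fit_transform(iris["data"])[:, 1:4]  # x2, x3, x4
y = iris["target"]

# random_state and n_init are assumptions added for reproducibility
pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Rows are cluster IDs, columns are true classes
counts = pd.crosstab(pd.Series(pred, name="pred"), pd.Series(y, name="y"))

# For each cluster, take the true class that occurs most often in it
mapping = counts.idxmax(axis=1).to_dict()
Pred = pd.Series(pred).map(mapping)

print(mapping)
print("agreement with y:", (Pred.values == y).mean())
```

A majority vote can assign the same class to two clusters when the clustering is poor, so it is worth checking that the mapping covers all three classes before computing accuracy.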

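# Build the prediction confusion matrix and compute accuracy

    df = pd.crosstab(data["y"], data["Pred"])
    df

Output:

    Pred   0   1   2
    y
    0     50   0   0
    1      0  46   4
    2      0   4  46

Sum the off-diagonal (misclassified) counts and convert them to an accuracy:

    L = []
    for i in range(df.shape[0]):
        for j in range(df.shape[1]):
            if i != j:
                L.append(df.iloc[i, j])  # off-diagonal cell = misclassified samples
    print("Prediction accuracy:", round((150 - sum(L)) / 150 * 100, 1), "%")

Output:

    Prediction accuracy: 94.7 %

Original article: https://blog.csdn.net/weixin_40575651/article/details/107334269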
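The same accuracy can be read directly off the diagonal of the confusion matrix. The short sketch below uses the counts reported above; the per-class recall is an extra illustration that the article does not compute.

```python
import numpy as np

# Confusion matrix from the article: rows are true classes y, columns are Pred
cm = np.array([[50, 0, 0],
               [0, 46, 4],
               [0, 4, 46]])

# Accuracy: correctly classified samples (the diagonal) over all samples
accuracy = np.trace(cm) / cm.sum()

# Per-class recall: diagonal entry divided by its row total
recall = np.diag(cm) / cm.sum(axis=1)

print(f"accuracy = {accuracy * 100:.1f} %")
print("recall per class =", recall)
```

Class 0 is recovered perfectly, while classes 1 and 2 each lose 4 samples to the other.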
