
Cluster Analysis in Python

Posted: 2020-03-03 12:39:48


Clustering

The data is unlabeled, so clustering is a form of unsupervised learning.

Hierarchical clustering

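  • linkage: performs agglomerative (bottom-up) hierarchical clustering and returns the linkage matrix, which records every merge and its distance
  • fcluster: cuts the hierarchy encoded in the linkage matrix into flat clusters and returns one cluster label per observation
  • Both functions come from the scipy package (module scipy.cluster.hierarchy)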
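To make the linkage bullet concrete, here is a minimal sketch on four hypothetical points (the array X is made up for illustration). linkage returns an (n - 1) x 4 matrix in which each row records one merge.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: two tight pairs of points, far apart
X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [10.0, 0.0],
              [10.0, 1.0]])

# Ward linkage: each row of Z records one merge step
Z = linkage(X, method='ward')

# Each row is [cluster index A, cluster index B, merge distance, size of new cluster].
# Indices 0..3 are the original points; index 4 is the cluster formed in row 0, and so on.
print(Z.shape)  # (3, 4): n - 1 merges for n = 4 points
print(Z)
```

The two merges at distance 1.0 join the close pairs first, and the last row joins the two pairs into one cluster of size 4. fcluster works by cutting this tree below some merge height.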
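# Import linkage and fcluster functions
from scipy.cluster.hierarchy import linkage, fcluster
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame with numeric columns 'x' and 'y'

# Use the linkage() function to compute the Ward linkage matrix
Z = linkage(df[['x', 'y']], method='ward')

# Cut the tree into (at most) 2 flat clusters and store the labels
df['cluster_labels'] = fcluster(Z, 2, criterion='maxclust')

# Plot the points with seaborn, colored by cluster label
sns.scatterplot(x='x', y='y', hue='cluster_labels', data=df)
plt.show()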
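The snippet above depends on an existing df and opens a plot window. A self-contained sketch of the same linkage/fcluster pipeline, run on hypothetical synthetic blobs and without plotting, might look like this. It also shows criterion='distance', which cuts the tree at a merge height instead of at a cluster count; the threshold 10.0 is an arbitrary value chosen for this toy data.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated blobs of 20 points each
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(20, 2))
df = pd.DataFrame(np.vstack([blob_a, blob_b]), columns=['x', 'y'])

# Ward linkage on the coordinate columns only
Z = linkage(df[['x', 'y']], method='ward')

# criterion='maxclust': cut the tree so that at most 2 flat clusters remain
df['cluster_labels'] = fcluster(Z, t=2, criterion='maxclust')

# criterion='distance': cut the tree at a fixed merge height instead
# (10.0 is an illustrative threshold for this toy data, not a general default)
labels_by_height = fcluster(Z, t=10.0, criterion='distance')

print(df['cluster_labels'].value_counts())
```

Choosing maxclust is convenient when the number of clusters is known in advance; the distance criterion is the usual choice when a cut height is read off a dendrogram instead.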


K-means clustering

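  • kmeans: computes the k cluster centers (centroids); it returns the centroids together with the distortion (mean distance of the observations to their nearest centroid)
  • vq: assigns each observation to its nearest centroid, which yields the cluster labels; it also returns each observation's distance to that centroid
  • whiten: a preprocessing step that rescales each feature (column) by dividing it by its standard deviation, so that every feature has unit variance and features measured on larger scales do not dominate the distances. Despite the name, it neither decorrelates the features nor subtracts the mean. Its docstring reads: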

    Normalize a group of observations on a per feature basis.
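The whiten bullet can be checked directly. The following sketch uses a small hypothetical array and compares whiten with dividing each column by its population standard deviation (NumPy's default ddof=0).

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical toy data: 3 observations (rows) x 2 features (columns)
obs = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 60.0]])

scaled = whiten(obs)

# Manual equivalent: divide each column by that column's std (ddof=0)
manual = obs / obs.std(axis=0)

print(scaled)
print(scaled.std(axis=0))   # each feature now has unit standard deviation
print(scaled.mean(axis=0))  # the means are NOT zero: whiten does not center
```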

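# Import kmeans and vq functions
from scipy.cluster.vq import kmeans, vq
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame with numeric columns 'x' and 'y'

# Compute 2 cluster centers (the second return value is the distortion)
centroids, _ = kmeans(df[['x', 'y']], 2)

# Assign each point to its nearest center (the second return value is the distance)
df['cluster_labels'], _ = vq(df[['x', 'y']], centroids)

# Plot the points with seaborn, colored by cluster label
sns.scatterplot(x='x', y='y', hue='cluster_labels', data=df)
plt.show()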
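A self-contained version of the kmeans/vq steps, sketched on hypothetical synthetic blobs (no plotting), is shown below. scipy's kmeans picks its initial centroids at random and keeps the best of several restarts, so centroid order and label numbers may differ between runs even when the partition is the same.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical toy data: two well-separated blobs of 20 points each
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(20, 2))

# Scale each feature to unit variance before clustering
data = whiten(np.vstack([blob_a, blob_b]))

# kmeans returns the code book (k x n_features centroids) and the mean distortion
centroids, distortion = kmeans(data, 2)

# vq assigns each observation to its nearest centroid and reports that distance
labels, dists = vq(data, centroids)

print(centroids)
print(labels)
```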

# Import the whiten function
from scipy.cluster.vq import whiten

goals_for = [4, 3, 2, 3, 1, 1, 2, 0, 1, 4]

# Use the whiten() function to standardize the data
scaled_data = whiten(goals_for)
print(scaled_data)

<script.py> output:
    [3.07692308 2.30769231 1.53846154 2.30769231 0.76923077 0.76923077
     1.53846154 0.         0.76923077 3.07692308]
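The printed values can be checked by hand. whiten divides each value by the population standard deviation of goals_for and does not subtract the mean, which is why the 0 stays 0:

```latex
\bar{x} = \frac{4+3+2+3+1+1+2+0+1+4}{10} = \frac{21}{10} = 2.1

\sigma = \sqrt{\frac{1}{10}\sum_{i=1}^{10} (x_i - \bar{x})^2} = \sqrt{\frac{16.9}{10}} = \sqrt{1.69} = 1.3

\frac{4}{1.3} \approx 3.0769, \qquad \frac{3}{1.3} \approx 2.3077, \qquad \frac{2}{1.3} \approx 1.5385, \qquad \frac{1}{1.3} \approx 0.7692, \qquad \frac{0}{1.3} = 0
```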

A small demo on the FIFA dataset: scale the wage and value columns with whiten, then inspect them.

from scipy.cluster.vq import whiten
import matplotlib.pyplot as plt

# fifa is assumed to be a pandas DataFrame with columns 'eur_wage' and 'eur_value'

# Scale wage and value
fifa['scaled_wage'] = whiten(fifa['eur_wage'])
fifa['scaled_value'] = whiten(fifa['eur_value'])

# Plot the two columns in a scatter plot
fifa.plot(x='scaled_wage', y='scaled_value', kind='scatter')
plt.show()

# Check mean and standard deviation of scaled values
print(fifa[['scaled_wage', 'scaled_value']].describe())

<script.py> output:
           scaled_wage  scaled_value
    count      1000.00       1000.00
    mean          1.12          1.31
    std           1.00          1.00
    min           0.00          0.00
    25%           0.47          0.73
    50%           0.85          1.02
    75%           1.41          1.54
    max           9.11          8.98
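Two details of this output are worth spelling out. The means (1.12 and 1.31) are not 0 because whiten only divides by the standard deviation and does not subtract the mean. And std reads 1.00 even though pandas' std uses ddof=1 while whiten uses ddof=0; with 1000 rows the two differ by a factor of sqrt(1000/999), about 1.0005, which rounds away. A small sketch on a hypothetical four-value column (the FIFA data is not reproduced here) makes both effects visible:

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import whiten

# Hypothetical wage-like column (made-up values, not the FIFA data)
wage = pd.Series([1000.0, 2000.0, 3000.0, 10000.0])

scaled = pd.Series(whiten(wage))

# whiten only rescales, so the mean stays positive instead of becoming 0
print(scaled.mean())

# pandas .std() defaults to ddof=1; whiten divided by the ddof=0 std
print(scaled.std())        # sample std (ddof=1): sqrt(4/3) here, above 1
print(scaled.std(ddof=0))  # population std (ddof=0): exactly 1
```

With only 4 rows the ddof difference is large (sqrt(4/3)), which is why the 1000-row FIFA output hides it.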


Original post: https://www.cnblogs.com/gaowenxingxing/p/12401364.html
