[Case Study] Exploring a Dimensionality Reduction Example

Source: CSDN

Dimensionality Reduction Case Study




Case 1

Goal: reduce the dimensionality of a table that describes each user's preferences across product categories (aisles).

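Data:

products.csv: product information
order_products__prior.csv: which products each order contains
orders.csv: which orders each user placed
aisles.csv: the specific category (aisle) each product belongs to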

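Steps

1. Merge all the tables into a single table with pd.merge().
2. Build a row-by-column table, with one row per user and one column per aisle.
3. Run PCA on that table.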

Step 1

import pandas as pd
from sklearn.decomposition import PCA

# Read the four tables
prior = pd.read_csv("order_products__prior.csv")
products = pd.read_csv("products.csv")
orders = pd.read_csv("orders.csv")
aisles = pd.read_csv("aisles.csv")

# Merge the four tables into one, joining on the shared key column each time
_mg = pd.merge(prior, products, on="product_id")
_mg = pd.merge(_mg, orders, on="order_id")
mt = pd.merge(_mg, aisles, on="aisle_id")

print(mt.head())

Output:

0         2       33120  ...                     8.0   eggs
1        26       33120  ...                     7.0   eggs
2       120       33120  ...                    10.0   eggs
3       327       33120  ...                     8.0   eggs
4       390       33120  ...                     9.0   eggs
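The CSV files are not included with this article, so the chained merge can be tried on a few hand-made rows instead. The sketch below is only an illustration: the toy tables and their values are invented, and they assume the same key columns (product_id, order_id, aisle_id) that the merge calls above rely on.

```python
import pandas as pd

# Hypothetical toy tables that mimic the key columns of the four CSV files.
prior = pd.DataFrame({"order_id": [1, 1, 2], "product_id": [10, 11, 10]})
products = pd.DataFrame({"product_id": [10, 11], "aisle_id": [100, 101]})
orders = pd.DataFrame({"order_id": [1, 2], "user_id": [7, 8]})
aisles = pd.DataFrame({"aisle_id": [100, 101], "aisle": ["eggs", "yogurt"]})

# Each merge is an inner join on one shared key, so every purchased product
# ends up on a row that also carries its user_id and aisle name.
merged = (
    prior.merge(products, on="product_id")
    .merge(orders, on="order_id")
    .merge(aisles, on="aisle_id")
)

print(merged[["user_id", "aisle"]])
```

Every row of the order-product table survives the joins here because each key has a match in the other tables; with real data, rows with unmatched keys would be dropped by the default inner join.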

Step 2

import pandas as pd
from sklearn.decomposition import PCA

# Read the four tables
prior = pd.read_csv("order_products__prior.csv")
products = pd.read_csv("products.csv")
orders = pd.read_csv("orders.csv")
aisles = pd.read_csv("aisles.csv")

# Merge the four tables into one, joining on the shared key column each time
_mg = pd.merge(prior, products, on="product_id")
_mg = pd.merge(_mg, orders, on="order_id")
mt = pd.merge(_mg, aisles, on="aisle_id")

# Cross-tabulation (a specialized grouping tool): count purchases per user per aisle
cross = pd.crosstab(mt["user_id"], mt["aisle"])

# Print the first 5 rows
print(cross.head())

Output:

aisle    air fresheners candles  asian foods  ...  white wines  yogurt
user_id                                       ...
1                             0            0  ...            0       1
2                             0            3  ...            0      42
3                             0            0  ...            0       0
4                             0            0  ...            0       0
5                             0            2  ...            0       3
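To see what the cross-tabulation does, it helps to run it on a handful of rows. The following is a minimal sketch with invented data; the groupby variant is shown only as an assumed equivalent way of reaching the same counts, not as the approach the article uses.

```python
import pandas as pd

# Hypothetical purchase log: one row per product bought.
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "aisle": ["eggs", "eggs", "yogurt", "yogurt", "yogurt", "yogurt"],
})

# crosstab counts how often each (user_id, aisle) pair occurs;
# pairs that never occur are filled with 0.
counts = pd.crosstab(toy["user_id"], toy["aisle"])
print(counts)

# The same counts via an explicit group-and-pivot.
alt = toy.groupby(["user_id", "aisle"]).size().unstack(fill_value=0)
print((counts.values == alt.values).all())
```

The result has one row per user and one column per aisle, which is the layout PCA expects: samples in rows and features in columns.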

Step 3

import pandas as pd
from sklearn.decomposition import PCA

# Read the four tables
prior = pd.read_csv("order_products__prior.csv")
products = pd.read_csv("products.csv")
orders = pd.read_csv("orders.csv")
aisles = pd.read_csv("aisles.csv")

# Merge the four tables into one, joining on the shared key column each time
_mg = pd.merge(prior, products, on="product_id")
_mg = pd.merge(_mg, orders, on="order_id")
mt = pd.merge(_mg, aisles, on="aisle_id")

# Cross-tabulation (a specialized grouping tool): count purchases per user per aisle
cross = pd.crosstab(mt["user_id"], mt["aisle"])

# Run principal component analysis, keeping enough components for 90% of the variance
pca = PCA(n_components=0.9)
data = pca.fit_transform(cross)

# Print the transformed data
print(data)

Output:

[[-2.42156587e+01  2.42942720e+00 -2.46636975e+00 ...  6.86800336e-01
   1.69439402e+00 -2.34323022e+00]
 [ 6.46320806e+00  3.67511165e+01  8.38255336e+00 ...  4.12121252e+00
   2.44689740e+00 -4.28348478e+00]
 [-7.99030162e+00  2.40438257e+00 -1.10300641e+01 ...  1.77534453e+00
  -4.44194030e-01  7.86665571e-01]
 ...
 [ 8.61143331e+00  7.70129866e+00  7.95240226e+00 ... -2.74252456e+00
   1.07112531e+00 -6.31925661e-02]
 [ 8.40862199e+01  2.04187340e+01  8.05410372e+00 ...  7.27554259e-01
   3.51339470e+00 -1.79079914e+01]
 [-1.39534562e+01  6.64621821e+00 -5.23030367e+00 ...  8.25329076e-01
   1.38230701e+00 -2.41942061e+00]]

Checking data.shape shows that the number of features dropped from 134 aisle categories to 27 principal components. Because n_components is given as a fraction (0.9), PCA keeps only as many components as are needed to retain 90% of the variance.
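How many columns survive when n_components is a fraction can be checked on synthetic data. This is a sketch under stated assumptions: the data below is randomly generated (20 features driven by 3 hidden factors plus a little noise) and has nothing to do with the shopping dataset; n_components_ and explained_variance_ratio_ are attributes of a fitted scikit-learn PCA object.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 20 features that are mixtures of
# 3 latent factors, plus a small amount of independent noise.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 20))

# A float in (0, 1) asks PCA to keep enough components to explain
# at least that fraction of the total variance.
pca = PCA(n_components=0.9)
reduced = pca.fit_transform(X)

print(X.shape, "->", reduced.shape)
print("components kept:", pca.n_components_)
print("variance retained:", pca.explained_variance_ratio_.sum())
```

Since almost all of the variance here comes from the 3 latent factors, at most 3 components are kept. The same inspection applied to the fitted pca object in Step 3 would show how much variance the 27 retained components account for.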
