Dimensionality Reduction Case Study
Case 1
Goal: reduce the dimensionality of users' preferences, broken down by product category (aisle).
Data:
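products.csv: product information
order_products__prior.csv: which products appear in each order
orders.csv: users' order information
aisles.csv: the specific category (aisle) each product belongs to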
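Steps
1. Merge the tables into a single table with pd.merge().
2. Build a table with users as rows and product categories (aisles) as columns.
3. Analyze that table with PCA.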
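You can try the three steps end to end on a tiny example before loading the real CSV files. This is a minimal sketch. The four DataFrames below are hypothetical toy stand-ins with made-up values. They only mimic the key columns that the merges rely on (product_id, order_id, aisle_id, user_id, aisle).

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical toy versions of the four tables (made-up values, same key columns)
aisles = pd.DataFrame({"aisle_id": [1, 2, 3], "aisle": ["eggs", "yogurt", "candles"]})
products = pd.DataFrame({"product_id": [10, 11, 12, 13], "aisle_id": [1, 2, 2, 3]})
orders = pd.DataFrame({"order_id": [100, 101, 102, 103], "user_id": [1, 1, 2, 3]})
prior = pd.DataFrame({
    "order_id":   [100, 100, 101, 102, 102, 102, 103, 103],
    "product_id": [10,  11,  12,  10,  11,  13,  13,  10],
})

# Step 1: merge into one long table (one row per purchased item)
mt = (prior.merge(products, on="product_id")
           .merge(orders, on="order_id")
           .merge(aisles, on="aisle_id"))

# Step 2: count purchases per user (rows) and aisle (columns)
cross = pd.crosstab(mt["user_id"], mt["aisle"])

# Step 3: keep enough principal components to explain 90% of the variance
pca = PCA(n_components=0.9)
data = pca.fit_transform(cross)

print(cross.shape, data.shape)  # (3, 3) (3, 1)
```

With the real files, the same code produces the 134-column table and the reduced array shown in the steps.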
Step 1
import pandas as pd
from sklearn.decomposition import PCA

# Read the four tables
prior = pd.read_csv("order_products__prior.csv")
products = pd.read_csv("products.csv")
orders = pd.read_csv("orders.csv")
aisles = pd.read_csv("aisles.csv")

# Merge the four tables into one (a single key column name is enough for on=)
_mg = pd.merge(prior, products, on="product_id")
_mg = pd.merge(_mg, orders, on="order_id")
mt = pd.merge(_mg, aisles, on="aisle_id")
print(mt.head())

Output:
0    2  33120  ...   8.0  eggs
1   26  33120  ...   7.0  eggs
2  120  33120  ...  10.0  eggs
3  327  33120  ...   8.0  eggs
4  390  33120  ...   9.0  eggs
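Passing the key once, as `on="product_id"`, is sufficient; a list that repeats the same column name adds nothing. Two merge details matter on real data. First, the default `how="inner"` silently drops rows whose key has no match. Second, the `validate` argument can check the expected key relationship. The sketch below uses hypothetical toy tables in which one order line refers to a product (id 99) missing from the catalogue.

```python
import pandas as pd

# Hypothetical toy tables: product_id 99 has no entry in products
prior = pd.DataFrame({"order_id": [1, 1, 2], "product_id": [10, 99, 10]})
products = pd.DataFrame({"product_id": [10, 11], "aisle_id": [5, 6]})

# validate="many_to_one" raises MergeError if product_id is not unique in products
inner = pd.merge(prior, products, on="product_id", validate="many_to_one")
print(len(inner))  # 2: the row with product_id 99 is dropped by the inner join

# how="left" keeps every order line; indicator=True records where each row matched
left = pd.merge(prior, products, on="product_id", how="left", indicator=True)
print(left["_merge"].tolist())  # ['both', 'left_only', 'both']
```

Comparing row counts before and after each merge is a cheap way to notice when the inner join has discarded data.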
Step 2
import pandas as pd
from sklearn.decomposition import PCA

# Read the four tables
prior = pd.read_csv("order_products__prior.csv")
products = pd.read_csv("products.csv")
orders = pd.read_csv("orders.csv")
aisles = pd.read_csv("aisles.csv")

# Merge the four tables into one
_mg = pd.merge(prior, products, on="product_id")
_mg = pd.merge(_mg, orders, on="order_id")
mt = pd.merge(_mg, aisles, on="aisle_id")

# Cross-tabulation (a special grouping tool): purchase counts per user (rows) per aisle (columns)
cross = pd.crosstab(mt["user_id"], mt["aisle"])
# Print the first 5 rows
print(cross.head())

Output:
aisle    air fresheners candles  asian foods  ...  white wines  yogurt
user_id                                       ...
1                             0            0  ...            0       1
2                             0            3  ...            0      42
3                             0            0  ...            0       0
4                             0            0  ...            0       0
5                             0            2  ...            0       3
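`pd.crosstab` counts how often each (user_id, aisle) pair occurs in the merged table. In the sketch below, a small hypothetical long table shows that a group-by gives the same counts: `size()` followed by `unstack(fill_value=0)`. The zeros are user/aisle pairs with no purchases, which is why so many cells in the real table are 0.

```python
import pandas as pd

# Hypothetical long table: one row per purchased item
mt = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "aisle":   ["eggs", "yogurt", "yogurt", "eggs", "candles"],
})

cross = pd.crosstab(mt["user_id"], mt["aisle"])

# Same counts via group-by; pairs that never occur are filled with 0
by_group = mt.groupby(["user_id", "aisle"]).size().unstack(fill_value=0)

print(cross)
print((cross.to_numpy() == by_group.to_numpy()).all())  # True
```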
Step 3
import pandas as pd
from sklearn.decomposition import PCA

# Read the four tables
prior = pd.read_csv("order_products__prior.csv")
products = pd.read_csv("products.csv")
orders = pd.read_csv("orders.csv")
aisles = pd.read_csv("aisles.csv")

# Merge the four tables into one
_mg = pd.merge(prior, products, on="product_id")
_mg = pd.merge(_mg, orders, on="order_id")
mt = pd.merge(_mg, aisles, on="aisle_id")

# Cross-tabulation (a special grouping tool)
cross = pd.crosstab(mt["user_id"], mt["aisle"])

# Principal component analysis: a float n_components keeps enough components to explain 90% of the variance
pca = PCA(n_components=0.9)
data = pca.fit_transform(cross)
# Print the transformed data
print(data)

Output:
[[-2.42156587e+01  2.42942720e+00 -2.46636975e+00 ...  6.86800336e-01
   1.69439402e+00 -2.34323022e+00]
 [ 6.46320806e+00  3.67511165e+01  8.38255336e+00 ...  4.12121252e+00
   2.44689740e+00 -4.28348478e+00]
 [-7.99030162e+00  2.40438257e+00 -1.10300641e+01 ...  1.77534453e+00
  -4.44194030e-01  7.86665571e-01]
 ...
 [ 8.61143331e+00  7.70129866e+00  7.95240226e+00 ... -2.74252456e+00
   1.07112531e+00 -6.31925661e-02]
 [ 8.40862199e+01  2.04187340e+01  8.05410372e+00 ...  7.27554259e-01
   3.51339470e+00 -1.79079914e+01]
 [-1.39534562e+01  6.64621821e+00 -5.23030367e+00 ...  8.25329076e-01
   1.38230701e+00 -2.41942061e+00]]
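Passing a float between 0 and 1 as `n_components` tells scikit-learn's PCA to keep the smallest number of components whose cumulative explained variance ratio reaches that fraction. The sketch below checks this by hand. It assumes a hypothetical random count matrix in place of the real user x aisle table, and it fixes `svd_solver="full"` so both fits use the same exact decomposition.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-in for the count table: 500 users x 20 categories
X = rng.poisson(lam=3.0, size=(500, 20)).astype(float)

# Let PCA choose the number of components for 90% explained variance
pca = PCA(n_components=0.9, svd_solver="full")
data = pca.fit_transform(X)

# Reproduce the choice from a PCA that keeps every component
full = PCA(svd_solver="full").fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.9) + 1)  # first k with cumulative[k-1] >= 0.9

print(pca.n_components_, k, data.shape)
```

The number of columns left after the reduction is therefore a result of the 0.9 threshold. It is not set directly.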
Checking data.shape shows that the 134 aisle-category columns have been reduced to 27 principal components.
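Each of the reduced columns is a projection. The centred row of category counts is projected onto one principal axis (a row of `components_`). The sketch below confirms this on a hypothetical random count matrix: `fit_transform` matches subtracting `mean_` and multiplying by `components_.T`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical count table: 200 users x 12 categories
X = rng.poisson(lam=2.0, size=(200, 12)).astype(float)

pca = PCA(n_components=0.9, svd_solver="full")
data = pca.fit_transform(X)

# Project the centred data onto the kept principal axes by hand
manual = (X - pca.mean_) @ pca.components_.T

print(X.shape, "->", data.shape)
print(np.allclose(data, manual))  # True
```

In the case study, the same relation means each user's 134 counts are summarized by 27 coordinates. `components_` has shape (27, 134).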