Dataset: diabetes_data_upload.csv
This example uses the decision tree classifier tree.DecisionTreeClassifier() from the sklearn library.
By default, it uses the Gini index as its impurity measure.
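To make the impurity measure concrete, here is a minimal sketch of the Gini index for a single node, computed as 1 minus the sum of squared class proportions. The helper name gini_impurity and the toy labels are assumptions for illustration only; this is not sklearn's internal implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Return the Gini impurity 1 - sum(p_k^2) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty node as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# An evenly mixed two-class node is maximally impure for two classes.
print(gini_impurity(["Positive", "Positive", "Negative", "Negative"]))  # 0.5
# A node containing a single class is pure.
print(gini_impurity(["Positive"] * 4))  # 0.0
```

To split on entropy instead of the Gini index, pass criterion="entropy" to tree.DecisionTreeClassifier().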
import pandas as pd
from sklearn import tree

df = pd.read_csv("diabetes_data_upload.csv")
# Convert categorical attributes to binary attributes using get_dummies()
X = pd.get_dummies(df.drop(columns="class"))
y = df["class"]
dtc = tree.DecisionTreeClassifier().fit(X, y)
print(tree.export_text(dtc, feature_names=X.columns.tolist()))
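As a rough illustration of what get_dummies() does to categorical attributes, the sketch below encodes a tiny hand-made frame. The column names and values are hypothetical stand-ins, not rows taken from diabetes_data_upload.csv.

```python
import pandas as pd

# Hypothetical toy data: one numeric column and two categorical columns.
toy = pd.DataFrame(
    {
        "Age": [40, 58, 41],
        "Gender": ["Male", "Female", "Male"],
        "Polyuria": ["No", "Yes", "Yes"],
    }
)

# Numeric columns pass through unchanged; each categorical column is
# replaced by one indicator column per distinct value.
encoded = pd.get_dummies(toy)
print(encoded.columns.tolist())
```

The names in X.columns passed to export_text() are these expanded names, which is why the printed tree refers to features of the form column_value.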
Visualizing the decision tree
import pandas as pd
from sklearn import tree
import graphviz

df = pd.read_csv("diabetes_data_upload.csv")
X = pd.get_dummies(df.drop(columns="class"))
y = df["class"]
dtc = tree.DecisionTreeClassifier().fit(X, y)
dot_data = tree.export_graphviz(dtc, out_file=None)
graph = graphviz.Source(dot_data)
graph.render("diabetes")  # Generate diabetes.pdf
Reposted from: http://oaygf.baihongyu.com/