C2-4.2.2 Decision Trees: Purity + Information Entropy + Information Gain

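This article covers: [※※※ Summary] Information entropy measures the purity of the data in a given dataset: the smaller the entropy, the purer the data. It is typically used for classification problems in machine learning.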

Decision tree algorithm explained: core idea; structure.

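Put simply, a decision tree is a tree in which each internal node tests a feature and each leaf gives a prediction. It is often a binary tree (CART, for example, always splits a node into two), although algorithms such as ID3 can split a node into more than two branches.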
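To make the structure concrete, here is a minimal Python sketch of a binary decision tree node and of how a prediction walks from the root down to a leaf. The `Node` class, its fields (`feature`, `threshold`, `left`, `right`, `label`) and the `predict` function are hypothetical names chosen for illustration; real libraries store their trees differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical minimal layout)."""
    feature: Optional[int] = None   # index of the feature tested at this node
    threshold: float = 0.0          # go left if x[feature] <= threshold, else right
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None     # set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.label is not None


def predict(node: Node, x: Sequence[float]) -> str:
    # Answer one yes/no question per level until a leaf is reached.
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# Toy tree: the root tests feature 0 against 5.0 and has two leaves.
tree = Node(feature=0, threshold=5.0, left=Node(label="A"), right=Node(label="B"))
print(predict(tree, [3.0]))  # A
print(predict(tree, [7.0]))  # B
```

Each internal node asks exactly one question, which is why a CART-style tree is binary: every answer is either "yes" (left) or "no" (right).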

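Let's take buying gold as an example. Gold is sold as 999 and 9999, and the two are different: the numbers indicate the gold's purity relative to its impurities (99.9% versus 99.99% gold). Decision trees borrow the same concept of "purity". If every item in a result set belongs to one class, we say the set is "very pure". If the result set has 6 items but only 3 of them belong to one class, we say it is "not pure", and we call the items other than those 3 "impurities".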
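The purity idea can be checked in a few lines of Python. This sketch assumes the result set is a list of hashable class labels and uses the share of the most common class as a rough purity indicator. `is_pure` and `majority_fraction` are hypothetical helper names, and this share is only an illustration, not the criterion decision-tree algorithms actually use.

```python
from collections import Counter
from typing import Hashable, Sequence


def is_pure(labels: Sequence[Hashable]) -> bool:
    # A result set is pure when every element belongs to the same class.
    return len(set(labels)) == 1


def majority_fraction(labels: Sequence[Hashable]) -> float:
    """Share of the most common class (1.0 means 'very pure')."""
    if not labels:
        raise ValueError("empty result set")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


pure = ["A"] * 6
mixed = ["A", "A", "A", "B", "B", "C"]  # 3 of class A, the other 3 are "impurities"
print(is_pure(pure), majority_fraction(pure))    # True 1.0
print(is_pure(mixed), majority_fraction(mixed))  # False 0.5
```

The `mixed` list mirrors the 6-item example: only 3 items share a class, so the set is not pure.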


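When we buy gold, a professional machine measures its purity. So how do we judge the purity of a result set in a decision tree, and what is the criterion? This brings us to the definition of **information entropy**.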
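For reference, this is the standard (Shannon) definition of the information entropy of a dataset D whose samples fall into K classes, where D_k is the subset of samples in class k and p_k is its proportion (logarithm base 2, so the unit is bits):

```latex
H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k, \qquad p_k = \frac{|D_k|}{|D|}
```

By convention 0 · log₂ 0 = 0. H(D) is 0 when all samples share one class and reaches its maximum log₂ K when the K classes are equally frequent. For example, if the 6 items split 3 and 3 between two classes, H(D) = -(0.5 log₂ 0.5 + 0.5 log₂ 0.5) = 1 bit.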

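In machine learning, entropy **measures the level of disorder or uncertainty** in a given dataset or system. It quantifies the average amount of information (uncertainty) carried by the class labels of a dataset: it is 0 when all samples belong to one class and largest when the classes are evenly mixed. In decision trees it is used to measure how pure a node is and, through information gain, to compare candidate splits.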
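A minimal Python sketch of the entropy computation, assuming the dataset is given as a list of class labels. The function name `entropy` is a hypothetical choice; the formula is the standard Shannon entropy H(D) = -Σ p_k log₂ p_k, measured in bits.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy of the class labels, in bits."""
    n = len(labels)
    if n == 0:
        raise ValueError("empty dataset")
    h = 0.0
    for count in Counter(labels).values():
        p = count / n          # proportion of this class (always > 0 here)
        h -= p * math.log2(p)
    return h


print(entropy(["A"] * 6))                       # 0.0  -> completely pure
print(entropy(["A", "A", "A", "B", "B", "B"]))  # 1.0  -> 3 vs 3, maximally mixed for 2 classes
print(entropy(["A", "A", "A", "B", "B", "C"]))  # 3 classes, even less pure
```

Classes that never appear in the list are simply not counted, which matches the 0 · log 0 = 0 convention. The lower the entropy, the purer the set.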