Simple Linear Regression in Python: Implementation and Example


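Simple linear regression is a method for predicting a response using a single feature. It is one of the most basic machine learning models, and one of the first that machine learning enthusiasts get to know. In linear regression we assume that the two variables, the dependent variable and the independent variable, are linearly related. We therefore look for a linear function of the feature, or independent variable, (x) that predicts the response value (y) as accurately as possible. Consider a dataset in which every feature value x has a response value y: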

$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|} \hline \mathbf{x} & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline \mathbf{y} & 1 & 3 & 2 & 5 & 7 & 8 & 8 & 9 & 10 & 12 \\ \hline \end{array}$

For generality, we define:

x as the feature vector, i.e.

$x = [x_1, x_2, \dots, x_n]$

y as the response vector, i.e.

$y = [y_1, y_2, \dots, y_n]$

for n observations (in the example above, n = 10). A scatter plot of the dataset above looks like this:

[Figure: scatter plot of y against x for the dataset above]

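The task now is to find the line that best fits the scatter plot above, so that we can predict the response for any new feature value (that is, a value of x not present in the dataset). This line is called the regression line. Its equation is:

$h(x_i) = \beta_0 + \beta_1 x_i$

Here $h(x_i)$ is the predicted response value for the i-th observation, and $\beta_0$ and $\beta_1$ are the regression coefficients, representing the y-intercept and the slope of the regression line respectively.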
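As a small illustration of the regression line, the function below evaluates $h(x) = \beta_0 + \beta_1 x$ for a given pair of coefficients. The name `predict` and the coefficient values in the call are placeholders chosen for this sketch; they are not estimates from the data.

```python
def predict(x, beta_0, beta_1):
    """Return the predicted response h(x) = beta_0 + beta_1 * x."""
    return beta_0 + beta_1 * x


# Hypothetical coefficients, chosen only to show the call.
print(predict(4, beta_0=1.0, beta_1=0.5))  # 1.0 + 0.5 * 4 = 3.0
```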

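To build our model, we must "learn", or estimate, the values of the regression coefficients $\beta_0$ and $\beta_1$. Once these coefficients have been estimated, we can use the model to predict responses.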

In this article we use the principle of least squares. Each observation can be written as:

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i = h(x_i) + \varepsilon_i \Rightarrow \varepsilon_i = y_i - h(x_i)$

Here, $\varepsilon_i$ is the residual error of the i-th observation. Our goal is therefore to minimize the total residual error. We define the squared error, or cost function, J as:

$J(\beta_0, \beta_1) = \frac{1}{2n} \sum_{i=1}^{n} \varepsilon_i^2$

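To make the cost function concrete, here is a minimal sketch that evaluates J on the example dataset for two candidate lines. The function name `cost` and both pairs of coefficients are arbitrary choices for illustration; neither pair is claimed to be the optimum.

```python
def cost(x, y, beta_0, beta_1):
    """Squared-error cost J = (1 / 2n) * sum of squared residuals."""
    n = len(x)
    residuals = [y_i - (beta_0 + beta_1 * x_i) for x_i, y_i in zip(x, y)]
    return sum(e ** 2 for e in residuals) / (2 * n)


x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]

# Two arbitrary candidate lines; the one with the smaller J fits better.
print(cost(x, y, beta_0=0.0, beta_1=1.0))
print(cost(x, y, beta_0=1.0, beta_1=1.2))
```

A line that passes through every point would give J = 0; the further the points lie from the line, the larger J becomes.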
Our task is to find the values of $\beta_0$ and $\beta_1$ for which $J(\beta_0, \beta_1)$ is minimal. Without going into the mathematical details, we present the result here:

$\beta_1 = \frac{SS_{xy}}{SS_{xx}}$

$\beta_0 = \bar{y} - \beta_1 \bar{x}$

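For readers who want the skipped step, the following is a brief sketch of where this result comes from, under the usual assumption that the minimum of J is found by setting both partial derivatives to zero. It uses only the cost function defined earlier, with $\varepsilon_i = y_i - \beta_0 - \beta_1 x_i$.

```latex
\begin{align*}
\frac{\partial J}{\partial \beta_0}
  &= -\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
  \;\Rightarrow\; \beta_0 = \bar{y} - \beta_1 \bar{x} \\
\frac{\partial J}{\partial \beta_1}
  &= -\frac{1}{n} \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0 \\
\Rightarrow\; \beta_1
  &= \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
   = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{align*}
```

The third line follows by substituting $\beta_0 = \bar{y} - \beta_1 \bar{x}$ into the second condition. The last equality holds because $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, so subtracting $\bar{x}$ from each $x_i$ factor does not change either sum. The numerator and denominator of the final ratio are the two quantities that appear in the formula for $\beta_1$.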
Here, $SS_{xy}$ is the sum of cross-deviations of y and x:

$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} y_i x_i - n \bar{x} \bar{y}$

$SS_{xx}$ is the sum of squared deviations of x:

$SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n (\bar{x})^2$

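Each of the two sums is written in two equivalent forms: a definition in terms of deviations from the mean, and a shortcut in terms of raw sums. The sketch below, using only the standard library and variable names invented for this example, checks on the example dataset that both forms give the same value.

```python
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Definition form: sums of (cross-)deviations from the means.
ss_xy_def = sum((x_i - mean_x) * (y_i - mean_y) for x_i, y_i in zip(x, y))
ss_xx_def = sum((x_i - mean_x) ** 2 for x_i in x)

# Shortcut form: raw sums minus n times the product of the means.
ss_xy_short = sum(x_i * y_i for x_i, y_i in zip(x, y)) - n * mean_x * mean_y
ss_xx_short = sum(x_i ** 2 for x_i in x) - n * mean_x ** 2

print(ss_xy_def, ss_xy_short)  # 96.5 96.5
print(ss_xx_def, ss_xx_short)  # 82.5 82.5
```

The shortcut form needs only one pass over the raw data.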
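We can use Python to learn the coefficients of the linear regression model. To plot the input data and the best-fit line, we use the matplotlib library, one of the most widely used Python libraries for plotting.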

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
	# number of observations/points
	n = np.size(x)

	# mean of x and y vector
	m_x = np.mean(x)
	m_y = np.mean(y)

	# calculating cross-deviation and deviation about x
	SS_xy = np.sum(y*x) - n*m_y*m_x
	SS_xx = np.sum(x*x) - n*m_x*m_x

	# calculating regression coefficients
	b_1 = SS_xy / SS_xx
	b_0 = m_y - b_1*m_x

	return (b_0, b_1)

def plot_regression_line(x, y, b):
	# plotting the actual points as scatter plot
	plt.scatter(x, y, color = "m",
			marker = "o", s = 30)

	# predicted response vector
	y_pred = b[0] + b[1]*x

	# plotting the regression line
	plt.plot(x, y_pred, color = "g")

	# putting labels
	plt.xlabel('x')
	plt.ylabel('y')

	# function to show plot
	plt.show()

def main():
	# observations / data
	x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
	y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

	# estimating coefficients
	b = estimate_coef(x, y)
	print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

	# plotting regression line
	plot_regression_line(x, y, b)

if __name__ == "__main__":
	main()

Output (coefficients rounded to four decimal places):

Estimated coefficients:
b_0 = 1.2364
b_1 = 1.1697
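One way to double-check the coefficients for this dataset is to redo the computation with exact rational arithmetic. The sketch below uses `fractions.Fraction` from the standard library instead of NumPy; this is a swapped-in technique for verification, not part of the program above. It also predicts the response for x = 10, a value not in the dataset, chosen here only as an example.

```python
from fractions import Fraction

x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]
n = len(x)

# Exact means as rational numbers.
mean_x = Fraction(sum(x), n)
mean_y = Fraction(sum(y), n)

# Shortcut forms of SS_xy and SS_xx, as in estimate_coef.
ss_xy = sum(x_i * y_i for x_i, y_i in zip(x, y)) - n * mean_x * mean_y
ss_xx = sum(x_i * x_i for x_i in x) - n * mean_x * mean_x

b_1 = ss_xy / ss_xx          # slope, exactly 193/165
b_0 = mean_y - b_1 * mean_x  # intercept, exactly 68/55

print(b_0, float(b_0))
print(b_1, float(b_1))

# Predicted response for a new feature value (x = 10 is an arbitrary choice).
print(float(b_0 + b_1 * 10))
```

For this data the exact values are b_0 = 68/55 (about 1.2364) and b_1 = 193/165 (about 1.1697).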


Original article: https://blog.csdn.net/jiyotin/article/details/134696560
