Training a Simple Linear Regression Model From Scratch

Posted by xuepro on May 8, 2018

Problem & Dataset

6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
...


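import numpy as np
import matplotlib.pyplot as plt

# Load the two-column, comma-separated data sampled above.
# "data.txt" is a placeholder: the post does not name the file, so substitute your own path.
data = np.loadtxt("data.txt", delimiter=",")

x = data[:, 0]  # city populations
y = data[:, 1]  # food truck profits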


Both x and y are one-dimensional vectors: x holds the feature and y holds the target variable.

fig, ax = plt.subplots()
ax.scatter(x, y, marker="x", c="red")
plt.title("Food Truck Dataset", fontsize=16)
plt.xlabel("City Population in 10,000s", fontsize=14)
plt.ylabel("Food Truck Profit in 10,000s", fontsize=14)
plt.axis([4, 25, -5, 25])
plt.show()


Hypothesis Function

$h_\theta(x) = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n = (\theta_0,\theta_1,\dots,\theta_n)(x_0,x_1,x_2,\dots,x_n)^T = \theta^Tx$, where $x_0 = 1$.

$h_\theta(x) = \theta_0+\theta_1x_1$

theta = np.zeros(2)

X = np.ones(shape=(len(x), 2))
X[:, 1] = x

predictions = X @ theta
print(predictions)

    [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
      0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
      0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
      0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
      0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
      0.  0.  0.  0.  0.  0.  0.]
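With all parameters at zero the output above is not very telling. As a minimal sketch, assuming hypothetical toy values for the inputs and parameters (`x_demo`, `theta_demo`, and `X_demo` are illustrative names, not part of the original post), the following checks that the matrix product gives the same numbers as evaluating $\theta_0+\theta_1x_1$ element by element:

```python
import numpy as np

# Hypothetical toy inputs and parameters, chosen only for illustration.
x_demo = np.array([1.0, 2.0, 3.0])
theta_demo = np.array([1.0, 2.0])  # theta_0 = 1, theta_1 = 2

# Design matrix: a column of ones (x_0 = 1) followed by the feature column.
X_demo = np.column_stack([np.ones_like(x_demo), x_demo])

vectorized = X_demo @ theta_demo                      # theta^T x for every row at once
elementwise = theta_demo[0] + theta_demo[1] * x_demo  # theta_0 + theta_1 * x_1

print(vectorized)                            # the values 3, 5 and 7
print(np.allclose(vectorized, elementwise))  # True
```

The column of ones is what lets the intercept $\theta_0$ be handled by the same matrix product as the slope.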


$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2$

def cost(theta, X, y):
    predictions = X @ theta
    squared_errors = np.square(predictions - y)
    return np.sum(squared_errors) / (2 * len(y))

print('The initial cost is:', cost(theta, X, y))

  The initial cost is: 32.0727338775
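To make the formula for $J(\theta)$ concrete, here is a small sketch on hypothetical toy data that can be checked by hand. The names `cost_demo`, `X_toy`, and `y_toy` are assumptions introduced for this example only; the function mirrors the `cost` function above.

```python
import numpy as np

def cost_demo(theta, X, y):
    """Half the mean squared error, i.e. J(theta) as defined above."""
    errors = X @ theta - y
    return np.sum(errors ** 2) / (2 * len(y))

# Hypothetical toy data: three examples, each with a bias entry of 1.
X_toy = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
y_toy = np.array([1.0, 2.0, 3.0])

# With theta = 0 every prediction is 0, so J = (1 + 4 + 9) / (2 * 3) = 14 / 6.
j_zero = cost_demo(np.zeros(2), X_toy, y_toy)

# theta = (0, 1) reproduces y exactly on this toy data, so J = 0.
j_fit = cost_demo(np.array([0.0, 1.0]), X_toy, y_toy)

print(j_zero, j_fit)
```

A lower cost means the line fits the data more closely, which is the quantity that gradient descent tries to reduce.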


$\theta_j := \theta_j -\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$

def gradient_descent(X, y, alpha, num_iters):
    num_features = X.shape[1]
    theta = np.zeros(num_features)          # initialize model parameters
    for n in range(num_iters):
        predictions = X @ theta             # compute predictions based on the current hypothesis
        errors = predictions - y
        gradient = X.T @ errors             # sum over examples of error * x_j, for every j at once
        theta -= alpha * gradient / len(y)  # update model parameters
    return theta                            # return optimized parameters


theta = gradient_descent(X, y, 0.02, 600)   # run GD for 600 iterations with learning rate = 0.02
predictions = X @ theta                     # predictions made by the optimized model
ax.plot(X[:, 1], predictions, linewidth=2)  # plot the hypothesis on top of the training data
fig
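One way to gain confidence in the update rule is to compare the vectorized gradient `X.T @ errors / m` with a finite-difference estimate of the gradient of $J(\theta)$. The sketch below does this on synthetic data; the data, the random seed, and the helper names (`cost_demo`, `analytic_gradient`, `numerical_gradient`) are all assumptions made for illustration, not part of the original post.

```python
import numpy as np

def cost_demo(theta, X, y):
    """J(theta): half the mean squared error."""
    errors = X @ theta - y
    return np.sum(errors ** 2) / (2 * len(y))

def analytic_gradient(theta, X, y):
    """Vectorized form of (1/m) * sum_i (h(x_i) - y_i) * x_ij for every j."""
    return X.T @ (X @ theta - y) / len(y)

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost_demo(theta + step, X, y)
                   - cost_demo(theta - step, X, y)) / (2 * eps)
    return grad

# Synthetic stand-in data (not the food truck dataset).
rng = np.random.default_rng(0)
x_syn = rng.uniform(5, 20, size=50)
X_syn = np.column_stack([np.ones_like(x_syn), x_syn])
y_syn = -4.0 + 1.2 * x_syn + rng.normal(0, 1, size=50)

theta_test = np.array([0.5, -0.3])  # arbitrary point at which to compare gradients
grad_a = analytic_gradient(theta_test, X_syn, y_syn)
grad_n = numerical_gradient(theta_test, X_syn, y_syn)
gradients_match = np.allclose(grad_a, grad_n, rtol=1e-5, atol=1e-6)
print(gradients_match)
```

If the two gradients disagree, the bug is usually in the vectorized expression, for example a missing transpose or a wrong sign in the error term.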


Debugging

1. Modify the gradient descent function to make it record the cost at the end of each iteration.
2. Plot the cost history after the gradient descent has finished.
3. Pat yourself on the back if you see that the cost has monotonically decreased over time.


def gradient_descent(X, y, alpha, num_iters):
    cost_history = np.zeros(num_iters)          # create a vector to store the cost history
    num_features = X.shape[1]
    theta = np.zeros(num_features)
    for n in range(num_iters):
        predictions = X @ theta
        errors = predictions - y
        gradient = X.T @ errors                 # sum over examples of error * x_j, for every j at once
        theta -= alpha * gradient / len(y)
        cost_history[n] = cost(theta, X, y)     # compute and record the cost
    return theta, cost_history                  # return optimized parameters and cost history



plt.figure()
num_iters = 1200
learning_rates = [0.01, 0.015, 0.02]
for lr in learning_rates:
    _, cost_history = gradient_descent(X, y, lr, num_iters)
    plt.plot(cost_history, linewidth=2)
plt.title("Gradient descent with different learning rates", fontsize=16)
plt.xlabel("number of iterations", fontsize=14)
plt.ylabel("cost", fontsize=14)
plt.legend(list(map(str, learning_rates)))
plt.axis([0, num_iters, 4, 6])
plt.grid()
plt.show()


learning_rate = 0.025
num_iters = 50
_, cost_history = gradient_descent(X, y, learning_rate, num_iters)
plt.plot(cost_history, linewidth=2)
plt.title("Gradient descent with learning rate = " + str(learning_rate), fontsize=16)
plt.xlabel("number of iterations", fontsize=14)
plt.ylabel("cost", fontsize=14)
plt.axis([0, num_iters, 0, 6000])
plt.grid()
plt.show()
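Besides reading the plots, the "monotonically decreased" criterion from the debugging steps can be checked numerically. The following is a minimal sketch on synthetic data; the data, the two learning rates, and the names `gradient_descent_demo` and `is_monotone_decreasing` are assumptions chosen for this example, so the rates that converge or diverge here do not carry over to the food truck dataset.

```python
import numpy as np

def gradient_descent_demo(X, y, alpha, num_iters):
    """Batch gradient descent that records the cost after each update."""
    theta = np.zeros(X.shape[1])
    history = np.zeros(num_iters)
    for n in range(num_iters):
        errors = X @ theta - y
        theta -= alpha * (X.T @ errors) / len(y)
        history[n] = np.sum((X @ theta - y) ** 2) / (2 * len(y))
    return theta, history

def is_monotone_decreasing(history):
    """True if the cost never increases from one iteration to the next."""
    return bool(np.all(np.diff(history) <= 0))

# Synthetic stand-in data (not the food truck dataset).
rng = np.random.default_rng(0)
x_syn = rng.uniform(5, 20, size=50)
X_syn = np.column_stack([np.ones_like(x_syn), x_syn])
y_syn = -4.0 + 1.2 * x_syn + rng.normal(0, 1, size=50)

# A small step size for this data, and one that is too large for it.
_, hist_small = gradient_descent_demo(X_syn, y_syn, alpha=0.001, num_iters=200)
_, hist_large = gradient_descent_demo(X_syn, y_syn, alpha=0.02, num_iters=30)

print(is_monotone_decreasing(hist_small))  # small step: cost keeps falling
print(is_monotone_decreasing(hist_large))  # large step: cost grows instead
```

A failed check does not say why the cost rose, but a learning rate that is too large for the data is a common first suspect.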


Prediction

theta, _ = gradient_descent(X, y, 0.02, 600)    # train the model
test_example = np.array([1, 7])                 # pick a city with 70,000 population as a test example
prediction = test_example @ theta               # use the trained model to make a prediction
print('For population = 70,000, we predict a profit of $', prediction * 10000)

    For population = 70,000, we predict a profit of $ 45905.6621788