线性回归
预测函数:
[
h_theta(x)=theta_0 + theta_1x_1 + theta_2x_2
]
将(x_0 = 1)
[ h_theta(x) = sum_{i=0}^ntheta_ix_i ]
其中 (theta)这参数,n表示特征数量
成本函数:
[
J(theta) = frac{1}{2}sum_{i=1}^m(h_theta(x^{(i)}) - y^{(i)})^2
]
其中m表示样本数量,((x^{(i)},y^{(i)}))表示第i个样本对儿,1/2是为了方便计算
我们目标:
[
min_theta J(theta)
]
最小二乘成本函数的概率学解释(并不是唯一的解释)
假设该线性模型符合高斯分布
[
p(y^{(i)}|x^{(i)};theta) = frac{1}{sqrt{2pi}sigma}expleft(-sqrt{(y^{(i)}-theta^Tx^{(i)^2})}{2sigma^2}right)
]
似然函数为:
[
L(theta) = L(theta;X,vec{y}) = p(vec{y}|X;θ)
]
因此,线性回归的似然函数:
[
begin{align}
L(theta) & = prod_{i=1}^mp(y^{(i)}|x^{(i)};theta) \
& = prod_{i=1}^mfrac{1}{sqrt{2pi}sigma}expleft(-frac{((y^{(i)}) - theta^Tx^{(i)})^2}{2sigma^2}right)
end{align}
]
对数似然度为
[
begin{align}
l(theta) & = log L(theta)\
& = logprod_{i=1}^mfrac{1}{sqrt{2pi}sigma}expleft(-frac{((y^{(i)}) - theta^Tx^{(i)})^2}{2sigma^2}right)\
& = sum_{i=1}^mlogfrac{1}{{2sigma^2}}expleft(-frac{((y^{(i)}) - theta^Tx^{(i)})^2}{2sigma^2}right)\
& = mlogfrac{1}{2pisigma}-frac{1}{sigma^2}cdotfrac{1}{2}sum_{i=1}^m(y^{(i)} - theta^Tx^{(i)})^2
end{align}
]
最大化(l(theta))相当于最小化,也就是成本函数
[
frac{1}{2}sum_{i=1}^m(y^{(i)} - theta^Tx^{(i)})^2
]
梯度下降
步骤
- 初始化(theta)
- 改变(theta)使(J(theta))降低
- 终止
每一步使:
[
theta_i:=theta_i-alphafrac{partial}{partialtheta_i}J(theta)
]
(theta_i)表示第个参数,:=表示赋值符号,(alpha)表示步长也叫学习速率,(frac{partial}{partialtheta_i})是对(theta_i)求偏导数
对于单个样本:
[ begin{align} frac{partial}{partialtheta_i}J(theta) & = frac{partial}{partialtheta_i}frac{1}{2}(h_theta(x) - y)^2 \ & = 2cdotfrac{1}{2}(h_theta(x) - y) cdot frac{partial}{partialtheta_i}(h_theta(x)-y) \ & = (h_theta(x) - y)cdotfrac{partial}{partialtheta_i}(theta_0x_0+theta_1x_1+... + theta_nx_n-y) \ & = (h_theta(x) - y)cdot x_i end{align} ]
(1)-(2) 根据链式求导法则,(3)-(4)((theta_0x_0+theta_1x_1+... + theta_nx_n-y))只有(theta_i)起作用
因此
[ theta_i:=theta_i-alpha(h_theta(x) - y)cdot x_i ]
对于多个样本:
[ theta_i:=theta_i-sum_{j=1}^malpha(h_theta(x^{(j)}) - y^{(j)})cdot x_i^{(j)} ]
批梯度下降算法代码
public static void main(String[] args) {
//训练集合输入值
double x[][] = {
{1, 4},
{2, 5},
{5, 1},
{4, 2}};
//训练集合期望输出值
double y[] = {
19,
26,
19,
20};
//参数
double theta[] = {0, 0};
//步长
double learningRate = 0.01;
//样本集合长度
int m = x.length;
//迭代次数
int iteration = 100;
for (int index = 0; index < iteration; index++) {
double err_sum = 0;
double[] tmp = new double[theta.length];
System.arraycopy(theta, 0, tmp, 0, theta.length);
//i->m
for (int j = 0; j < m; j++) {
double h = 0;
for (int i = 0; i < theta.length; i++) {//x[j][i]表示第j个样本的第i个特征
h += x[j][i] * theta[i];
}
err_sum = h - y[j];
for (int i = 0; i < theta.length; i++) {
tmp[i] -= learningRate * err_sum * x[j][i];
}
}
//批量更新参数
theta = tmp;
}
for (int i = 0; i < theta.length; i++) {
System.out.println("theta" + i + "=" + theta[i]);
}
}
- 还没有人评论,欢迎说说您的想法!