Andrew Ng Machine Learning Notes

Introduction

ICA, FastICA, the cocktail party problem

Linear Regression

Gradient Descent

Normal Equation

$Y = XW$

$X^TY = X^TXW$

$W = (X^TX)^{-1}X^TY$

If $X^TX$ is not invertible, use its pseudoinverse instead.
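As a rough illustration of the normal equation above, here is a minimal NumPy sketch. The function name `normal_equation` and the toy data are assumptions made for this example, not part of the course material. It uses `np.linalg.pinv`, so the same code covers the case where $X^TX$ is not invertible.

```python
import numpy as np

def normal_equation(X, Y):
    """Solve Y = XW in the least-squares sense via W = pinv(X^T X) X^T Y."""
    XtX = X.T @ X
    # pinv equals the ordinary inverse when X^T X is invertible and
    # still yields a least-squares solution when it is singular.
    return np.linalg.pinv(XtX) @ X.T @ Y

# Hypothetical toy data generated from y = 1 + 2x.
# The first column of ones plays the role of the bias feature.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 3.0, 5.0, 7.0])

W = normal_equation(X, Y)
print(W)
```

Because the toy data is noise-free, the recovered weights should match the intercept and slope used to generate it.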

Logistic Regression

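When the number of features is very large, for example the pixels of an image, fitting a nonlinear boundary requires combining features into higher-order terms. This produces millions of parameters, and the computational cost becomes too high.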
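To make the scale concrete, the short sketch below counts only the second-order terms $x_i x_j$ (with $i \le j$) for a hypothetical 50×50 grayscale image. The image size and the helper name `num_quadratic_features` are assumptions chosen for illustration.

```python
from math import comb

def num_quadratic_features(n):
    """Number of distinct products x_i * x_j with i <= j, i.e. n(n+1)/2."""
    return comb(n + 1, 2)

n_pixels = 50 * 50                      # hypothetical 50x50 grayscale image
count = num_quadratic_features(n_pixels)
print(n_pixels, count)
```

Even before adding cubic terms, the quadratic terms alone already number in the millions for this small image.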

Other Optimization Methods

Newton's Method

Computing the Hessian matrix: https://zhuanlan.zhihu.com/p/63305895

Overfitting

Neural Networks

The loss function of a neural network is generally non-convex. How can this be proved?

Backpropagation Derivation

loss function: $J(\theta) = \frac{1}{m}\sum_{i=1}^m\sum_{k=1}^K\left[-y_k^{(i)}\log(a_k^{(i)}) - (1 - y_k^{(i)})\log(1 - a_k^{(i)})\right]$

where $a = sigmoid(z) = g(z) = \frac{1}{1 + e^{-z}}$

we know that $g^{'}(z) = g(z)(1 - g(z))$

where $z^{(l + 1)} = (\theta^{(l)})^Ta^{(l)} = (\theta^{(l)})^Tg(z^{(l)})$

So, for the last layer, let $\delta^{(l + 1)} = \frac{\partial J}{\partial z^{(l + 1)}} = \frac{\partial J}{\partial a^{(l + 1)}}\cdot \frac{\partial a^{(l + 1)}}{\partial z^{(l + 1)}} = (-\frac{y^{(l + 1)}}{a^{(l + 1)}} + \frac{1 - y^{(l + 1)}}{1 - a^{(l + 1)}}) \cdot (a^{(l + 1)}(1 - a^{(l + 1)})) = a^{(l + 1)} - y^{(l + 1)}$

then, $\frac{\partial J}{\partial \theta^{(l)}} = \frac{\partial J}{\partial z^{(l + 1)}} \cdot \frac{\partial z^{(l + 1)}}{\partial \theta^{(l)}} = (a^{(l + 1)} - y^{(l + 1)})a^{(l)}$

for the previous layer, $\delta^{(l)} = \frac{\partial J}{\partial z^{(l)}} = \frac{\partial J}{\partial z^{(l + 1)}} \cdot \frac{\partial z^{(l + 1)}}{\partial z^{(l)}} = \delta^{(l + 1)}(\theta^{(l)})^Tg^{'}(z^{(l)})$

then, $\frac{\partial J}{\partial \theta^{(l - 1)}} = \frac{\partial J}{\partial z^{(l)}} \cdot \frac{\partial z^{(l)}}{\partial \theta^{(l - 1)}} = \delta^{(l)}a^{(l - 1)}$
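The recursion above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not a reference implementation: it handles a single training example (so the $\frac{1}{m}$ factor and the sum over $i$ are dropped), omits bias terms to match $z^{(l+1)} = (\theta^{(l)})^Ta^{(l)}$, and the function names `forward`, `loss`, and `backprop` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(thetas, x):
    """Return the activations [a0, a1, ...], using z_next = theta^T a (no bias)."""
    activations = [x]
    for theta in thetas:
        activations.append(sigmoid(theta.T @ activations[-1]))
    return activations

def loss(thetas, x, y):
    """Cross-entropy loss for one example, summed over the output units."""
    a = forward(thetas, x)[-1]
    return np.sum(-y * np.log(a) - (1 - y) * np.log(1 - a))

def backprop(thetas, x, y):
    """Return dJ/dtheta for every layer, each with the same shape as its theta."""
    acts = forward(thetas, x)
    delta = acts[-1] - y                      # last layer: delta = a - y
    grads = [None] * len(thetas)
    for l in range(len(thetas) - 1, -1, -1):
        # dJ/dtheta^(l) combines delta^(l+1) with a^(l)
        grads[l] = np.outer(acts[l], delta)
        if l > 0:
            # delta^(l) = (theta^(l) delta^(l+1)) * g'(z^(l)), g' = a(1 - a)
            delta = (thetas[l] @ delta) * acts[l] * (1 - acts[l])
    return grads

# Hypothetical 3-4-2 network with random weights.
rng = np.random.default_rng(0)
thetas = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
x = rng.normal(size=3)
y = np.array([1.0, 0.0])
grads = backprop(thetas, x, y)
print([g.shape for g in grads])
```

In matrix form the product $\delta^{(l+1)}a^{(l)}$ becomes an outer product, which is why each gradient has the same shape as the weight matrix it belongs to.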

Gradient Checking

$gradient \approx \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}$
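A minimal sketch of this two-sided difference, applied to one parameter at a time, might look like the following. The helper name `numerical_gradient`, the default `eps`, and the quadratic test cost are assumptions for illustration only.

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Approximate dJ/dtheta with (J(theta + eps) - J(theta - eps)) / (2 eps)."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        step = np.zeros_like(theta, dtype=float)
        step.flat[i] = eps                 # perturb a single parameter
        grad.flat[i] = (J(theta + step) - J(theta - step)) / (2 * eps)
    return grad

# Hypothetical cost J(theta) = sum(theta^2), whose exact gradient is 2 * theta.
theta = np.array([1.0, -2.0, 0.5])
approx = numerical_gradient(lambda t: np.sum(t ** 2), theta)
exact = 2 * theta
print(np.max(np.abs(approx - exact)))
```

In practice the approximation is compared against the backpropagation gradient once, then switched off, since it needs two cost evaluations per parameter.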

Random Initialization

Model Evaluation

Precision and Recall

| Predicted \ Actual | 1 | 0 |
|---|---|---|
| 1 | True positive | False positive |
| 0 | False negative | True negative |

$precision = \frac{True\ positive}{True\ positive + False\ positive}$

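$sensitivity = recall = \frac{True\ positive}{True\ positive + False\ negative}$

$specificity = \frac{True\ negative}{True\ negative + False\ positive}$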

$balanced\ accuracy = \frac{sensitivity + specificity}{2}$
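The metrics above can be computed directly from the four confusion-matrix counts. The sketch below assumes binary labels encoded as 0 and 1; the function name `binary_metrics` and the toy labels are hypothetical, and the code does not guard against empty denominators.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute precision, recall, specificity and balanced accuracy for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    # Note: no handling of zero denominators in this sketch.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # also called sensitivity
    specificity = tn / (tn + fp)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Hypothetical labels for eight examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```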

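Support Vector Machines

Mercer's theorem: any symmetric positive semi-definite function can be used as a kernel function.

If $x^TAx \ge 0$ for every nonzero vector $x$, then $A$ is a positive semi-definite matrix.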
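For a symmetric matrix, the condition $x^TAx \ge 0$ is equivalent to all eigenvalues being non-negative, which gives a practical numerical check. The sketch below applies it to the Gram matrix of a Gaussian (RBF) kernel; the helper names `is_positive_semidefinite` and `rbf_gram`, the tolerance, and the sample points are assumptions for illustration.

```python
import numpy as np

def is_positive_semidefinite(A, tol=1e-10):
    """Check symmetry and that every eigenvalue is >= 0 (up to a tolerance)."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        return False
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

def rbf_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

# Hypothetical sample points in 2D.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
K = rbf_gram(points)
kernel_ok = is_positive_semidefinite(K)

# A symmetric matrix with a negative eigenvalue (eigenvalues 3 and -1).
not_psd = is_positive_semidefinite(np.array([[1.0, 2.0], [2.0, 1.0]]))
print(kernel_ok, not_psd)
```

Checking a single Gram matrix does not prove that a function is a valid kernel, but a failed check is enough to rule one out.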

K-means

PCA

Singular value decomposition:

$A = U \Sigma V^T$

Transformation

$Z = XU$

Reconstruction

Mapping from $k$ dimensions back to $n$ dimensions:

$Z = XU$

$X_{approx} = ZU^T = XUU^T$
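The projection and reconstruction steps can be sketched with `np.linalg.svd` as follows. Two points here are assumptions rather than something stated in the notes: the data is mean-centered before the decomposition, and $U$ is taken to be the first $k$ left singular vectors of the covariance matrix. The function names are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return the feature means and the first k principal directions."""
    mu = X.mean(axis=0)
    Xc = X - mu                            # assumption: center the data first
    cov = Xc.T @ Xc / X.shape[0]           # n x n covariance matrix
    U, S, Vt = np.linalg.svd(cov)          # cov = U diag(S) Vt
    return mu, U[:, :k]

def pca_project(X, mu, U_k):
    """Z = X U: map n-dimensional rows to k dimensions."""
    return (X - mu) @ U_k

def pca_reconstruct(Z, mu, U_k):
    """X_approx = Z U^T: map k-dimensional rows back to n dimensions."""
    return Z @ U_k.T + mu

# Hypothetical data lying exactly on a line in 2D, so k = 1 loses nothing.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
mu, U_k = pca_fit(X, k=1)
Z = pca_project(X, mu, U_k)
X_approx = pca_reconstruct(Z, mu, U_k)
print(Z.shape, np.max(np.abs(X_approx - X)))
```

For real data the reconstruction is only approximate, and the gap between `X_approx` and `X` reflects the variance discarded by keeping $k$ components.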

Anomaly Detection

Recommender Systems

Online Learning

Machine Learning Pipeline