逻辑回归是个二分类问题,而不是回归问题。
一般对数据集先用逻辑回归(最简单的分类)然后再用普通分类算法。

基础公式:y(i)=θTx(i)+Ei y ( i ) = θ T x ( i ) + E i (1)
E E 是每个样本的误差值,并且服从独立同分布。

均值为0方差为θ2 θ 2 的高斯分布。也就是说满足如下:
p(E(i))=12π√σexp(−(ϵ(i))22σ2) p ( E ( i ) ) = 1 2 π σ e x p ( − ( ϵ ( i ) ) 2 2 σ 2 ) (2)

将(1)代入(2)式:
p(y(i)|x(i);θ)=12π√σexp(−(y(i)−θTx(i))22σ2) p ( y ( i ) | x ( i ) ; θ ) = 1 2 π σ e x p ( − ( y ( i ) − θ T x ( i ) ) 2 2 σ 2 )

似然函数:
L(θ)=∏mi=1p(yi|x(i);θ)=∏mi=112π√σexp(−(y(i)−θTx(i))22σ2) L ( θ ) = ∏ i = 1 m p ( y i | x ( i ) ; θ ) = ∏ i = 1 m 1 2 π σ e x p ( − ( y ( i ) − θ T x ( i ) ) 2 2 σ 2 )
在似然函数前取对数,累乘变累加:
logL(θ)=∏mi=1p(yi|x(i);θ)=log∏mi=112π√σexp(−(y(i)−θTx(i))22σ2) l o g L ( θ ) = ∏ i = 1 m p ( y i | x ( i ) ; θ ) = l o g ∏ i = 1 m 1 2 π σ e x p ( − ( y ( i ) − θ T x ( i ) ) 2 2 σ 2 )
上述原式=∑mi=1log12π√σexp(−(y(i)−θTx(i))22σ2) ∑ i = 1 m l o g 1 2 π σ e x p ( − ( y ( i ) − θ T x ( i ) ) 2 2 σ 2 ) =mlog12π√σ−1σ2⋅12∑mi=1(yi−θTx(i))2 m l o g 1 2 π σ − 1 σ 2 · 1 2 ∑ i = 1 m ( y i − θ T x ( i ) ) 2
对上述简化式常数去掉取局部计算极值:
J(θ)=12∑mi=1(y(i)−θTx(i))2 J ( θ ) = 1 2 ∑ i = 1 m ( y ( i ) − θ T x ( i ) ) 2 (最小二乘法)
按照矩阵求导法则计算该式偏导最后得到:
J(θ)=XTXθ−XTy J ( θ ) = X T X θ − X T y
令J(θ)=0→ J ( θ ) = 0 → θ=(XTX)−1XTy θ = ( X T X ) − 1 X T y
在此处有一个问题:我们不知道X的逆是否存在
故而换方法:

梯度下降

目标函数:J(θ)=12m∑mi=1(y(i)−hθ(x(i)))2 J ( θ ) = 1 2 m ∑ i = 1 m ( y ( i ) − h θ ( x ( i ) ) ) 2
有三种策略来进行梯度下降算法:
1、批量梯度下降——精度很高,时间太慢
2、随机梯度下降:速度非常快,但是不一定往最优方向迭代
3、小批量梯度下降算法:折中,就选你了! 公式:θ′j=θj−α110∑k=ii+9(hθ(xk)−y(k))xkj θ j ′ = θ j − α 1 10 ∑ i + 9 k = i ( h θ ( x k ) − y ( k ) ) x j k

学习率(步长):一般选小一点,步子买大了容易造成局部最优而不是全局最优,步子迈太小了容易造成效率太低速度慢(通常选16,32,64,128【程序员都喜欢这些数字,别问为什么】)。

sigmoid函数
g(z)=11+e−1 g ( z ) = 1 1 + e − 1
预测函数:hθ(x)=g(θTx)=11+e−θTx h θ ( x ) = g ( θ T x ) = 1 1 + e − θ T x
其中:θTx=∑ni=1θixi=θ0+θ1x1=,......,θnxn θ T x = ∑ i = 1 n θ i x i = θ 0 + θ 1 x 1 = , . . . . . . , θ n x n
分类任务(整合概率分布)=p(y|x;θ)=hθ(x)y(1−hθ(x)1−y p ( y | x ; θ ) = h θ ( x ) y ( 1 − h θ ( x ) 1 − y
当y=0时保留(1−hθ(x)1−y ( 1 − h θ ( x ) 1 − y
当y=1时保留p(y|x;θ)=hθ(x))y p ( y | x ; θ ) = h θ ( x ) ) y
似然函数:L(θ)=∏mi=1p(yi|x(i);θ)=∏mi=1hθ(x(i))yi(1−hθ(xi))1−yi L ( θ ) = ∏ i = 1 m p ( y i | x ( i ) ; θ ) = ∏ i = 1 m h θ ( x ( i ) ) y i ( 1 − h θ ( x i ) ) 1 − y i
同上个方法一样取对数化简:
l(θ)=logL(θ)=∑mi=1(yiloghθ(xθ)+(1−yi)log(1−hθ(xi))) l ( θ ) = l o g L ( θ ) = ∑ i = 1 m ( y i l o g h θ ( x θ ) + ( 1 − y i ) l o g ( 1 − h θ ( x i ) ) )
上式经过一系列推导后可得:−1m∑mi=1(hθ(xi)−yi)xji − 1 m ∑ i = 1 m ( h θ ( x i ) − y i ) x i j (此处的i,j为第j个特征的0-i个样本值)
参数θ θ 的更新:θ′j=θj−α1m∑mi=1(hθ(xi)−yi)xji θ j ′ = θ j − α 1 m ∑ i = 1 m ( h θ ( x i ) − y i ) x i j

代码实现:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time


path = 'E:/test/tidu/data/LogiReg_data.txt'
pddata=pd.read_csv(path,header=None,names=['Exam 1','Exam 2','Admitted'])
positive=pddata[pddata['Admitted']==1]
negative=pddata[pddata['Admitted']==0]
fig,ax = plt.subplots(figsize=(10,5))
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=30, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=30, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
pddata.insert(0,'ones',1)
orig_data=pddata.as_matrix()
cols = orig_data.shape[1]
X = orig_data[:,0:cols-1]
y = orig_data[:,cols-1:cols]
theta=np.zeros([1,3])
STOP_ITER=0
STOP_COST=1
STOP_GRAD=2

def sigmoid(z):   #构建sigmoid函数
    return 1/(1+np.exp(-z))


def model(X, theta):
    return sigmoid(np.dot(X, theta.T))





def gradient(X, y, theta):
    grad = np.zeros(theta.shape)
    error = (model(X, theta)- y).ravel()
    for j in range(len(theta.ravel())): #for each parmeter
        term = np.multiply(error, X[:,j])
        grad[0, j] = np.sum(term) / len(X)

    return grad

def cost(X, y, theta):
    left = np.multiply(-y, np.log(model(X, theta)))
    right = np.multiply(1 - y, np.log(1 - model(X, theta)))
    return np.sum(left - right) / (len(X))


def stopcriterion(type, value, threshold):
    #设定三种不同的停止策略
    if type == STOP_ITER:        return value > threshold
    elif type == STOP_COST:      return abs(value[-1]-value[-2]) < threshold
    elif type == STOP_GRAD:      return np.linalg.norm(value) < threshold

def shuffleData(data):
        np.random.shuffle(data)
        cols=data.shape[1]
        X=data[:,0:cols-1]
        y=data[:,cols-1:]
        return X,y


def descent(data, theta, batchSize, stopType, thresh, alpha):
    #梯度下降求解

    init_time = time.time()
    i = 0 # 迭代次数
    k = 0 # batch
    X, y = shuffleData(data)
    grad = np.zeros(theta.shape) # 计算的梯度
    costs = [cost(X, y, theta)] # 损失值

    while True:
        grad = gradient(X[k:k+batchSize], y[k:k+batchSize], theta)
        k += batchSize #取batch数量个数据
        if k >= n: 
            k = 0 
            X, y = shuffleData(data) #重新洗牌
        theta = theta - alpha*grad # 参数更新
        costs.append(cost(X, y, theta)) # 计算新的损失
        i += 1 

        if stopType == STOP_ITER:       value = i
        elif stopType == STOP_COST:     value = costs
        elif stopType == STOP_GRAD:     value = grad
        if stopcriterion(stopType, value, thresh): break

    return theta, i-1, costs, grad, time.time() - init_time


def runExpe(data, theta, batchSize, stopType, thresh, alpha):
    #import pdb; pdb.set_trace();
    theta, iter, costs, grad, dur = descent(data, theta, batchSize, stopType, thresh, alpha)
    name = "Original" if (data[:,1]>2).sum() > 1 else "Scaled"
    name += " data - learning rate: {} - ".format(alpha)
    if batchSize==n: strDescType = "Gradient"
    elif batchSize==1:  strDescType = "Stochastic"
    else: strDescType = "Mini-batch ({})".format(batchSize)
    name += strDescType + " descent - Stop: "
    if stopType == STOP_ITER: strStop = "{} iterations".format(thresh)
    elif stopType == STOP_COST: strStop = "costs change < {}".format(thresh)
    else: strStop = "gradient norm < {}".format(thresh)
    name += strStop
    print ("***{}\nTheta: {} - Iter: {} - Last cost: {:03.2f} - Duration: {:03.2f}s".format(
        name, theta, iter, costs[-1], dur))
    fig, ax = plt.subplots(figsize=(12,4))
    ax.plot(np.arange(len(costs)), costs, 'r')
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Cost')
    ax.set_title(name.upper() + ' - Error vs. Iteration')
    return theta


n=100
runExpe(orig_data, theta, n, STOP_COST, thresh=0.000001, alpha=0.001)