RANSAC

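RANSAC stands for Random Sample Consensus. Given a sample data set that contains anomalous data, the algorithm estimates the parameters of a mathematical model and identifies the valid samples. This article looks at it mainly from the perspective of line fitting.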

[Figure: line fit to a sample data set]


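For this data set, the noise is sparse and does not stray far from the correct data, so the fitted result is only slightly off. But when the proportion of noise is high, or the noise deviates strongly, least squares, which fits all of the data globally, can hardly produce a good result.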
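
To make the sensitivity of least squares concrete, here is a minimal sketch. The data are hypothetical (points on the line y = 2x + 1, with the last five shifted far off the line) and are not the article's sample set; `np.polyfit` with degree 1 is used as the ordinary least-squares line fit.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.arange(20.0)
y_clean = 2.0 * x + 1.0

# Corrupt the last five points with a large offset (gross outliers).
y_noisy = y_clean.copy()
y_noisy[15:] -= 40.0

# np.polyfit with degree 1 performs an ordinary least-squares line fit.
w_clean, b_clean = np.polyfit(x, y_clean, 1)
w_noisy, b_noisy = np.polyfit(x, y_noisy, 1)

print(f"clean data:    w = {w_clean:.3f}, b = {b_clean:.3f}")
print(f"with outliers: w = {w_noisy:.3f}, b = {b_noisy:.3f}")
```

Five bad points out of twenty are enough to pull the fitted slope far away from 2, because every point contributes to the squared-error sum.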

The Idea Behind RANSAC

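RANSAC's basic assumption is that the samples contain both correct data (inliers, data that the model can describe) and anomalous data (outliers, data far outside the normal range that the model cannot accommodate); in other words, the data set is noisy. The main idea is to repeatedly draw a random subset of the samples, fit a model to it, test the model against the samples that were not drawn, and keep the best model according to some rule.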

RANSAC Algorithm Steps

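  1. Randomly select n samples as the initial inliers;
  2. Fit a model to the inliers (this article fits a line using least squares), then test each remaining sample against the model; if a sample's error with respect to the model is below a given threshold, add it to the inliers;
  3. If the number of inliers exceeds a set value, the model is considered correct; re-estimate it using the enlarged inlier set;
  4. Repeat the above for a specified number of rounds. The model from each round is either discarded because it has too few inliers, or selected because it is better than the current best model.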
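
The steps above can be sketched compactly with NumPy. This is only an illustrative sketch, not the implementation used in the experiment: the function name `ransac_line`, its parameters, and the synthetic data are all assumptions made for the example, and `np.polyfit` stands in for the least-squares fit.

```python
import numpy as np

def ransac_line(points, n_sample=2, n_iter=100, threshold=1.0,
                min_inliers=10, seed=0):
    """Fit y = w*x + b with a RANSAC loop; return (w, b, inlier_mask) or None."""
    rng = np.random.default_rng(seed)
    x, y = points[:, 0], points[:, 1]
    best, best_count = None, 0
    for _ in range(n_iter):
        # Step 1: draw a random subset as the initial inliers.
        idx = rng.choice(len(points), size=n_sample, replace=False)
        w, b = np.polyfit(x[idx], y[idx], 1)
        # Step 2: samples close enough to the line join the inliers.
        dist = np.abs(w * x + b - y) / np.sqrt(w * w + 1.0)
        mask = dist <= threshold
        mask[idx] = True  # the drawn samples always count as inliers
        # Step 3: with enough inliers, refit on the enlarged inlier set.
        count = int(mask.sum())
        # Step 4: keep the model only if it beats the best one so far.
        if count >= min_inliers and count > best_count:
            w, b = np.polyfit(x[mask], y[mask], 1)
            best, best_count = (w, b, mask), count
    return best

# Hypothetical data: the line y = 2x + 1 with every fifth point shifted up by 30.
x = np.arange(30.0)
y = 2.0 * x + 1.0
y[::5] += 30.0
fit = ransac_line(np.column_stack([x, y]))
print(fit[0], fit[1], int(fit[2].sum()))
```

Here the quality rule is simply "more inliers is better", which matches the rule described in step 4; other scoring rules are possible.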

RANSAC Line-Fitting Experiment
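
For reference, the code in this section relies on two formulas. The notation is an assumption of this note (m samples (x_i, y_i) and the line y = wx + b): the first is the least-squares solution via the pseudo-inverse, as computed in `least_square`; the second is the perpendicular point-to-line distance that `ransac` compares against `reject_dis`.

```latex
X = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_m & 1 \end{bmatrix},\quad
Y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix},\quad
\theta = \begin{bmatrix} w \\ b \end{bmatrix} = (X^{\top} X)^{+} X^{\top} Y

d_i = \frac{\lvert w x_i + b - y_i \rvert}{\sqrt{w^2 + 1}}
```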

# utils.py
import numpy as np
import matplotlib.pyplot as plt
import operator as op

class Ransac:
    weight = 0
    bias = 0

    # Least-squares line fit: Theta = pinv(X^T X) X^T Y for y = weight * x + bias
    def least_square(self, samples):
        X = np.zeros([samples.shape[0], 2], dtype=float)
        X[:, 0] = samples[:, 0]
        X[:, 1] = 1
        Y = samples[:, 1]
        A = np.linalg.pinv(np.dot(X.T, X))
        B = np.dot(A, X.T)
        Theta = np.dot(B, Y)
        weight = Theta[0]
        bias = Theta[1]
        return weight, bias

    # Check whether the sample dst already appears in src
    def isRepeat(self, src, dst):
        for i in range(len(src)):
            if op.eq(list(src[i]), list(dst)):
                return True
        return False

    # Random sampling: draw a fraction point_ratio of the samples as initial inliers
    def random_sample(self, samples, point_ratio):
        num = len(samples)
        inliners_num = int(num * point_ratio)
        inliners = []
        outliners = []
        cur_num = 0
        while cur_num != inliners_num:
            index = np.random.randint(0, num)
            sample_cur = samples[index]
            if not self.isRepeat(inliners, sample_cur):
                cur_num += 1
                inliners.append(list(sample_cur))

        # Every sample that was not drawn starts out as an outlier
        for i in range(num):
            if not self.isRepeat(inliners, samples[i]):
                outliners.append(list(samples[i]))
        return np.array(inliners), np.array(outliners)

    # Plot the samples and the line y = w * x + b, then clear the figure
    def fun_plot(self, samples, w, b):
        data_x = np.linspace(0, 50, 50)
        data_y = [w * x + b for x in data_x]
        plt.ion()
        plt.plot(samples[:, 0], samples[:,1], 'bo')
        plt.plot(data_x, data_y, 'r')
        plt.show()
        plt.pause(0.05)
        plt.clf()

    def ransac(self, samples, point_ratio = 0.05, epoch = 50, reject_dis = 5, inliners_ratio = 0.4):
        # samples         input samples
        # point_ratio     fraction of the samples drawn at random in each round
        # epoch           number of iterations
        # reject_dis      an outlier closer to the line than this threshold is added to the inliers
        # inliners_ratio  minimum fraction of inliers for a model to be accepted
        max_inlinear_num = 0
        for i in range(epoch):
            inliners, outliners = self.random_sample(samples, point_ratio)
            w, b = self.least_square(inliners)
            #self.fun_plot(samples, w, b)
            for j in range(len(outliners)):
                # Perpendicular distance from the point to the line y = w * x + b
                distance = np.abs(w * outliners[j,0] + b - outliners[j,1]) / np.sqrt(np.power(w,2) + 1)
                if distance <= reject_dis:
                    inliners = np.vstack((inliners, outliners[j]))
            # Re-estimate the model with the enlarged inlier set
            w, b = self.least_square(inliners)
            self.fun_plot(inliners, w, b)
            if len(inliners) >= len(samples) * inliners_ratio:
                if len(inliners) > max_inlinear_num:
                    self.weight = w
                    self.bias = b
                    max_inlinear_num = len(inliners)
        plt.ioff()
        plt.close()

# Test script (uses the Ransac class from utils.py)
import numpy as np
import matplotlib.pyplot as plt
import utils

sample = np.loadtxt('sample.txt')
Test = utils.Ransac()

# Least-squares fit
k, b = Test.least_square(sample)
data_x = np.linspace(-10, 50, 60)
data_y = [k * x + b for x in data_x]
plt.figure(1)
plt.plot(sample[:, 0], sample[:, 1], 'bo')
plt.plot(data_x, data_y, 'r')
plt.show()

# RANSAC fit
Test.ransac(sample, 0.1, 10, 5, 0.4)
print([Test.weight, Test.bias])
data_x = np.linspace(0, 50, 50)
data_y = [Test.weight * x + Test.bias for x in data_x]
plt.figure(3)
plt.plot(sample[:, 0], sample[:, 1], 'bo')
plt.plot(data_x, data_y, 'r')
plt.show()

Some intermediate iteration results:

[Figures: three intermediate RANSAC fits]

Best iteration result:

[Figure: best RANSAC fit]

Sample data in sample.txt:

-10 25
-9 24.2
-8 28.4
-7 23.8
-6 26.4
-5 28.3
-4 29.3
-3 26.9
-2 27.2
-1 28.9
0 26.4
1 7.751236167
2 10.20045434
3 10.81046402
4 11.13503451
5 12.9903666
6 16.7852859
7 21.14537052
8 24.17658814
9 25.20666067
10 25.40315356
11 26.70002155
12 30.01953958
13 34.42436748
14 37.97933618
15 39.53063325
16 39.7666127
17 40.58492552
18 43.34782806
19 47.62972986
20 51.60847955
21 53.7406424
22 54.18052712
23 54.63831511
24 56.8077276
25 60.80882615
26 65.07762859
27 67.80402704
28 68.59599273
29 68.84000545
30 70.42633043
31 74.01111718
32 78.4131387
33 81.69980609
34 82.96398191
35 83.15799813
36 53.81808652
37 56.58421611
38 51.65201087
39 55.12034985
40 56.63924895
41 97.55103013
42 98.18365259
43 100.6700956
44 104.8389442
45 108.9719878
46 111.3839344
47 111.9718609
48 112.3098397
49 114.2017442
50 118.0227753

Reference: 线性拟合笔记之:Ransac算法 (Notes on Linear Fitting: the RANSAC Algorithm)