Today I studied how support vector machines (SVM) are used for classification. Through its learning algorithm, an SVM automatically picks out the support vectors that best discriminate between classes, and the classifier built from them maximizes the distance between classes, which gives it good generalization ability and high discriminative power. SVM is a supervised learning method (i.e., labeled training samples are provided; an unsupervised method sets no training samples in advance).
Basic idea:
Two boundary planes parallel to the classifier are used to separate the two classes of data; the goal is to find the optimal separating hyperplane that maximizes the distance between those two planes. This minimizes the total classification error, i.e., it "minimizes the generalization error."
[Figure: schematic of the SVM partitioning algorithm]
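In symbols (a standard textbook statement, not taken from the post): for samples $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, the hard-margin SVM looks for the hyperplane $w^\top x + b = 0$ that solves

$$\min_{w,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1, \quad i = 1, \dots, n.$$

The two parallel boundary planes are $w^\top x + b = \pm 1$, and the distance between them is $2/\lVert w \rVert$, so minimizing $\lVert w \rVert$ is exactly maximizing the margin.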
- Pros and cons of the SVM algorithm:
Pros:
The final classification is decided only by the boundary samples (the support vectors) of each class's domain. SVMs are widely used in text classification, handwriting recognition, image classification, and sequence classification.
The SVM learning problem can be expressed as a convex optimization problem (i.e., one whose objective function is convex and whose feasible set is convex); the standard formulation is sketched after this list.
Cons:
(1) Determining the mapping to a high-dimensional space (the kernel function) remains problematic; SVM converts the complexity of the high-dimensional space into the complexity of the kernel function.
(2) Solving the quadratic program during training consumes a large amount of memory.
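As referenced above, the standard convex formulation (again textbook form, not from the post): with slack variables $\xi_i$ and a penalty weight $C$, the soft-margin primal is

$$\min_{w,b,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0,$$

a convex quadratic program. In its dual, every inner product $x_i^\top x_j$ is replaced by a kernel value $K(x_i, x_j)$:

$$\max_{\alpha}\ \sum_{i} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i} \alpha_i y_i = 0.$$

Both cons above show up here: the choice of $K$ hides the high-dimensional mapping, and the dual is an $n \times n$ quadratic program, hence the memory cost.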
For a linear SVM, introducing slack variables turns the problem into a pure linear program, which is convenient to implement in MATLAB; a sketch follows.
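A minimal sketch of that linear-programming form (my own illustration, not code from the post): replacing $\lVert w \rVert^2$ with the L1 norm of w keeps the whole objective linear, and linprog from the Optimization Toolbox can solve it. X is an n-by-d sample matrix and y a column of +1/-1 labels; all names here are placeholders.

% LP-SVM sketch: minimize ||w||_1 + C*sum(xi)
% subject to y_i*(w'*x_i + b) >= 1 - xi_i and xi_i >= 0.
% Split w = u - v with u, v >= 0 so the problem is a pure LP;
% the variable vector is z = [u; v; b; xi].
function [w, b] = lpsvm(X, y, C)
[n, d] = size(X);
f  = [ones(2*d, 1); 0; C*ones(n, 1)];      % linear objective coefficients
Yx = bsxfun(@times, y, X);                 % row i is y_i * x_i'
A  = [-Yx, Yx, -y, -eye(n)];               % encodes -y_i*(w'*x_i + b) - xi_i <= -1
bb = -ones(n, 1);
lb = [zeros(2*d, 1); -Inf; zeros(n, 1)];   % u, v, xi >= 0; b is free
z  = linprog(f, A, bb, [], [], lb, []);
w  = z(1:d) - z(d+1:2*d);
b  = z(2*d + 1);
end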
- MATLAB examples:
I implemented two problems, rating city consumption levels and diagnosing breast cancer. On the breast cancer one I was stuck on the error rate for a whole evening: it sat at 0.014 and would not drop to 0. It turned out I had not even understood the basics of SVM. The training method of svmtrain(trainingData, group, 'Method', method, 'Kernel_Function', kernel) is chosen via 'Kernel_Function', which takes one of five values: 'linear' (the default), 'quadratic', 'polynomial', 'rbf' (Radial Basis Function), and 'mlp' (Multilayer Perceptron kernel).
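For reference, the call shape looks like this (old Bioinformatics Toolbox syntax; the variable names data and group are placeholders, and the parameter values shown are just examples):

svmLinear = svmtrain(data, group);                                        % defaults: linear kernel
svmRbf = svmtrain(data, group, 'Method', 'QP', 'Kernel_Function', 'rbf'); % Gaussian kernel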
For the city consumption-level problem I had filled in only the first two arguments of svmtrain and got an error rate of 0; I then reused the same call for breast cancer diagnosis, but its error rate never reached 0. Only when I went back to the theory did I see that the training method was the problem. Put simply, SVM separates data of the same kinds of indicators with a boundary (the constraint) according to their different levels: sometimes the separation is clean (a linear kernel suffices, as with city consumption levels), and sometimes it is not (samples from different groups mix in the separating region, so a penalty function and quadratic programming are needed, as with breast cancer diagnosis).
Still, this is the after-effect of not understanding the theory first: when a problem appears, you have to work backwards through the principles before you can find its cause.
Below are the two problems, classifying city consumption levels and diagnosing breast cancer.
Problem 1: (too lazy to retype the problem statement, so here is the raw data; each row holds 8 consumption indicators for one city, the first 27 rows are the labeled samples, and the last 3 rows are the cities to classify)
8.35 23.53 7.51 8.62 17.42 10.00 1.04 11.21
9.25 23.75 6.61 9.19 17.77 10.48 1.72 10.51
8.19 30.50 4.72 9.78 16.28 7.60 2.52 10.32
7.73 29.20 5.42 9.43 19.29 8.49 2.52 10.00
9.42 27.93 8.20 8.14 16.17 9.42 1.55 9.76
9.16 27.98 9.01 9.32 15.99 9.10 1.82 11.35
10.06 28.64 10.52 10.05 16.18 8.39 1.96 10.81
9.09 28.12 7.40 9.62 17.26 11.12 2.49 12.65
9.41 28.20 5.77 10.80 16.36 11.56 1.53 12.17
8.70 28.12 7.21 10.53 19.45 13.30 1.66 11.96
6.93 29.85 4.54 9.49 16.62 10.65 1.88 13.61
8.67 36.05 7.31 7.75 16.67 11.68 2.38 12.88
9.98 37.69 7.01 8.94 16.15 11.08 0.83 11.67
6.77 38.69 6.01 8.82 14.79 11.44 1.74 13.23
8.14 37.75 9.61 8.49 13.15 9.76 1.28 11.28
7.67 35.71 8.04 8.31 15.13 7.76 1.41 13.25
7.90 39.77 8.49 12.94 19.27 11.05 2.04 13.29
7.18 40.91 7.32 8.94 17.60 12.75 1.14 14.80
8.82 33.70 7.59 10.98 18.82 14.73 1.78 10.10
6.25 35.02 4.72 6.28 10.03 7.15 1.93 10.39
10.60 52.41 7.70 9.98 12.53 11.70 2.31 14.69
7.27 52.65 3.84 9.16 13.03 15.26 1.98 14.57
13.45 55.85 5.50 7.45 9.55 9.52 2.21 16.30
10.85 44.68 7.32 14.51 17.13 12.08 1.26 11.57
7.21 45.79 7.66 10.36 16.56 12.86 2.25 11.69
7.68 50.37 11.35 13.30 19.25 14.59 2.75 14.87
7.78 48.44 8.00 20.51 22.12 15.73 1.15 16.61
7.94 39.65 20.97 20.82 22.52 12.41 1.75 7.90
8.28 64.34 8.00 22.22 20.06 15.12 0.72 22.89
12.47 76.39 5.52 11.24 14.52 22.00 5.46 25.50
MATLAB code:
%% Support vector machine (SVM)
% Solution steps:
% 1. split the data into groups; 2. standardize the labeled data; 3. standardize the unlabeled data;
% 4. assign group numbers to the known samples; 5. train the machine with svmtrain;
% 6. extract the weights, bias, support-vector indices, negated means and reciprocal stds;
% 7. re-classify the known samples; 8. compute the error rate; 9. classify the unknown samples.
% Note: mapstd standardizes row vectors, while svmclassify treats each ROW as
% one sample, so remember to transpose between the two uses.
a = load('fenlei.txt');  % the data above; couldn't manage to attach the file, haha
aa = a';
a1 = aa(:, 1:27);   % transposed data of the 27 labeled cities
a3 = aa(:, 28:30);  % transposed data of the 3 cities to classify
[stand, ps] = mapstd(a1);             % mapstd standardizes row vectors
stand3 = mapstd('apply', a3, ps);
group = [ones(1, 20), 2*ones(1, 7)];  % class labels of the 27 known cities
s = svmtrain(stand', group);          % one row per training sample
Snum = s.SupportVectorIndices;
Salpha = s.Alpha                      % weights
Sbias = s.Bias;                       % bias term
v = s.ScaleData                       % row 1: negated sample means; row 2: reciprocals of the stds
check = svmclassify(s, stand');
wrong = 1 - sum(group == check')/length(group);  % error rate on the known samples
class = svmclassify(s, stand3');      % classify the 3 unknown cities
Problem 2 (this looks like the Wisconsin Diagnostic Breast Cancer data; each record is a case ID, a class label of -1 or +1, and 30 features):
842302,-1,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
842517,-1,20.57,17.77,132.9,1326,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,0.5435,0.7339,3.398,74.08,0.005225,0.01308,0.0186,0.0134,0.01389,0.003532,24.99,23.41,158.8,1956,0.1238,0.1866,0.2416,0.186,0.275,0.08902
84300903,-1,19.69,21.25,130,1203,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,0.7456,0.7869,4.585,94.03,0.00615,0.04006,0.03832,0.02058,0.0225,0.004571,23.57,25.53,152.5,1709,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
84348301,-1,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,0.4956,1.156,3.445,27.23,0.00911,0.07458,0.05661,0.01867,0.05963,0.009208,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
84358402,-1,20.29,14.34,135.1,1297,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,0.7572,0.7813,5.438,94.44,0.01149,0.02461,0.05688,0.01885,0.01756,0.005115,22.54,16.67,152.2,1575,0.1374,0.205,0.4,0.1625,0.2364,0.07678
843786,-1,12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,0.2087,0.07613,0.3345,0.8902,2.217,27.19,0.00751,0.03345,0.03672,0.01137,0.02165,0.005082,15.47,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244
844359,-1,18.25,19.98,119.6,1040,0.09463,0.109,0.1127,0.074,0.1794,0.05742,0.4467,0.7732,3.18,53.91,0.004314,0.01382,0.02254,0.01039,0.01369,0.002179,22.88,27.66,153.2,1606,0.1442,0.2576,0.3784,0.1932,0.3063,0.08368
84458202,-1,13.71,20.83,90.2,577.9,0.1189,0.1645,0.09366,0.05985,0.2196,0.07451,0.5835,1.377,3.856,50.96,0.008805,0.03029,0.02488,0.01448,0.01486,0.005412,17.06,28.14,110.6,897,0.1654,0.3682,0.2678,0.1556,0.3196,0.1151
844981,-1,13,21.82,87.5,519.8,0.1273,0.1932,0.1859,0.09353,0.235,0.07389,0.3063,1.002,2.406,24.32,0.005731,0.03502,0.03553,0.01226,0.02143,0.003749,15.49,30.73,106.2,739.3,0.1703,0.5401,0.539,0.206,0.4378,0.1072
84501001,-1,12.46,24.04,83.97,475.9,0.1186,0.2396,0.2273,0.08543,0.203,0.08243,0.2976,1.599,2.039,23.94,0.007149,0.07217,0.07743,0.01432,0.01789,0.01008,15.09,40.68,97.65,711.4,0.1853,1.058,1.105,0.221,0.4366,0.2075
845636,-1,16.02,23.24,102.7,797.8,0.08206,0.06669,0.03299,0.03323,0.1528,0.05697,0.3795,1.187,2.466,40.51,0.004029,0.009269,0.01101,0.007591,0.0146,0.003042,19.19,33.88,123.8,1150,0.1181,0.1551,0.1459,0.09975,0.2948,0.08452
84610002,-1,15.78,17.89,103.6,781,0.0971,0.1292,0.09954,0.06606,0.1842,0.06082,0.5058,0.9849,3.564,54.16,0.005771,0.04061,0.02791,0.01282,0.02008,0.004144,20.42,27.28,136.5,1299,0.1396,0.5609,0.3965,0.181,0.3792,0.1048
846226,-1,19.17,24.8,132.4,1123,0.0974,0.2458,0.2065,0.1118,0.2397,0.078,0.9555,3.568,11.07,116.2,0.003139,0.08297,0.0889,0.0409,0.04484,0.01284,20.96,29.94,151.7,1332,0.1037,0.3903,0.3639,0.1767,0.3176,0.1023
846381,-1,15.85,23.95,103.7,782.7,0.08401,0.1002,0.09938,0.05364,0.1847,0.05338,0.4033,1.078,2.903,36.58,0.009769,0.03126,0.05051,0.01992,0.02981,0.003002,16.84,27.66,112,876.5,0.1131,0.1924,0.2322,0.1119,0.2809,0.06287
84667401,-1,13.73,22.61,93.6,578.3,0.1131,0.2293,0.2128,0.08025,0.2069,0.07682,0.2121,1.169,2.061,19.21,0.006429,0.05936,0.05501,0.01628,0.01961,0.008093,15.03,32.01,108.8,697.7,0.1651,0.7725,0.6943,0.2208,0.3596,0.1431
84799002,-1,14.54,27.54,96.73,658.8,0.1139,0.1595,0.1639,0.07364,0.2303,0.07077,0.37,1.033,2.879,32.55,0.005607,0.0424,0.04741,0.0109,0.01857,0.005466,17.46,37.13,124.1,943.2,0.1678,0.6577,0.7026,0.1712,0.4218,0.1341
848406,-1,14.68,20.13,94.74,684.5,0.09867,0.072,0.07395,0.05259,0.1586,0.05922,0.4727,1.24,3.195,45.4,0.005718,0.01162,0.01998,0.01109,0.0141,0.002085,19.07,30.88,123.4,1138,0.1464,0.1871,0.2914,0.1609,0.3029,0.08216
84862001,-1,16.13,20.68,108.1,798.8,0.117,0.2022,0.1722,0.1028,0.2164,0.07356,0.5692,1.073,3.854,54.18,0.007026,0.02501,0.03188,0.01297,0.01689,0.004142,20.96,31.48,136.8,1315,0.1789,0.4233,0.4784,0.2073,0.3706,0.1142
849014,-1,19.81,22.15,130,1260,0.09831,0.1027,0.1479,0.09498,0.1582,0.05395,0.7582,1.017,5.865,112.4,0.006494,0.01893,0.03391,0.01521,0.01356,0.001997,27.32,30.88,186.8,2398,0.1512,0.315,0.5372,0.2388,0.2768,0.07615
8510426,1,13.54,14.36,87.46,566.3,0.09779,0.08129,0.06664,0.04781,0.1885,0.05766,0.2699,0.7886,2.058,23.56,0.008462,0.0146,0.02387,0.01315,0.0198,0.0023,15.11,19.26,99.7,711.2,0.144,0.1773,0.239,0.1288,0.2977,0.07259
8510653,1,13.08,15.71,85.63,520,0.1075,0.127,0.04568,0.0311,0.1967,0.06811,0.1852,0.7477,1.383,14.67,0.004097,0.01898,0.01698,0.00649,0.01678,0.002425,14.5,20.49,96.09,630.5,0.1312,0.2776,0.189,0.07283,0.3184,0.08183
8510824,1,9.504,12.44,60.34,273.9,0.1024,0.06492,0.02956,0.02076,0.1815,0.06905,0.2773,0.9768,1.909,15.7,0.009606,0.01432,0.01985,0.01421,0.02027,0.002968,10.23,15.66,65.13,314.9,0.1324,0.1148,0.08867,0.06227,0.245,0.07773
8511133,-1,15.34,14.26,102.5,704.4,0.1073,0.2135,0.2077,0.09756,0.2521,0.07032,0.4388,0.7096,3.384,44.91,0.006789,0.05328,0.06446,0.02252,0.03672,0.004394,18.07,19.08,125.1,980.9,0.139,0.5954,0.6305,0.2393,0.4667,0.09946
851509,-1,21.16,23.04,137.2,1404,0.09428,0.1022,0.1097,0.08632,0.1769,0.05278,0.6917,1.127,4.303,93.99,0.004728,0.01259,0.01715,0.01038,0.01083,0.001987,29.17,35.59,188,2615,0.1401,0.26,0.3155,0.2009,0.2822,0.07526
852552,-1,16.65,21.38,110,904.6,0.1121,0.1457,0.1525,0.0917,0.1995,0.0633,0.8068,0.9017,5.455,102.6,0.006048,0.01882,0.02741,0.0113,0.01468,0.002801,26.46,31.56,177,2215,
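The post never shows the code for Problem 2, so here is a minimal sketch in the same style as Problem 1 (hypothetical reconstruction, not the original script; the file name cancer.txt, the use of csvread, and training/checking on all rows are my assumptions):

raw = csvread('cancer.txt');       % one record per row: ID, label, 30 features
labels = raw(:, 2);                % the -1/+1 class labels
feats = raw(:, 3:end);             % the 30 numeric features
[standT, ps] = mapstd(feats');     % mapstd standardizes rows, hence the transpose
s = svmtrain(standT', labels, 'Kernel_Function', 'rbf');  % nonlinear kernel, as the text suggests
check = svmclassify(s, standT');
wrong = 1 - sum(labels == check) / length(labels)  % training error rate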