感受資料。訓練樸素貝葉斯和 kNN

Created: November-22, 2018

為了構建一個好的分類器，我們經常需要了解如何在特徵空間中構建資料。Weka 提供可以提供幫助的視覺化模組。

StackOverflow 文件

一些維度已經很好地分離了這些類。例如，與花瓣寬度相比，花瓣寬度非常整齊地命令概念。

訓練簡單的分類器也可以揭示資料的結構。我通常喜歡使用最近鄰和樸素貝葉斯。樸素貝葉斯假定獨立，表現良好表明維度本身儲存資訊。k-Nearest-Neighbor 通過在特徵空間中指定 k 個最近（已知）例項的類來工作。它通常用於檢查本地地理依賴性，我們將用它來檢查我們的概念是否在特徵空間中本地定義。

 //Now we build a Naive Bayes classifier
 NaiveBayes classifier2 = new NaiveBayes();
 classifier2.buildClassifier(trainset);
 // Next we test it against the testset
 Test = new Evaluation(trainset);
 Test.evaluateModel(classifier2, testset);
 System.out.println(Test.toSummaryString());
 
//Now we build a kNN classifier
IBk classifier3 = new IBk();
// We tell the classifier to use the first nearest neighbor as example 
classifier3.setOptions(weka.core.Utils.splitOptions("-K 1"));
classifier3.buildClassifier(trainset);
// Next we test it against the testset
Test = new Evaluation(trainset);
Test.evaluateModel(classifier3, testset);
System.out.println(Test.toSummaryString());

樸素貝葉斯的表現比我們新建立的基線好得多，表明獨立的功能可以儲存資訊（記住花瓣寬度？）。

1NN 表現也很好（事實上在這種情況下好一點），表明我們的一些資訊是本地的。更好的效能可能表明某些二階效應也包含資訊 （如果 x 和 y 比 z 類） 。