Getting a feel for the data: training Naive Bayes and kNN

Created: November-22, 2018

To build a good classifier, we often need to understand how the data is structured in feature space. Weka provides a visualization module that can help.

[Figure: StackOverflow Documentation]

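Some dimensions already separate the classes quite well. Petal width, for example, orders the classes quite neatly, much more so than a dimension such as sepal width.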

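Training simple classifiers can also reveal a lot about the structure of the data. I usually like to use Nearest Neighbor and Naive Bayes for this. Naive Bayes assumes that the features are independent, so good performance suggests that the dimensions hold information on their own. k-Nearest-Neighbor assigns to a new instance the class of its k nearest (known) instances in feature space. It is often used to examine local geographic dependence; here we use it to check whether our concept is defined locally in feature space.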
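To make the nearest-neighbor idea concrete, here is a minimal plain-Java sketch of k-nearest-neighbor voting. It is not Weka's `IBk` implementation; the class name `KnnSketch`, its methods, and the toy measurements are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Minimal k-nearest-neighbor sketch in plain Java (hypothetical, not Weka's IBk). */
final class KnnSketch {

    /** Squared Euclidean distance between two feature vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Predicts the majority label among the k training points closest to the query. */
    static String classify(double[][] features, String[] labels, double[] query, int k) {
        Integer[] order = new Integer[features.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort training indices by their distance to the query point.
        Arrays.sort(order, (i, j) -> Double.compare(
                squaredDistance(features[i], query),
                squaredDistance(features[j], query)));

        // Count votes among the k nearest neighbors; on a tie, the label that
        // reached the top count first (that is, via closer neighbors) wins.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (int n = 0; n < Math.min(k, order.length); n++) {
            String label = labels[order[n]];
            int count = votes.merge(label, 1, Integer::sum);
            if (best == null || count > votes.get(best)) {
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: {petal length, petal width} in cm.
        double[][] features = {{1.4, 0.2}, {1.5, 0.3}, {4.7, 1.4}, {4.5, 1.5}};
        String[] labels = {"setosa", "setosa", "versicolor", "versicolor"};
        System.out.println(classify(features, labels, new double[]{1.6, 0.2}, 1));
        System.out.println(classify(features, labels, new double[]{4.6, 1.3}, 3));
    }
}
```

With k = 1 the prediction depends only on the single closest training instance, which is why a well-performing 1NN hints that the concept is defined locally.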

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;

// trainset and testset are assumed to be the weka.core.Instances objects prepared earlier.

// Now we build a Naive Bayes classifier
NaiveBayes classifier2 = new NaiveBayes();
classifier2.buildClassifier(trainset);
// Next we test it against the test set
Evaluation nbTest = new Evaluation(trainset);
nbTest.evaluateModel(classifier2, testset);
System.out.println(nbTest.toSummaryString());

// Now we build a kNN classifier
IBk classifier3 = new IBk();
// We tell the classifier to use only the single nearest neighbor (k = 1)
classifier3.setOptions(weka.core.Utils.splitOptions("-K 1"));
classifier3.buildClassifier(trainset);
// Next we test it against the test set
Evaluation knnTest = new Evaluation(trainset);
knnTest.evaluateModel(classifier3, testset);
System.out.println(knnTest.toSummaryString());

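Naive Bayes performs much better than our freshly established baseline, which indicates that the individual features hold information on their own (remember petal width?).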
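The independence assumption can be sketched in a few lines of plain Java. The following is a simplified Gaussian Naive Bayes, not Weka's `NaiveBayes` class; the name `NaiveBayesSketch`, the integer class labels, and the toy data are assumptions made for illustration, and the sketch expects every class to have at least one training example.

```java
/** Minimal Gaussian Naive Bayes sketch in plain Java (hypothetical, not Weka's NaiveBayes). */
final class NaiveBayesSketch {
    private final double[][] mean;     // mean[c][d]: mean of feature d in class c
    private final double[][] variance; // variance[c][d]: variance of feature d in class c
    private final double[] logPrior;   // logPrior[c]: log of the class frequency

    /** Fits one Gaussian per class and feature; labels must lie in 0..numClasses-1. */
    NaiveBayesSketch(double[][] x, int[] y, int numClasses) {
        int dims = x[0].length;
        mean = new double[numClasses][dims];
        variance = new double[numClasses][dims];
        logPrior = new double[numClasses];
        int[] count = new int[numClasses];
        for (int i = 0; i < x.length; i++) {
            count[y[i]]++;
            for (int d = 0; d < dims; d++) {
                mean[y[i]][d] += x[i][d];
            }
        }
        for (int c = 0; c < numClasses; c++) {
            for (int d = 0; d < dims; d++) {
                mean[c][d] /= count[c];
            }
            logPrior[c] = Math.log((double) count[c] / x.length);
        }
        for (int i = 0; i < x.length; i++) {
            for (int d = 0; d < dims; d++) {
                double diff = x[i][d] - mean[y[i]][d];
                variance[y[i]][d] += diff * diff;
            }
        }
        for (int c = 0; c < numClasses; c++) {
            for (int d = 0; d < dims; d++) {
                // A small floor keeps the variance strictly positive.
                variance[c][d] = variance[c][d] / count[c] + 1e-9;
            }
        }
    }

    /** Returns the class with the highest log posterior for the given point. */
    int predict(double[] point) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - mean[c][d];
                // Independence assumption: per-feature log likelihoods simply add up.
                score += -0.5 * Math.log(2 * Math.PI * variance[c][d])
                        - diff * diff / (2 * variance[c][d]);
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: {petal length, petal width} in cm, classes 0 and 1.
        double[][] x = {{1.4, 0.2}, {1.5, 0.3}, {1.3, 0.2}, {4.7, 1.4}, {4.5, 1.5}, {4.9, 1.5}};
        int[] y = {0, 0, 0, 1, 1, 1};
        NaiveBayesSketch model = new NaiveBayesSketch(x, y, 2);
        System.out.println(model.predict(new double[]{1.5, 0.2}));
        System.out.println(model.predict(new double[]{4.6, 1.4}));
    }
}
```

Because each feature contributes its own additive term to the score, the model can only do well when single features already separate the classes to some degree.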

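1NN performs well too (in fact slightly better in this case), which indicates that some of our information is local. The better performance might indicate that some second-order effects also hold information (if x and y, then class z).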
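A toy example may help to picture such a second-order effect. In the sketch below, the class depends on x and y jointly (an XOR pattern), so neither feature is informative on its own, yet the nearest neighbor still finds the right class. The class `SecondOrderSketch` and its four data points are hypothetical and unrelated to the data set used above.

```java
/** Toy illustration of a second-order effect: the class depends on x and y jointly. */
final class SecondOrderSketch {
    // Hypothetical data: label "A" when x and y agree, "B" otherwise (XOR pattern).
    static final double[][] POINTS = {{0, 0}, {1, 1}, {0, 1}, {1, 0}};
    static final String[] LABELS = {"A", "A", "B", "B"};

    /** 1-NN: returns the label of the closest training point. */
    static String nearestLabel(double x, double y) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < POINTS.length; i++) {
            double dx = POINTS[i][0] - x;
            double dy = POINTS[i][1] - y;
            double dist = dx * dx + dy * dy;
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return LABELS[best];
    }

    /** Counts training points that have the given value in one feature and the given label. */
    static int count(int feature, double value, String label) {
        int n = 0;
        for (int i = 0; i < POINTS.length; i++) {
            if (POINTS[i][feature] == value && LABELS[i].equals(label)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Looking at x alone, both classes are equally frequent...
        System.out.println(count(0, 0.0, "A") + " vs " + count(0, 0.0, "B"));
        // ...yet the nearest neighbor recovers the joint pattern.
        System.out.println(nearestLabel(0.1, 0.9));
        System.out.println(nearestLabel(0.9, 0.8));
    }
}
```

A classifier that treats features independently has nothing to work with on data like this, which is one way a local method such as 1NN can come out ahead.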