I am quite new to Accord.NET.

My case: classifying short texts of varying length into 100+ different categories.

Input sample (10k records in a .csv file):

Text                     Category
Cabinet                  Furniture and Fittings
Coffee Table             Furniture and Fittings
Stainless steel table    Furniture and Fittings

private static void bagOfWords(int[][] inputs, int[] outputs)
{
    var bow = new BagOfWords<int>();
    var quantizer = bow.Learn(inputs);
    string filenamebow = Path.Combine(Application.StartupPath, "News_BOW.accord");
    Serializer.Save(obj: bow, path: filenamebow);
    double[][] histograms = quantizer.Transform(inputs);

    // Create the multi-class learning algorithm (one-vs-one SVMs)
    // with a chi-square kernel over the bag-of-words histograms:
    var teacher = new MulticlassSupportVectorLearning<ChiSquare, double[]>()
    {
        Learner = (p) => new SequentialMinimalOptimization<ChiSquare, double[]>()
        {
            // Complexity = 100 // Create a hard SVM
        }
    };

    // Learn a multi-class SVM using the teacher
    var svm = teacher.Learn(histograms, outputs);

    // Get the predictions for the (training) inputs
    int[] predicted = svm.Decide(histograms);

    // Create a confusion matrix to check the quality of the predictions:
    var cm = new GeneralConfusionMatrix(predicted: predicted, expected: outputs);

    // Check the accuracy measure:
    double accuracy = cm.Accuracy;

    string filename = Path.Combine(Application.StartupPath, "News_SVM.accord");
    Serializer.Save(obj: svm, path: filename);
}

private void Form1_Load(object sender, EventArgs e)
{
    ...........
    ..........
    ...........
    dTable = worksheet.ExportDataTable();
    /////////////////////////////////////////////////
    StringBuilder sWords = new StringBuilder(); // what is this for btw?
    string[][] swords = new string[dTable.Rows.Count][];
    int i = 0;
    foreach (DataRowView dr in dTable.DefaultView)
    {
        swords[i] = Tokenize(dr[0].ToString());
        i++;
    }

    Codification codebook = new Codification(dTable, new string[] { "Title", "Category" });
    DataTable symbols = codebook.Apply(dTable);
    int[][] inputs = symbols.ToJagged<int>(new string[] { "Title" });
    int[] outputs = symbols.ToArray<int>("Category");
    bagOfWords(inputs, outputs);

    DataTable input_dTable = worksheetInput.ExportDataTable();
    // How to continue from here and get the batch result as output DataTable
}

After training the model, how do we pass in a DataTable as input and get the batch results back as an output DataTable?
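
To make the desired output shape concrete, here is a minimal sketch of the "DataTable in, DataTable out" step using only System.Data. `BatchPrediction`, `PredictToTable`, the `classify` delegate and the `PredictedCategory` column name are all hypothetical names introduced for illustration; none of them come from Accord.NET, and the delegate merely stands in for whatever the trained pipeline ends up being.

```csharp
using System;
using System.Data;

public static class BatchPrediction
{
    // Hypothetical helper (not part of Accord.NET): copies the input table and
    // appends one column holding the label returned by `classify` for each row.
    public static DataTable PredictToTable(
        DataTable input,
        string textColumn,
        Func<string, string> classify,
        string resultColumn = "PredictedCategory")
    {
        // Work on a copy so the caller's table is left untouched.
        DataTable result = input.Copy();
        result.Columns.Add(resultColumn, typeof(string));

        foreach (DataRow row in result.Rows)
        {
            // DBNull and null cells are treated as empty text.
            string text = Convert.ToString(row[textColumn]) ?? string.Empty;
            row[resultColumn] = classify(text);
        }

        return result;
    }
}
```

In `Form1_Load` this would presumably be called as `PredictToTable(input_dTable, "Title", classify)`, where `classify` repeats the training-time steps for one text: encode it the same way, call `quantizer.Transform`, call `svm.Decide`, and map the resulting class index back to a category name through the codebook. That assumes `bagOfWords` is changed to return (or reload) the trained `quantizer` and `svm` instead of only saving them to disk.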
Similar GitHub project: Text classification NaiveBayes
Asked by Paiseh99 on Stack Overflow