Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c# - Text classification NaiveBayes Accord.NET

I am quite new to Accord.NET.

My case: classifying short texts of varying length into 100+ different categories.

Input sample (10k records in a .csv file):

Text ------------------------- Category

Cabinet -------------------- Furniture and Fittings

Coffee Table -------------- Furniture and Fittings

Stainless steel table ----- Furniture and Fittings

private static void bagOfWords(int[][] inputs, int[] outputs)
{
    var bow = new BagOfWords<int>();
    var quantizer = bow.Learn(inputs);
    string filenamebow = Path.Combine(Application.StartupPath, "News_BOW.accord");
    Serializer.Save(obj: bow, path: filenamebow);
    double[][] histograms = quantizer.Transform(inputs);

    // One way to classify the bag-of-words histograms with an SVM is to use
    // a kernel defined over histograms, such as ChiSquare.

    // Create the multi-class learning algorithm as one-vs-one with the ChiSquare kernel:
    var teacher = new MulticlassSupportVectorLearning<ChiSquare, double[]>()
    {
        Learner = (p) => new SequentialMinimalOptimization<ChiSquare, double[]>()
        {
           // Complexity = 100 // Create a hard SVM
        }
    };

    // Learn a multi-class SVM using the teacher
    var svm = teacher.Learn(histograms, outputs);

    // Get the predictions for the inputs
    int[] predicted = svm.Decide(histograms);

    // Create a confusion matrix to check the quality of the predictions:
    var cm = new GeneralConfusionMatrix(predicted: predicted, expected: outputs);

    // Check the accuracy measure:
    double accuracy = cm.Accuracy;

    string filename = Path.Combine(Application.StartupPath, "News_SVM.accord");
    Serializer.Save(obj: svm, path: filename);
}

private void Form1_Load(object sender, EventArgs e)
{
    ...........
    ..........
    ...........
    dTable = worksheet.ExportDataTable();
    /////////////////////////////////////////////////

    StringBuilder sWords = new StringBuilder();  //what is this for btw?
    string[][] swords = new string[dTable.Rows.Count][];
    int i = 0;

    foreach (DataRowView dr in dTable.DefaultView)
    {
        swords[i] = Tokenize(dr[0].ToString());
        i++;
    }

    Codification codebook = new Codification(dTable, new string[] { "Title", "Category" });
    DataTable symbols = codebook.Apply(dTable);
    int[][] inputs = symbols.ToJagged<int>(new string[] { "Title" });
    int[] outputs = symbols.ToArray<int>("Category");

    bagOfWords(inputs, outputs);
    DataTable input_dTable = worksheetInput.ExportDataTable();
    // How do I continue from here and get the batch results as an output DataTable?
}
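
The posted code calls a `Tokenize` helper that is not shown. A minimal sketch of such a helper, assuming lower-casing and splitting on any character that is not a letter or digit (this `Tokenize` is hypothetical, not the asker's implementation), written as C# 9 top-level statements:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical Tokenize helper (not the asker's implementation):
// lower-cases the text and splits it on runs of non-letter, non-digit characters.
static string[] Tokenize(string text) =>
    Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
         .Where(token => token.Length > 0)   // drop empties from leading/trailing separators
         .ToArray();

// Each row of the jagged array holds the words of one title, like `swords` in Form1_Load.
string[][] swords =
{
    Tokenize("Cabinet"),
    Tokenize("Coffee Table"),
    Tokenize("Stainless steel table"),
};

foreach (string[] words in swords)
    Console.WriteLine(string.Join(" | ", words));
```

Note that `swords` is filled but never used afterwards in the posted code: `Codification` over the `Title` column assigns one integer to each distinct title string, so `inputs` holds one code per whole title rather than the individual words a bag-of-words model is meant to count.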

After training the model, how do I pass in a DataTable as input and get the batch results back as a DataTable?
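
A sketch of one way to get batch results back as a `DataTable`, in plain C# without Accord.NET types (C# 9 top-level statements): copy the input table, add a result column and fill it row by row from a prediction function. `ClassifyBatch`, the `PredictedCategory` column and the keyword-based placeholder predictor are all hypothetical; with the posted code, the predictor would wrap the tokenizer, the saved bag-of-words model and the trained SVM.

```csharp
using System;
using System.Data;

// Hypothetical helper: copies the input table and appends one prediction per row.
// `predict` stands for whatever trained pipeline maps a text to a category name.
static DataTable ClassifyBatch(DataTable input, string textColumn,
                               Func<string, string> predict,
                               string outputColumn = "PredictedCategory")
{
    DataTable result = input.Copy();                  // leave the caller's table untouched
    result.Columns.Add(outputColumn, typeof(string));

    foreach (DataRow row in result.Rows)
        row[outputColumn] = predict(row[textColumn]?.ToString() ?? string.Empty);

    return result;
}

// Hypothetical batch input with the same "Title" column used for training.
var inputTable = new DataTable();
inputTable.Columns.Add("Title", typeof(string));
inputTable.Rows.Add("Oak coffee table");
inputTable.Rows.Add("Office chair");

// Placeholder predictor (hypothetical). A real one would tokenize the text, run it
// through the saved bag-of-words model and the trained SVM, then map the class
// index back to its category name.
Func<string, string> predictCategory = text =>
    text.IndexOf("table", StringComparison.OrdinalIgnoreCase) >= 0
        ? "Furniture and Fittings"
        : "Unknown";

DataTable output = ClassifyBatch(inputTable, "Title", predictCategory);

foreach (DataRow resultRow in output.Rows)
    Console.WriteLine($"{resultRow["Title"]} -> {resultRow["PredictedCategory"]}");
```

Since `svm.Decide` in the posted code already accepts a whole `double[][]` batch, an alternative is to transform every row at once and write `predicted[i]` back into row `i`; the per-row delegate simply keeps the `DataTable` handling independent of the model.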

Similar GitHub project: Text classification NaiveBayes
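
The linked project is named after Naive Bayes, while the posted code trains a multi-class SVM. For comparison, a minimal hand-rolled multinomial Naive Bayes with add-one (Laplace) smoothing over word counts, as C# 9 top-level statements. It is neither Accord.NET's `NaiveBayes` class nor the linked project's code, and the two "Lighting" rows are hypothetical, added only so there is more than one category:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Training rows: the three samples from the question plus two hypothetical
// "Lighting" rows so that the classifier has more than one category.
var training = new (string Text, string Category)[]
{
    ("Cabinet", "Furniture and Fittings"),
    ("Coffee Table", "Furniture and Fittings"),
    ("Stainless steel table", "Furniture and Fittings"),
    ("Desk lamp", "Lighting"),      // hypothetical
    ("Ceiling lamp", "Lighting"),   // hypothetical
};

// Simple lower-cased whitespace tokenizer.
static string[] Words(string text) =>
    text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// Per category: number of training texts and occurrence count of each word.
var docCount = new Dictionary<string, int>();
var wordCount = new Dictionary<string, Dictionary<string, int>>();
var vocabulary = new HashSet<string>();

foreach (var (text, category) in training)
{
    docCount[category] = docCount.GetValueOrDefault(category) + 1;
    if (!wordCount.TryGetValue(category, out var counts))
        wordCount[category] = counts = new Dictionary<string, int>();
    foreach (string word in Words(text))
    {
        counts[word] = counts.GetValueOrDefault(word) + 1;
        vocabulary.Add(word);
    }
}

// Returns the category maximising log P(c) + sum of log P(word | c), using
// add-one smoothing so that an unseen word does not zero out a category.
string Predict(string text)
{
    string best = string.Empty;
    double bestScore = double.NegativeInfinity;
    foreach (var (category, counts) in wordCount)
    {
        double total = counts.Values.Sum();
        double score = Math.Log((double)docCount[category] / training.Length);
        foreach (string word in Words(text))
            score += Math.Log((counts.GetValueOrDefault(word) + 1) / (total + vocabulary.Count));
        if (score > bestScore)
        {
            bestScore = score;
            best = category;
        }
    }
    return best;
}

Console.WriteLine(Predict("steel coffee table")); // Furniture and Fittings
Console.WriteLine(Predict("floor lamp"));         // Lighting
```

Scores are summed in log space so that long texts do not underflow, and the add-one term keeps a word never seen with a category from ruling that category out entirely.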

Asked by Paiseh99 (translated from Stack Overflow)

1 Answer

Waiting for an expert to answer.

...