n gram - How to implement a spectrum kernel function in MATLAB?

Question

Welcome To Ask or Share your Answers For Others

n gram - How to implement a spectrum kernel function in MATLAB?

1 Answer

深蓝 · Answer 1 · 2021-10-23T20:04:53+0000

The first step would be to create a function that can generate an n-gram for a given string. One way to do this in a vectorized fashion is with some clever indexing.

function [subStrings, counts] = n_gram(fullString, N)
  if (N == 1)
    [subStrings, ~, index] = unique(cellstr(fullString.'));  %.'# Simple case
  else
    nString = numel(fullString);
    index = hankel(1:(nString-N+1), (nString-N+1):nString);
    [subStrings, ~, index] = unique(cellstr(fullString(index)));
  end
  counts = accumarray(index, 1);
end

This uses the function HANKEL to first create a matrix of indices that will select each set of unique N-length substrings from the given string. Indexing the given string with this index matrix will create a character array with one N-length substring per row. The function CELLSTR then places each row of the character array into a cell of a cell array. The function UNIQUE then removes repeated substrings, and the function ACCUMARRAY is used to count the occurrences of each unique substring (if they are needed for any reason).

With the above function you can then easily count the number of n-grams shared between two strings using the INTERSECT function:

subStrings1 = n_gram('tool',2);
subStrings2 = n_gram('fool',2);
sharedStrings = intersect(subStrings1,subStrings2);
nShared = numel(sharedStrings);

Categories

n gram - How to implement a spectrum kernel function in MATLAB?

n gram - How to implement a spectrum kernel function in MATLAB?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags