This article collects typical usage examples of the Java class de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordSegmenter. If you are unsure what StanfordSegmenter does or how to use it, the curated examples below may help.
The StanfordSegmenter class belongs to the de.tudarmstadt.ukp.dkpro.core.stanfordnlp package. Six code examples using the class are shown below, sorted by popularity.
Example 1: getPipeline
import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordSegmenter; // import of the demonstrated class

/**
 * Creates a tokenizing pipeline.
 *
 * @throws IOException if the pipeline cannot be initialized
 */
private static AnalysisEngineDescription getPipeline()
        throws IOException
{
    if (pipelineSingleton == null) {
        try {
            pipelineSingleton = AnalysisEngineFactory.createEngineDescription(
                    AnalysisEngineFactory.createEngineDescription(ParagraphSplitter.class,
                            ParagraphSplitter.PARAM_SPLIT_PATTERN,
                            ParagraphSplitter.SINGLE_LINE_BREAKS_PATTERN),
                    AnalysisEngineFactory.createEngineDescription(ArkTweetTokenizerFixed.class),
                    AnalysisEngineFactory.createEngineDescription(StanfordSegmenter.class,
                            StanfordSegmenter.PARAM_WRITE_TOKEN, false,
                            StanfordSegmenter.PARAM_ZONE_TYPES,
                            Paragraph.class.getCanonicalName()));
        }
        catch (ResourceInitializationException e) {
            throw new IOException(e); // preserve the cause instead of discarding it
        }
    }
    return pipelineSingleton;
}
Author: UKPLab, project: argument-reasoning-comprehension-task, 28 lines, source file: Step0bTextSegmenterA.java
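Example 1 lazily initializes a shared pipeline description so it is only built once and then reused. The pattern itself can be sketched with plain Java; `Pipeline` below is a hypothetical stand-in for `AnalysisEngineDescription`, and unlike the original the sketch adds `synchronized` to make the lazy initialization thread-safe:

```java
// Minimal sketch of the lazy-initialization pattern used in Example 1.
// "Pipeline" is a hypothetical placeholder for AnalysisEngineDescription.
public class LazyPipeline {
    static final class Pipeline { }

    private static Pipeline pipelineSingleton;

    // Builds the pipeline on first use; later calls return the cached instance.
    static synchronized Pipeline getPipeline() {
        if (pipelineSingleton == null) {
            pipelineSingleton = new Pipeline();
        }
        return pipelineSingleton;
    }

    public static void main(String[] args) {
        // Repeated calls return the same instance.
        System.out.println(getPipeline() == getPipeline()); // prints "true"
    }
}
```

Caching the description matters here because building it involves expensive resource initialization.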
Example 2: main
import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordSegmenter; // import of the demonstrated class

public static void main(String[] args) throws UIMAException, IOException {
    // read text documents
    CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(TextReader.class,
            TextReader.PARAM_SOURCE_LOCATION, textFolder, TextReader.PARAM_PATTERNS, textPattern,
            TextReader.PARAM_LANGUAGE, "en");
    // preprocess documents
    String[] quoteBegin = { "“", "‘" };
    List<String> quoteBeginList = Arrays.asList(quoteBegin);
    String[] quoteEnd = { "”", "’" };
    List<String> quoteEndList = Arrays.asList(quoteEnd);
    AnalysisEngineDescription segmenter = AnalysisEngineFactory.createEngineDescription(StanfordSegmenter.class);
    AnalysisEngineDescription pos = AnalysisEngineFactory.createEngineDescription(StanfordPosTagger.class,
            StanfordPosTagger.PARAM_QUOTE_BEGIN, quoteBeginList, StanfordPosTagger.PARAM_QUOTE_END, quoteEndList);
    AnalysisEngineDescription lemmatizer = AnalysisEngineFactory.createEngineDescription(StanfordLemmatizer.class);
    AnalysisEngineDescription stemmer = AnalysisEngineFactory.createEngineDescription(SnowballStemmer.class,
            SnowballStemmer.PARAM_LOWER_CASE, true);
    AnalysisEngineDescription parser = AnalysisEngineFactory.createEngineDescription(StanfordParser.class,
            StanfordParser.PARAM_MODEL_LOCATION, "lib/englishRNN.ser", StanfordParser.PARAM_MODE,
            DependenciesMode.CC_PROPAGATED, StanfordPosTagger.PARAM_QUOTE_BEGIN, quoteBeginList,
            StanfordPosTagger.PARAM_QUOTE_END, quoteEndList);
    // write annotated data to file
    AnalysisEngineDescription writer = AnalysisEngineFactory.createEngineDescription(BinaryCasWriter.class,
            BinaryCasWriter.PARAM_TARGET_LOCATION, textFolder, BinaryCasWriter.PARAM_STRIP_EXTENSION, false,
            BinaryCasWriter.PARAM_FILENAME_EXTENSION, ".bin6", BinaryCasWriter.PARAM_OVERWRITE, true);
    // print statistics
    AnalysisEngineDescription stat = AnalysisEngineFactory.createEngineDescription(CorpusStatWriter.class);
    // run pipeline
    SimplePipeline.runPipeline(reader, segmenter, pos, lemmatizer, stemmer, parser, writer, stat);
}
Author: UKPLab, project: emnlp2017-cmapsum-corpus, 36 lines, source file: PipelinePreprocessing.java
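The `PARAM_QUOTE_BEGIN`/`PARAM_QUOTE_END` parameters in Example 2 are plain lists of opening and closing curly-quote characters. How such lists classify a token can be sketched with the standard library alone; the `classify` helper below is hypothetical and not part of DKPro:

```java
import java.util.Arrays;
import java.util.List;

public class QuoteLists {
    // The same quote characters as in the example above (“ ‘ and ” ’).
    static final List<String> QUOTE_BEGIN = Arrays.asList("\u201C", "\u2018");
    static final List<String> QUOTE_END = Arrays.asList("\u201D", "\u2019");

    // Hypothetical helper: classify a token as an opening quote,
    // a closing quote, or neither.
    static String classify(String token) {
        if (QUOTE_BEGIN.contains(token)) {
            return "BEGIN";
        }
        if (QUOTE_END.contains(token)) {
            return "END";
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println(classify("\u201C")); // prints "BEGIN"
        System.out.println(classify("\u2019")); // prints "END"
        System.out.println(classify("\""));     // prints "OTHER"
    }
}
```

Straight ASCII quotes are deliberately absent from both lists, so they fall through to "OTHER", matching the example's focus on typographic quotes.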
Example 3: main
import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordSegmenter; // import of the demonstrated class

public static void main(String[] args) throws UIMAException, IOException {
    Logger.getRootLogger().setLevel(Level.INFO);
    // 0) parameters
    if (args.length > 0)
        textFolder = args[0];
    // 1) read text documents
    CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(TextReader.class,
            TextReader.PARAM_SOURCE_LOCATION, textFolder, TextReader.PARAM_PATTERNS, textPattern,
            TextReader.PARAM_LANGUAGE, "en");
    // 2) process documents
    String[] quoteBegin = { "“", "‘" };
    List<String> quoteBeginList = Arrays.asList(quoteBegin);
    String[] quoteEnd = { "”", "’" };
    List<String> quoteEndList = Arrays.asList(quoteEnd);
    // tokenization and sentence splitting
    AnalysisEngineDescription segmenter = AnalysisEngineFactory.createEngineDescription(StanfordSegmenter.class,
            StanfordSegmenter.PARAM_NEWLINE_IS_SENTENCE_BREAK, "ALWAYS");
    // part-of-speech tagging
    AnalysisEngineDescription pos = AnalysisEngineFactory.createEngineDescription(StanfordPosTagger.class,
            StanfordPosTagger.PARAM_QUOTE_BEGIN, quoteBeginList, StanfordPosTagger.PARAM_QUOTE_END, quoteEndList);
    // lemmatizing
    AnalysisEngineDescription lemmatizer = AnalysisEngineFactory.createEngineDescription(StanfordLemmatizer.class,
            StanfordLemmatizer.PARAM_QUOTE_BEGIN, quoteBeginList, StanfordLemmatizer.PARAM_QUOTE_END, quoteEndList);
    // named entity recognition
    AnalysisEngineDescription ner = AnalysisEngineFactory.createEngineDescription(
            StanfordNamedEntityRecognizer.class, StanfordNamedEntityRecognizer.PARAM_QUOTE_BEGIN, quoteBeginList,
            StanfordNamedEntityRecognizer.PARAM_QUOTE_END, quoteEndList);
    // constituency parsing and dependency conversion
    AnalysisEngineDescription parser = AnalysisEngineFactory.createEngineDescription(StanfordParser.class,
            StanfordParser.PARAM_QUOTE_BEGIN, quoteBeginList, StanfordParser.PARAM_QUOTE_END, quoteEndList,
            StanfordParser.PARAM_MODE, DependenciesMode.CC_PROPAGATED);
    // coreference resolution
    // (the component class is omitted in the original source; as written,
    // this creates an empty aggregate description that performs no work)
    AnalysisEngineDescription coref = AnalysisEngineFactory.createEngineDescription();
    // 3) write annotated data to file
    AnalysisEngineDescription writer = AnalysisEngineFactory.createEngineDescription(BinaryCasWriter.class,
            BinaryCasWriter.PARAM_TARGET_LOCATION, textFolder, BinaryCasWriter.PARAM_STRIP_EXTENSION, false,
            BinaryCasWriter.PARAM_FILENAME_EXTENSION, ".bin6", BinaryCasWriter.PARAM_OVERWRITE, true);
    // print statistics
    AnalysisEngineDescription stat = AnalysisEngineFactory.createEngineDescription(CorpusStatWriter.class);
    // 4) run pipeline
    SimplePipeline.runPipeline(reader, segmenter, pos, lemmatizer, ner, parser, coref, writer, stat);
}
Author: UKPLab, project: ijcnlp2017-cmaps, 57 lines, source file: PipelinePreprocessing.java
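Example 3 runs its engines strictly in the listed order: the segmenter must produce tokens and sentences before the tagger, lemmatizer, and parser can consume them. The ordering idea can be sketched with plain Java function composition; the stage names below are hypothetical placeholders for the DKPro engines:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class StageOrder {
    // Hypothetical stages standing in for segmenter, tagger, lemmatizer, etc.
    // Each stage appends its name, so the output records execution order.
    static UnaryOperator<String> stage(String name) {
        return doc -> doc + name + ">";
    }

    // Applies each stage in sequence, like SimplePipeline.runPipeline.
    static String run(String doc, List<UnaryOperator<String>> stages) {
        for (UnaryOperator<String> s : stages) {
            doc = s.apply(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> pipeline = Arrays.asList(
                stage("segment"), stage("pos"), stage("lemma"));
        System.out.println(run("<", pipeline)); // prints "<segment>pos>lemma>"
    }
}
```

In UIMA the "document" threaded through the stages is the CAS, and each engine adds annotations rather than transforming a string, but the sequential contract is the same.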
Example 4: process
import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordSegmenter; // import of the demonstrated class

private static void process(String inputDir, String xmiOutputDir, String csvOutputDir, String parseDir)
        throws UIMAException, IOException {
    CollectionReader reader = createReader(TextReader.class, TextReader.PARAM_SOURCE_LOCATION, inputDir,
            TextReader.PARAM_LANGUAGE, "en",
            TextReader.PARAM_PATTERNS, new String[] { "*.txt" }); // for WSJ subfolders: { "[+]*/*" }); // suffix .txt?
    // Preprocessing with Stanford CoreNLP components
    AnalysisEngineDescription stTokenizer = AnalysisEngineFactory.createEngineDescription(StanfordSegmenter.class,
            StanfordSegmenter.PARAM_LANGUAGE, "en");
    AnalysisEngineDescription stParser = AnalysisEngineFactory.createEngineDescription(StanfordParser.class,
            StanfordParser.PARAM_LANGUAGE, "en", StanfordParser.PARAM_WRITE_POS, true,
            StanfordParser.PARAM_WRITE_PENN_TREE, true, StanfordParser.PARAM_MAX_TOKENS, 200,
            StanfordParser.PARAM_WRITE_CONSTITUENT, true, StanfordParser.PARAM_WRITE_DEPENDENCY, true,
            StanfordParser.PARAM_MODE, StanfordParser.DependenciesMode.CC_PROPAGATED);
    AnalysisEngineDescription stLemmas = AnalysisEngineFactory.createEngineDescription(StanfordLemmatizer.class);
    // NP feature extraction components: select the noun phrases for which
    // to extract features.
    // See NounPhraseSelectorAnnotator for possible argument choices.
    AnalysisEngineDescription npSelector = AnalysisEngineFactory.createEngineDescription(
            NounPhraseSelectorAnnotator.class, NounPhraseSelectorAnnotator.PARAM_TARGET, "AllNounPhrases");
    // Extract the NP-based features.
    AnalysisEngineDescription npFeatures = AnalysisEngineFactory.createEngineDescription(
            NounPhraseFeaturesAnnotator.class, NounPhraseFeaturesAnnotator.PARAM_COUNTABILITY_PATH,
            countabilityPath, NounPhraseFeaturesAnnotator.PARAM_WORDNET_PATH, wordNetPath);
    // Select the verbs for which to extract features.
    AnalysisEngineDescription verbSelector = AnalysisEngineFactory
            .createEngineDescription(VerbSelectorAnnotator.class);
    // Extract the verb-based features.
    AnalysisEngineDescription verbFeatures = AnalysisEngineFactory.createEngineDescription(
            VerbFeaturesAnnotator.class, VerbFeaturesAnnotator.PARAM_WORDNET_PATH, wordNetPath,
            VerbFeaturesAnnotator.PARAM_TENSE_FILE, "resources/tense/tense.txt");
    // Write standoff CSV file with features.
    AnalysisEngineDescription csvWriter = null;
    if (csvOutputDir != null) {
        csvWriter = AnalysisEngineFactory.createEngineDescription(SyntSemFeaturesCSVWriter.class,
                SyntSemFeaturesCSVWriter.PARAM_OUTPUT_FOLDER, csvOutputDir);
    }
    // write out dependency parses (for development)
    AnalysisEngineDescription parseWriter = null;
    if (parseDir != null) {
        parseWriter = AnalysisEngineFactory.createEngineDescription(ParseWriterAnnotator.class,
                ParseWriterAnnotator.PARAM_OUTPUT_FILE, parseDir);
    }
    // writes out XMIs (can then be inspected with the UIMA annotation viewer,
    // or used for further processing in a UIMA pipeline)
    AnalysisEngineDescription xmiWriter = null;
    if (xmiOutputDir != null) {
        xmiWriter = AnalysisEngineFactory.createEngineDescription(XmiWriter.class, XmiWriter.PARAM_TARGET_LOCATION,
                xmiOutputDir);
    }
    if (xmiOutputDir != null && csvOutputDir != null) {
        runPipeline(reader, stTokenizer, stParser, stLemmas, npSelector, npFeatures, verbSelector, verbFeatures,
                csvWriter, xmiWriter);
    }
    if (xmiOutputDir != null && csvOutputDir == null) {
        runPipeline(reader, stTokenizer, stParser, stLemmas, npSelector, npFeatures, verbSelector, verbFeatures,
                xmiWriter);
    }
    if (xmiOutputDir == null && csvOutputDir != null) {
        // TODO: proper configuration of pipeline for parseWriter
        runPipeline(reader, stTokenizer, stParser, stLemmas, npSelector, npFeatures, verbSelector, verbFeatures,
                csvWriter, parseWriter);
    }
}
Author: annefried, project: sitent, 77 lines, source file: FeatureExtractionPipeline.java
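Examples 4 and 5 branch on which writers are configured and enumerate one `runPipeline` call per combination, which scales poorly as writers are added. An alternative is to collect the optional engines into a list and drop the nulls before running. The sketch below shows the pattern with plain Java; `Engine` is a hypothetical placeholder for `AnalysisEngineDescription`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class OptionalStages {
    // Hypothetical placeholder for AnalysisEngineDescription.
    static final class Engine {
        final String name;
        Engine(String name) { this.name = name; }
    }

    // Collects the mandatory engines plus whichever optional writers are
    // non-null, avoiding one runPipeline branch per writer combination.
    static List<Engine> assemble(List<Engine> mandatory, Engine... optional) {
        List<Engine> all = new ArrayList<>(mandatory);
        Arrays.stream(optional).filter(Objects::nonNull).forEach(all::add);
        return all;
    }

    public static void main(String[] args) {
        Engine csvWriter = new Engine("csv");
        Engine xmiWriter = null; // not configured in this run
        List<Engine> pipeline = assemble(
                Arrays.asList(new Engine("tokenizer"), new Engine("parser")),
                csvWriter, xmiWriter);
        System.out.println(pipeline.size()); // prints "3"
    }
}
```

With uimaFIT this works because `runPipeline` takes a varargs array of descriptions, so the assembled list can be passed as `list.toArray(new AnalysisEngineDescription[0])`.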
Example 5: process
import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordSegmenter; // import of the demonstrated class

private static void process(String inputDir, String xmiOutputDir, String csvOutputDir)
        throws UIMAException, IOException {
    CollectionReader reader = createReader(TextReader.class, TextReader.PARAM_SOURCE_LOCATION, inputDir,
            TextReader.PARAM_LANGUAGE, "en", TextReader.PARAM_PATTERNS, new String[] { "[+]*.txt" });
    // Preprocessing with Stanford CoreNLP components
    AnalysisEngineDescription stTokenizer = AnalysisEngineFactory.createEngineDescription(StanfordSegmenter.class,
            StanfordSegmenter.PARAM_LANGUAGE, "en");
    AnalysisEngineDescription stParser = AnalysisEngineFactory.createEngineDescription(StanfordParser.class,
            StanfordParser.PARAM_LANGUAGE, "en", StanfordParser.PARAM_WRITE_POS, true,
            StanfordParser.PARAM_WRITE_PENN_TREE, true, StanfordParser.PARAM_MAX_TOKENS, 200,
            StanfordParser.PARAM_WRITE_CONSTITUENT, true, StanfordParser.PARAM_WRITE_DEPENDENCY, true,
            StanfordParser.PARAM_MODE, StanfordParser.DependenciesMode.CC_PROPAGATED);
    AnalysisEngineDescription stLemmas = AnalysisEngineFactory.createEngineDescription(StanfordLemmatizer.class);
    // NP feature extraction components: select the noun phrases for which
    // to extract features.
    // See NounPhraseSelectorAnnotator for possible argument choices.
    AnalysisEngineDescription npSelector = AnalysisEngineFactory.createEngineDescription(
            NounPhraseSelectorAnnotator.class, NounPhraseSelectorAnnotator.PARAM_TARGET, "AllNounPhrases");
    // Extract the NP-based features.
    AnalysisEngineDescription npFeatures = AnalysisEngineFactory.createEngineDescription(
            NounPhraseFeaturesAnnotator.class, NounPhraseFeaturesAnnotator.PARAM_COUNTABILITY_PATH,
            countabilityPath, NounPhraseFeaturesAnnotator.PARAM_WORDNET_PATH, wordNetPath);
    // Select the verbs for which to extract features.
    AnalysisEngineDescription verbSelector = AnalysisEngineFactory
            .createEngineDescription(VerbSelectorAnnotator.class);
    // Extract the verb-based features.
    AnalysisEngineDescription verbFeatures = AnalysisEngineFactory.createEngineDescription(
            VerbFeaturesAnnotator.class, VerbFeaturesAnnotator.PARAM_WORDNET_PATH, wordNetPath,
            VerbFeaturesAnnotator.PARAM_TENSE_FILE, "resources/tense/tense.txt");
    // Write standoff CSV file with features.
    AnalysisEngineDescription csvWriter = null;
    if (csvOutputDir != null) {
        csvWriter = AnalysisEngineFactory.createEngineDescription(SyntSemFeaturesCSVWriter.class,
                SyntSemFeaturesCSVWriter.PARAM_OUTPUT_FOLDER, csvOutputDir);
    }
    // writes out XMIs (can then be inspected with the UIMA annotation viewer,
    // or used for further processing in a UIMA pipeline)
    AnalysisEngineDescription xmiWriter = null;
    if (xmiOutputDir != null) {
        xmiWriter = AnalysisEngineFactory.createEngineDescription(XmiWriter.class, XmiWriter.PARAM_TARGET_LOCATION,
                xmiOutputDir);
    }
    if (xmiOutputDir != null && csvOutputDir != null) {
        runPipeline(reader, stTokenizer, stParser, stLemmas, npSelector, npFeatures, verbSelector, verbFeatures,
                csvWriter, xmiWriter);
    }
    if (xmiOutputDir != null && csvOutputDir == null) {
        runPipeline(reader, stTokenizer, stParser, stLemmas, npSelector, npFeatures, verbSelector, verbFeatures,
                xmiWriter);
    }
    if (xmiOutputDir == null && csvOutputDir != null) {
        runPipeline(reader, stTokenizer, stParser, stLemmas, npSelector, npFeatures, verbSelector, verbFeatures,
                csvWriter);
    }
}
Author: annefried, project: syntSemFeatures, 68 lines, source file: FeatureExtractionPipeline.java
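Example 5 selects input files with the pattern `[+]*.txt`: in DKPro Core readers the leading `[+]` marks an include pattern and the remainder is an Ant-style glob. The glob part can be sketched with the JDK's `PathMatcher`; the `[+]` prefix handling below is a simplified assumption, since real DKPro patterns also support `[-]` excludes and directory wildcards:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class IncludePatterns {
    // Strips a DKPro-style "[+]" include prefix and compiles the rest
    // as a JDK glob. (Simplified sketch; not the actual DKPro matcher.)
    static PathMatcher compile(String pattern) {
        String glob = pattern.startsWith("[+]") ? pattern.substring(3) : pattern;
        return FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }

    public static void main(String[] args) {
        PathMatcher m = compile("[+]*.txt");
        System.out.println(m.matches(Paths.get("review1.txt"))); // prints "true"
        System.out.println(m.matches(Paths.get("review1.xml"))); // prints "false"
    }
}
```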
Example 6: main
import de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordSegmenter; // import of the demonstrated class

public static void main(String[] args)
        throws UIMAException, IOException
{
    CollectionReader stanfordReader = createReader(StanfordReader.class,
            StanfordReader.PARAM_DIRECTORY_NAME, "C://Users//skohail//Desktop//PhD//complete Data//reviewsfile");
    AnalysisEngine stanfordAnnotator = createEngine(StanfordSegmenter.class,
            StanfordSegmenter.PARAM_CREATE_SENTENCES, false);
    AnalysisEngine stanfordWriter = createEngine(StanfordOutWriter.class);
    SimplePipeline.runPipeline(stanfordReader, stanfordAnnotator, stanfordWriter);
}
Author: tudarmstadt-lt, project: sentiment, 14 lines, source file: StanfordePipline.java
Note: the de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordSegmenter examples in this article were collected from open-source projects hosted on GitHub and similar code and documentation platforms. Copyright in the code snippets remains with their original authors; consult each project's license before redistributing or reusing the code.