This article collects typical usage examples of the Java class opennlp.tools.cmdline.CmdLineUtil. If you are wondering how CmdLineUtil is used in practice, the selected examples below may help.
The CmdLineUtil class belongs to the opennlp.tools.cmdline package. Eight code examples are shown below, sorted by popularity by default.
Example 1: run
import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
public void run(String[] args) {
  Params params = validateAndParseParams(args, Params.class);

  // fail early if the dictionary or its metadata file is missing or unreadable
  File dictInFile = params.getInputFile();
  CmdLineUtil.checkInputFile("dictionary input file", dictInFile);
  Path metadataPath = DictionaryMetadata.getExpectedMetadataLocation(dictInFile.toPath());
  CmdLineUtil.checkInputFile("dictionary metadata (.info) input file", metadataPath.toFile());

  MorfologikDictionayBuilder builder = new MorfologikDictionayBuilder();
  try {
    builder.build(dictInFile.toPath(), params.getOverwrite(),
        params.getValidate(), params.getAcceptBOM(), params.getAcceptCR(),
        params.getIgnoreEmpty());
  } catch (Exception e) {
    // report any build failure as a tool termination with a non-zero exit code
    throw new TerminateToolException(-1,
        "Error while creating Morfologik POS Dictionary: " + e.getMessage(), e);
  }
}
Developer ID: apache, Project: opennlp-addons, Lines of code: 21, Source: MorfologikDictionaryBuilderTool.java
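Example 1 validates both input files before any work starts. The sketch below shows, using only the JDK, the kind of checks such a validator typically performs (directory, existence, readability). The class `InputFileCheck` and its method `describeProblem` are hypothetical names introduced for illustration; this is not OpenNLP's implementation of `CmdLineUtil.checkInputFile`, which reports problems by throwing an exception rather than returning a message.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, not part of OpenNLP: illustrates the checks an
// input-file validator might run before a tool starts reading.
final class InputFileCheck {

    // Returns null when the file looks usable, otherwise a short description of the problem.
    static String describeProblem(String name, File file) {
        if (file.isDirectory()) {
            return "The " + name + " file is a directory: " + file;
        }
        if (!file.exists()) {
            return "The " + name + " file does not exist: " + file;
        }
        if (!file.canRead()) {
            return "No permission to read the " + name + " file: " + file;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        File existing = Files.createTempFile("dict", ".txt").toFile();
        existing.deleteOnExit();
        File missing = new File(existing.getPath() + ".missing");

        System.out.println(describeProblem("dictionary input", existing));
        System.out.println(describeProblem("dictionary input", missing));
    }
}
```

Running the checks up front, as Example 1 does, lets a command-line tool report a clear message before it has done any expensive work.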
Example 2: main
import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
public static void main(String[] args) {
  if (args.length < 2) {
    // missing arguments are an error, so report on stderr and exit with a non-zero status
    System.err.println("usage: <input> <output>\n");
    System.exit(1);
  }

  String input = args[0];
  String output = args[1];

  TrainingParameters params = new TrainingParameters();
  params.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(0));
  params.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(100));
  //params.put(TrainingParameters.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);

  AgeClassifyModel model;
  try {
    model = AgeClassifySparkTrainer.createModel("en", input,
        "opennlp.tools.tokenize.SentenceTokenizer", "opennlp.tools.tokenize.BagOfWordsTokenizer", params);
  } catch (IOException e) {
    throw new TerminateToolException(-1,
        "IO error while reading training data or indexing data: " + e.getMessage(), e);
  }

  // serialize the trained model to the output file
  CmdLineUtil.writeModel("age classifier", new File(output), model);
}
Developer ID: USCDataScience, Project: AgePredictor, Lines of code: 25, Source: AgeClassifySparkTrainer.java
Example 3: serializeEntityGazetteers
import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
public static void serializeEntityGazetteers(Path dictionaryFile)
    throws IOException {
  Map<String, String> dictionary = new HashMap<String, String>();
  // try-with-resources closes the reader and the underlying stream even if reading fails
  try (BufferedReader breader = new BufferedReader(new InputStreamReader(
      CmdLineUtil.openInFile(dictionaryFile.toFile()), Charset.forName("UTF-8")))) {
    String line;
    while ((line = breader.readLine()) != null) {
      // each well-formed line is "token<TAB>class"
      String[] lineArray = tabPattern.split(line);
      if (lineArray.length == 2) {
        String normalizedToken = dotInsideI.matcher(lineArray[0])
            .replaceAll("i");
        dictionary.put(normalizedToken.toLowerCase(), lineArray[1].intern());
      } else {
        System.err.println(lineArray[0] + " is not well formed!");
      }
    }
  }
  String outputFile = dictionaryFile.toString() + SER_GZ;
  IOUtils.writeClusterToFile(dictionary, outputFile, IOUtils.TAB_DELIMITER);
}
Developer ID: ragerri, Project: ixa-pipe-convert, Lines of code: 22, Source: SerializeResources.java
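The parsing loop in Example 3 can be tried without OpenNLP or any files on disk. The following is a minimal sketch under stated assumptions: `TabDictionary` is a hypothetical class, the `TAB` pattern is an assumed definition (the snippet above does not show its `tabPattern` field), and the dotted-I normalization done via `dotInsideI` is omitted. It also lowercases with `Locale.ROOT`, a deliberate difference from the original that keeps the result independent of the default locale.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: parses "token<TAB>class" lines into a map.
final class TabDictionary {

    // Assumed definition; the original snippet does not show its tabPattern field.
    private static final Pattern TAB = Pattern.compile("\t");

    static Map<String, String> parse(BufferedReader reader) throws IOException {
        Map<String, String> dictionary = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = TAB.split(line);
            if (fields.length == 2) {
                // keys are lowercased so that lookups are case-insensitive
                dictionary.put(fields[0].toLowerCase(Locale.ROOT), fields[1]);
            } else {
                System.err.println("Skipping malformed line: " + line);
            }
        }
        return dictionary;
    }

    public static void main(String[] args) throws IOException {
        String sample = "Madrid\tLOCATION\nbroken line\nIBM\tORGANIZATION\n";
        // try-with-resources closes the reader even if parsing throws
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(parse(reader));
        }
    }
}
```

Taking a `BufferedReader` instead of a file path keeps the parsing logic testable with in-memory input; the caller decides how the stream is opened and closed.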
Example 4: serializeLemmaDictionary
import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
public static void serializeLemmaDictionary(Path lemmaDict)
    throws IOException {
  Map<List<String>, String> dictMap = new HashMap<List<String>, String>();
  // try-with-resources closes the reader and the underlying stream even if reading fails
  try (BufferedReader breader = new BufferedReader(new InputStreamReader(
      CmdLineUtil.openInFile(lemmaDict.toFile()), Charset.forName("UTF-8")))) {
    String line;
    while ((line = breader.readLine()) != null) {
      // each well-formed line has three tab-separated fields
      final String[] elems = tabPattern.split(line);
      if (elems.length == 3) {
        String normalizedToken = dotInsideI.matcher(elems[0]).replaceAll("I");
        // key: (first field, third field); value: second field
        dictMap.put(Arrays.asList(normalizedToken, elems[2]), elems[1]);
      } else {
        System.err.println(elems[0] + " is not well formed!");
      }
    }
  }
  String outputFile = lemmaDict.toString() + SER_GZ;
  IOUtils.writeDictionaryLemmatizerToFile(dictMap, outputFile,
      IOUtils.TAB_DELIMITER);
}
Developer ID: ragerri, Project: ixa-pipe-convert, Lines of code: 22, Source: SerializeResources.java
Example 5: train
import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
/**
 * Main entry point for training.
 *
 * @throws IOException
 *           if any of the input files cannot be read
 */
public final void train() throws IOException {
  // load training parameters file
  final String paramFile = this.parsedArguments.getString("params");
  final TrainingParameters params = InputOutputUtils
      .loadTrainingParameters(paramFile);
  // fall back to "<parameter file name>.bin" when no output model is configured
  String outModel = null;
  if (params.getSettings().get("OutputModel") == null
      || params.getSettings().get("OutputModel").length() == 0) {
    outModel = Files.getNameWithoutExtension(paramFile) + ".bin";
    params.put("OutputModel", outModel);
  } else {
    outModel = Flags.getModel(params);
  }
  final Trainer chunkerTrainer = new DefaultTrainer(params);
  final ChunkerModel trainedModel = chunkerTrainer.train(params);
  CmdLineUtil.writeModel("ixa-pipe-chunk", new File(outModel), trainedModel);
}
Developer ID: ixa-ehu, Project: ixa-pipe-chunk, Lines of code: 24, Source: CLI.java
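Example 5 derives a default model file name from the parameters file when no `OutputModel` setting is present. The `Files.getNameWithoutExtension` call there is presumably Guava's; the sketch below reproduces the same idea with the JDK only. `ModelNames` and `defaultModelName` are hypothetical names used for illustration.

```java
import java.io.File;

// Hypothetical helper: derives a default model file name from a parameters file path.
final class ModelNames {

    static String defaultModelName(String paramFile) {
        // keep only the last path segment, then strip everything after the last dot
        String fileName = new File(paramFile).getName();
        int dot = fileName.lastIndexOf('.');
        String base = (dot == -1) ? fileName : fileName.substring(0, dot);
        return base + ".bin";
    }

    public static void main(String[] args) {
        System.out.println(defaultModelName("conf/chunker-en.properties")); // prints "chunker-en.bin"
        System.out.println(defaultModelName("params"));                     // prints "params.bin"
    }
}
```

Because the directory part is dropped, a default name derived this way is relative, so the model ends up in the current working directory rather than next to the parameters file.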
Example 6: openSampleData
import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
static ObjectStream<POSSample> openSampleData(String sampleDataName, File sampleDataFile, Charset encoding) {
  CmdLineUtil.checkInputFile(sampleDataName + " Data", sampleDataFile);

  FileInputStream sampleDataIn = CmdLineUtil.openInFile(sampleDataFile);
  ObjectStream<String> lineStream = new PlainTextByLineStream(sampleDataIn.getChannel(), encoding);

  return new WordTagSampleStream(lineStream);
}
Developer ID: radsimu, Project: UaicNlpToolkit, Lines of code: 7, Source: POStrainer.java
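Example 6 wraps a line stream in a `WordTagSampleStream`, which reads POS training data written as whitespace-separated `word_tag` tokens, one sentence per line. To make that input format concrete, here is a rough JDK-only sketch of splitting one such line. `WordTagLine` is a hypothetical class, and splitting at the last underscore is an assumption about the format, not a copy of OpenNLP's parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one parsed training line such as "The_DT cat_NN".
final class WordTagLine {
    final List<String> words = new ArrayList<>();
    final List<String> tags = new ArrayList<>();

    static WordTagLine parse(String line) {
        WordTagLine result = new WordTagLine();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return result; // blank line: no tokens
        }
        for (String token : trimmed.split("\\s+")) {
            // assumption: the tag follows the LAST underscore, so words may contain underscores
            int split = token.lastIndexOf('_');
            if (split == -1) {
                throw new IllegalArgumentException("Cannot find \"_\" inside token: " + token);
            }
            result.words.add(token.substring(0, split));
            result.tags.add(token.substring(split + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        WordTagLine sample = parse("The_DT cat_NN sleeps_VBZ");
        System.out.println(sample.words + " " + sample.tags);
    }
}
```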
Example 7: train
import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
public void train() throws IOException {
  if (languageCode == null) {
    throw new IllegalStateException("languageCode is not provided");
  }
  if (modelOutFile == null) {
    throw new IllegalStateException("model output path is not provided");
  }
  if (trainParams == null) {
    throw new IllegalStateException("training parameters are not set");
  }
  if (sentenceStream == null) {
    throw new IllegalStateException("sentence stream is not configured");
  }
  if (taggerFactory == null) {
    throw new IllegalStateException("tagger factory is not configured");
  }

  Map<String, String> manifestInfoEntries = new HashMap<>();
  BeamSearchContextGenerator<Token> contextGenerator = taggerFactory.getContextGenerator();

  MaxentModel posModel;
  try {
    if (TrainerFactory.TrainerType.EVENT_MODEL_TRAINER.equals(
        TrainerFactory.getTrainerType(trainParams.getSettings()))) {
      ObjectStream<Event> es = new POSTokenEventStream<>(sentenceStream, contextGenerator);
      EventTrainer trainer = TrainerFactory.getEventTrainer(trainParams.getSettings(), manifestInfoEntries);
      posModel = trainer.train(es);
    } else {
      throw new UnsupportedOperationException("Sequence training");
      //POSSampleSequenceStream ss = new POSSampleSequenceStream(samples, contextGenerator);
      // posModel = TrainUtil.train(ss, trainParams.getSettings(), manifestInfoEntries);
    }
  } finally {
    sentenceStream.close();
  }

  POSModel modelAggregate = new POSModel(languageCode,
      posModel, manifestInfoEntries, taggerFactory);
  CmdLineUtil.writeModel("PoS-tagger", modelOutFile, modelAggregate);
}
Developer ID: textocat, Project: textokit-core, Lines of code: 40, Source: OpenNLPPosTaggerTrainer.java
Example 8: brownCleanUpperCase
import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
/**
 * Writes a copy of the input file that keeps only the lines in which at
 * least 90% of the non-space characters are lowercase.
 *
 * @param inFile
 *          the file to clean; the result is written to inFile + ".clean"
 * @throws IOException
 *           if the input cannot be read or the output cannot be written
 */
private static void brownCleanUpperCase(Path inFile) throws IOException {
  StringBuilder precleantext = new StringBuilder();
  // try-with-resources closes the reader and the underlying stream even if reading fails
  try (BufferedReader breader = new BufferedReader(new InputStreamReader(
      CmdLineUtil.openInFile(inFile.toFile()), StandardCharsets.UTF_8))) {
    String line;
    while ((line = breader.readLine()) != null) {
      double lowercaseCounter = 0;
      // drop the spaces so that they do not count towards the line length
      StringBuilder sb = new StringBuilder();
      for (String word : line.split(" ")) {
        sb.append(word);
      }
      char[] lineCharArray = sb.toString().toCharArray();
      for (char lineChar : lineCharArray) {
        if (Character.isLowerCase(lineChar)) {
          lowercaseCounter++;
        }
      }
      // an empty line yields 0/0 = NaN, which fails the comparison and is skipped
      double percent = lowercaseCounter / (double) lineCharArray.length;
      if (percent >= 0.90) {
        precleantext.append(line).append("\n");
      }
    }
  }
  Path outfile = Files.createFile(Paths.get(inFile.toString() + ".clean"));
  Files.write(outfile,
      precleantext.toString().getBytes(StandardCharsets.UTF_8));
  System.err.println(">> Wrote clean document to " + outfile);
}
Developer ID: ragerri, Project: ixa-pipe-convert, Lines of code: 40, Source: Convert.java
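The filtering rule in Example 8 (keep a line only if at least 90% of its non-space characters are lowercase) is easier to test when it is separated from the file handling. A minimal sketch follows; `LowercaseFilter`, `lowercaseRatio` and `keep` are hypothetical names. Unlike the original, it returns 0.0 for a line without non-space characters instead of relying on a NaN comparison; the outcome is the same, since such lines are dropped either way.

```java
// Hypothetical helper: the line filter from Example 8 as pure functions.
final class LowercaseFilter {

    // Fraction of non-space characters that are lowercase; 0.0 if there are none.
    static double lowercaseRatio(String line) {
        int total = 0;
        int lowercase = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ' ') {
                continue; // spaces do not count towards the line length
            }
            total++;
            if (Character.isLowerCase(c)) {
                lowercase++;
            }
        }
        return total == 0 ? 0.0 : (double) lowercase / total;
    }

    // Same threshold as the original method: keep lines that are at least 90% lowercase.
    static boolean keep(String line) {
        return lowercaseRatio(line) >= 0.90;
    }

    public static void main(String[] args) {
        System.out.println(keep("the quick brown fox")); // prints "true"
        System.out.println(keep("BREAKING NEWS today")); // prints "false"
    }
}
```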
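Note: The opennlp.tools.cmdline.CmdLineUtil examples in this article were collected from source-code and documentation hosting platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright remains with the original authors; consult each project's license before redistributing or using the code. Do not reproduce this article without permission.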