nlp - What is CoNLL data format?

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:43:10+0000

There are many different CoNLL formats, because CoNLL runs a different shared task each year. The format for CoNLL 2009 is described here. Each line represents a single word as a series of tab-separated fields; an underscore (_) indicates an empty value. Mate-Parser's manual says that it uses the first 12 columns of CoNLL 2009:

ID FORM LEMMA PLEMMA POS PPOS FEAT PFEAT HEAD PHEAD DEPREL PDEPREL
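
As a rough illustration of that layout, here is a minimal Python sketch that splits one tab-separated line into the 12 columns listed above and turns `_` into `None`. The function name `parse_conll09_line` and the sample line are made up for this example; they are not part of Mate-Parser or of any official CoNLL tooling.

```python
COLUMNS = ["ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS",
           "FEAT", "PFEAT", "HEAD", "PHEAD", "DEPREL", "PDEPREL"]

def parse_conll09_line(line):
    """Map the first 12 tab-separated fields of one token line to column names."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(COLUMNS):
        raise ValueError(
            f"expected at least {len(COLUMNS)} fields, got {len(fields)}")
    # "_" marks an empty value; represent it as None.
    # (Simplification: a word form that is literally "_" is also mapped to None.)
    return {name: (None if value == "_" else value)
            for name, value in zip(COLUMNS, fields)}

# Hypothetical sample line, invented for illustration only.
sample = "\t".join(["1", "Dogs", "dog", "dog", "NNS", "NNS",
                    "_", "_", "2", "2", "SBJ", "SBJ"])
token = parse_conll09_line(sample)
print(token["FORM"], token["PPOS"], token["FEAT"])
```

Any columns beyond the twelfth are simply ignored here, since the answer only covers the first 12.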

The definitions of some of these columns come from earlier shared tasks (the CoNLL-X format used in 2006 and 2007):

ID (index in sentence, starting at 1)
FORM (word form itself)
LEMMA (word's lemma or stem)
POS (part of speech)
FEAT (list of morphological features separated by |)
HEAD (index of syntactic parent, 0 for ROOT)
DEPREL (syntactic relationship between HEAD and this word)
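
To make the field meanings above concrete, here is a small Python sketch that converts the string fields into typed values and lists the dependency arcs of one sentence. It assumes the tokens have already been read into dicts keyed by column name; the helper names `interpret_token` and `dependency_arcs`, the two-word sentence, and the feature string `num=pl` are all invented for illustration.

```python
def interpret_token(raw):
    """Convert the string fields of one token into typed values."""
    feat = raw.get("FEAT")
    return {
        "id": int(raw["ID"]),        # 1-based position in the sentence
        "form": raw["FORM"],
        "lemma": raw.get("LEMMA"),
        "pos": raw.get("POS"),
        # FEAT is a |-separated list; "_" (or a missing value) means no features.
        "feats": feat.split("|") if feat not in (None, "_") else [],
        "head": int(raw["HEAD"]),    # 0 means the token attaches to ROOT
        "deprel": raw["DEPREL"],
    }

def dependency_arcs(tokens):
    """Return (head_form, deprel, dependent_form) triples for one sentence."""
    forms = {t["id"]: t["form"] for t in tokens}
    forms[0] = "ROOT"                # index 0 is the artificial root
    return [(forms[t["head"]], t["deprel"], t["form"]) for t in tokens]

# Made-up two-word sentence, not taken from any treebank.
raw_sentence = [
    {"ID": "1", "FORM": "Dogs", "LEMMA": "dog", "POS": "NNS",
     "FEAT": "num=pl", "HEAD": "2", "DEPREL": "SBJ"},
    {"ID": "2", "FORM": "bark", "LEMMA": "bark", "POS": "VBP",
     "FEAT": "_", "HEAD": "0", "DEPREL": "ROOT"},
]
tokens = [interpret_token(r) for r in raw_sentence]
arcs = dependency_arcs(tokens)
print(arcs)
```

Keying by column name rather than by position keeps the sketch independent of the exact column order, which differs between the CoNLL variants.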

The column variants that start with P (e.g., PPOS rather than POS) indicate that the value was automatically predicted rather than taken from the gold standard.

Update: There is now also a CoNLL-U data format, which extends the CoNLL-X format.
