nltk DOES have an interface for Stanford NER, check nltk.tag.stanford.NERTagger
.
from nltk.tag.stanford import NERTagger
st = NERTagger('/usr/share/stanford-ner/classifiers/all.3class.distsim.crf.ser.gz',
'/usr/share/stanford-ner/stanford-ner.jar')
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
output:
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
However every time you call tag
, nltk simply writes the target sentence into a file and runs Stanford NER command line tool to parse that file and finally parses the output back to python. Therefore the overhead of loading classifiers (around 1 min for me every time) is unavoidable.
If that's a problem, use Pyner.
First run Stanford NER as a server
java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer
-loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 9191
then go to pyner
folder
import ner
tagger = ner.SocketNER(host='localhost', port=9191)
tagger.get_entities("University of California is located in California, United States")
# {'LOCATION': ['California', 'United States'],
# 'ORGANIZATION': ['University of California']}
tagger.json_entities("Alice went to the Museum of Natural History.")
#'{"ORGANIZATION": ["Museum of Natural History"], "PERSON": ["Alice"]}'
Hope this helps.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…