How do I use the Stanford Parser in NLTK with Python?

Question

Asked on December 14, 2012
106477 views
3 answers
Solved

Is it possible to use the Stanford Parser in NLTK? (I am not talking about the Stanford POS tagger.)

Asked on December 14, 2012 by ThanaDaray

Answer 1

3 Answers

Answer 2

2 votes

Aditi (680 points)

I am using NLTK version 3.2.4, and the following code worked for me.

from nltk.internals import find_jars_within_path
from nltk.tag import StanfordPOSTagger
from nltk import word_tokenize

# As an alternative to setting the CLASSPATH, pass the jar and model paths directly:
jar = '/home/ubuntu/stanford-postagger-full-2017-06-09/stanford-postagger.jar'
model = '/home/ubuntu/stanford-postagger-full-2017-06-09/models/english-left3words-distsim.tagger'

pos_tagger = StanfordPOSTagger(model, jar)

# Add other jars from Stanford directory
stanford_dir = pos_tagger._stanford_jar.rpartition('/')[0]
stanford_jars = find_jars_within_path(stanford_dir)
pos_tagger._stanford_jar = ':'.join(stanford_jars)

text = pos_tagger.tag(word_tokenize("Open app and play movie"))
print(text)

Output:

[('Open', 'VB'), ('app', 'NN'), ('and', 'CC'), ('play', 'VB'), ('movie', 'NN')]
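
The snippet above joins the jar paths with a hard-coded ':' separator, which only works on Linux and macOS. As a minimal sketch of a more portable variant, the helper below builds the same kind of classpath string with os.pathsep. Note that build_classpath is a hypothetical name introduced here for illustration, not an NLTK function, and it only looks at jars placed directly in the given directory.

```python
import os
from pathlib import Path


def build_classpath(jar_dir):
    """Join every .jar file found directly inside jar_dir into one classpath string.

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    jars = sorted(str(path) for path in Path(jar_dir).glob("*.jar"))
    # os.pathsep is ':' on Linux/macOS and ';' on Windows.
    return os.pathsep.join(jars)
```

Assuming the same private attribute as in the answer, the result could then be assigned with pos_tagger._stanford_jar = build_classpath(stanford_dir).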

Answered on September 14, 2017 by Aditi (680 points)

Answer 3

2 votes

alvas (4333 points)

Deprecated answer

The answer below is deprecated; please use the solution at https://stackoverflow.com/a/51981566/610569 for NLTK v3.3 and above.

EDITED

Note: the following answer only works with:

NLTK version ==3.2.5
Stanford tools compiled since 2016-10-31
Python 2.7, 3.5 and 3.6
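
Since the answer is tied to specific versions, a quick check before running it can save some debugging. The sketch below compares a given NLTK version string and the running Python version against the combination listed above. The names TESTED_NLTK, TESTED_PYTHONS, version_tuple and matches_tested_setup are all hypothetical and introduced only for illustration.

```python
import sys

# Versions listed in the note above (hypothetical constant names).
TESTED_NLTK = (3, 2, 5)
TESTED_PYTHONS = {(2, 7), (3, 5), (3, 6)}


def version_tuple(version):
    """Turn a dotted version string such as '3.2.5' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3] if part.isdigit())


def matches_tested_setup(nltk_version, python_version=None):
    """Return True if both versions match the combination the answer was tested on."""
    if python_version is None:
        python_version = sys.version_info[:2]
    return (
        version_tuple(nltk_version) == TESTED_NLTK
        and tuple(python_version[:2]) in TESTED_PYTHONS
    )
```

With NLTK installed, a typical call would be matches_tested_setup(nltk.__version__) after import nltk.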

Because both tools change rather quickly, the API may look very different 3 to 6 months later. Please treat the following answer as temporary, not as a permanent fix.

Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface with the Stanford NLP tools using NLTK!

TL;DR

The following code comes from https://github.com/nltk/nltk/pull/1735#issuecomment-306091826

In the terminal:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000
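
The server can take a while to preload its models, so it may help to confirm that it is accepting connections before moving on to the Python session. The following is a minimal sketch using only the standard library. The function name server_is_listening is hypothetical, and the default host and port simply mirror the command above.

```python
import socket


def server_is_listening(host="localhost", port=9000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved host names.
        return False
```

Calling server_is_listening() only shows that something is listening on port 9000. It does not prove that the process is a CoreNLP server or that its models have finished loading.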

In Python:

>>> from nltk.tag.stanford import CoreNLPPOSTagger, CoreNLPNERTagger
>>> from nltk.parse.corenlp import CoreNLPParser

>>> stpos, stner = CoreNLPPOSTagger(), CoreNLPNERTagger()

>>> stpos.tag('What is the airspeed of an unladen swallow ?'.split())
[(u'What', u'WP'), (u'is', u'VBZ'), (u'the', u'DT'), (u'airspeed', u'NN'), (u'of', u'IN'), (u'an', u'DT'), (u'unladen', u'JJ'), (u'swallow', u'VB'), (u'?', u'.')]

>>> stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]

>>> parser = CoreNLPParser(url='http://localhost:9000')

>>> next(
...     parser.raw_parse('The quick brown fox jumps over the lazy dog.')
... ).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

>>> (parse_fox, ), (parse_wolf, ) = parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )

>>> parse_fox.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

>>> parse_wolf.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|_________      |    |     _______|____    |
 DT   JJ   JJ   NN   VBZ   IN   DT      JJ   NN  .
 |    |    |    |     |    |    |       |    |   |
The quick grey wolf jumps over the     lazy fox  .

>>> (parse_dog, ), (parse_friends, ) = parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )

>>> parse_dog.pretty_print()  # doctest: +NORMALIZE_WHITESPACE
        ROOT
         |
         S
  _______|____
 |            VP
 |    ________|___
 NP  |            NP
 |   |         ___|___
PRP VBP       DT      NN
 |   |        |       |
 I   'm       a      dog

Please take a look at http://www.nltk.org/_modules/nltk/parse/corenlp.html for more information on the Stanford API, and read the documentation there.
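
For readers who want to see roughly what happens under the hood, the sketch below talks to the same server over HTTP with the standard library only. It assumes the CoreNLP server accepts a POST whose body is the raw text and whose properties query parameter is a JSON object, and that the JSON reply contains a "sentences" list with a "parse" string per sentence. The function names build_parse_request and raw_parse_strings are hypothetical, and the text/plain content type is an assumption rather than something taken from the answer.

```python
import json
import urllib.parse
import urllib.request


def build_parse_request(text, url="http://localhost:9000"):
    """Build a POST request asking the server for a constituency parse as JSON."""
    properties = {
        "annotators": "tokenize,ssplit,pos,parse",
        "outputFormat": "json",
    }
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        url + "/?" + query,
        data=text.encode("utf-8"),
        # Assumption: the server accepts plain UTF-8 text as the request body.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def raw_parse_strings(text, url="http://localhost:9000", timeout=30):
    """Send text to a running server and return one bracketed parse per sentence."""
    request = build_parse_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return [sentence["parse"] for sentence in payload["sentences"]]
```

With the server from the terminal step running, raw_parse_strings('The quick brown fox jumps over the lazy dog.') should return a list of bracketed tree strings, which nltk.Tree.fromstring can turn into tree objects. In practice CoreNLPParser is the more convenient route. This sketch is mainly useful for debugging the connection.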

Answered on March 18, 2018 by alvas (4333 points)

Answer 4

1 vote

Pradip Pramanick (131 points)

A new development of the Stanford parser, based on a neural model and trained with TensorFlow, was recently made available as a Python API. This model is reported to be much more accurate than the Java-based one. You can certainly integrate it into an NLTK pipeline.

Link to the parser. The repository contains pre-trained parser models for 53 languages.

Answered on September 3, 2019 by Pradip Pramanick (131 points)
