lxml XPath extraction for an HTML table

Question

Asked: April 7, 2011
Views: 14949
Answers: 1
Status: Solved

I have an HTML document similar to the following:

<html xmlns="http://www.w3.org/1999/xhtml">
    <div id="Symbols" class="cb">
    <table class="quotes">
    <tr><th>Code</th><th>Name</th>
        <th style="text-align:right;">High</th>
        <th style="text-align:right;">Low</th>
    </tr>
    <tr class="ro" onclick="location.href='stackoverflow.com/xyz.com/A.htm';" style="color:red;">
        <td><a href="stackoverflow.com/xyz.com/A.htm" title="Display,A">A</a></td>
        <td>A Inc.</td>
        <td align="right">45.44</td>
        <td align="right">44.26</td>
    <tr class="re" onclick="location.href='stackoverflow.com/xyz.com/B.htm';" style="color:red;">
        <td><a href="stackoverflow.com/xyz.com/B.htm" title="Display,B">B</a></td>
        <td>B Inc.</td>
        <td align="right">18.29</td>
        <td align="right">17.92</td>
</div></html>

I need to extract the code/name/high/low information from the table.

I used the following code, taken from one of the similar examples on Stack Overflow:

#############################
import urllib2
from lxml import html, etree

webpg = urllib2.urlopen('http://www.eoddata.com/stocklist/NYSE/A.htm').read()
table = html.fromstring(webpg)

for row in table.xpath('//table[@class="quotes"]/tbody/tr'):
    for column in row.xpath('./th[position()>0]/text() | ./td[position()=1]/a/text() | ./td[position()>1]/text()'):
        print column.strip(),
    print

#############################

I get no output. To get any rows, I have to change the XPath in the first loop from table.xpath('//table[@class="quotes"]/tbody/tr') to table.xpath('//tr').

I don't understand why xpath('//table[@class="quotes"]/tbody/tr') doesn't work.

Asked April 7, 2011 by mkt2012

Answer

42 votes

You are probably looking at the HTML in Firebug, right? The browser inserts an implicit <tbody> tag when one is not present in the document. The lxml library only processes the tags that are present in the raw HTML string.
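The difference can be checked directly. The following is a minimal sketch, assuming lxml is installed; the markup in RAW_HTML is a hypothetical, cut-down version of the table from the question, not the real page.

```python
from lxml import html

# Hypothetical markup modelled on the question: <tr> sits directly under <table>,
# exactly as it would appear in the raw page source (no <tbody>).
RAW_HTML = """
<html><body>
<table class="quotes">
  <tr><th>Code</th><th>Name</th></tr>
  <tr><td>A</td><td>A Inc.</td></tr>
</table>
</body></html>
"""

tree = html.fromstring(RAW_HTML)

# lxml keeps the tree as written, so there is no <tbody> element to match.
tbody_count = len(tree.xpath('//tbody'))
rows_via_tbody = tree.xpath('//table[@class="quotes"]/tbody/tr')
rows_direct = tree.xpath('//table[@class="quotes"]/tr')

print(tbody_count, len(rows_via_tbody), len(rows_direct))
```

With this input the tbody-based path selects nothing, while the direct table/tr path selects both rows.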

Leave tbody out of your XPath. For example, this works:

import lxml.html

tree = lxml.html.fromstring(raw_html)  # raw_html: the page source as a string
tree.xpath('//table[@class="quotes"]/tr')
[<Element tr at 1014206d0>, <Element tr at 101420738>, <Element tr at 1014207a0>]
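Putting this together for the code/name/high/low extraction, here is a rough sketch in Python 3. The sample markup and the helper name extract_rows are assumptions made for illustration, and the real page may be structured differently. It uses the descendant axis (//tr under the table) so the same expression matches whether or not a <tbody> is present, which is a variation on the answer's table/tr path rather than the answer's exact expression.

```python
from lxml import html

# Hypothetical sample based on the question's table; the real page may differ.
RAW_HTML = """
<html><body><div id="Symbols" class="cb">
<table class="quotes">
  <tr><th>Code</th><th>Name</th><th>High</th><th>Low</th></tr>
  <tr class="ro"><td><a href="A.htm">A</a></td><td>A Inc.</td>
      <td align="right">45.44</td><td align="right">44.26</td></tr>
  <tr class="re"><td><a href="B.htm">B</a></td><td>B Inc.</td>
      <td align="right">18.29</td><td align="right">17.92</td></tr>
</table>
</div></body></html>
"""


def extract_rows(raw_html):
    """Return each row of the quotes table as a list of stripped cell texts."""
    tree = html.fromstring(raw_html)
    rows = []
    # '//tr' below the table matches rows with or without an enclosing <tbody>.
    for tr in tree.xpath('//table[@class="quotes"]//tr'):
        # text_content() also collects text nested inside <a>, so the first
        # column needs no special case.
        cells = [cell.text_content().strip() for cell in tr.xpath('./th | ./td')]
        rows.append(cells)
    return rows


rows = extract_rows(RAW_HTML)
for row in rows:
    print(row)
```

If the table contained nested tables, the descendant axis would also pick up their rows, so the stricter table/tr path from the answer may be preferable in that case.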

Answered April 7, 2011 by samplebias (19805 points)
