Problem parsing an HTML file with HTMLEditorKit in Java

Question

Asked: November 5, 2009
Views: 726
Answers: 2
Status: Solved

My HTML contains tags of the following form:

<div class="author"><a href="http://stackoverflow.com/user/1" title="View user profile.">Apple</a> - October 22, 2009 - 01:07</div>

I would like to extract the date, "October 22, 2009 - 01:07" in this example, from each such tag.

I implemented javax.swing.text.html.HTMLEditorKit.ParserCallback as follows:

class HTMLParseListerInner extends HTMLEditorKit.ParserCallback {   
    private ArrayList<String> foundDates = new ArrayList<String>();
    private boolean isDivLink = false;

    public void handleText(char[] data, int pos) {
        if(isDivLink)
            foundDates.add(new String(data)); // Extracts "Apple" instead of the date.
    }

    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {       
        String divValue = (String)a.getAttribute(HTML.Attribute.CLASS);
        if (t.toString() == "div" && divValue != null && divValue.equals("author"))
            isDivLink = true;
    }
}

However, the parser above returns "Apple", the text inside the hyperlink within the tag, rather than the date. How can I fix the parser so that it extracts the date?

Asked on November 5, 2009 by reprogrammer

Answer 1

2 Answers

Answer 2

0 votes

Tom Hawtin - tackline (82671 points)

Override handleEndTag and check for "a"? The date is the text that comes after the closing </a> inside the div.

However, that HTML parser dates from the 1990s (it targets HTML 3.2), and these methods are not well specified.

Answered on November 5, 2009 by Tom Hawtin - tackline (82671 points)
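
One way to read this suggestion, offered as a minimal sketch rather than the answerer's own code: remember when the closing </a> has been seen inside an author div, and only collect text that arrives after it. The class name AuthorDateCallback, the helper extractDates, and the stripping of the leading "-" separator are assumptions made for illustration; they do not appear in the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: collects the text that follows </a> inside <div class="author">.
class AuthorDateCallback extends HTMLEditorKit.ParserCallback {
    private final List<String> foundDates = new ArrayList<>();
    private boolean inAuthorDiv = false; // inside <div class="author">
    private boolean afterLink = false;   // the closing </a> has been seen in that div

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t.equals(HTML.Tag.DIV) && "author".equals(a.getAttribute(HTML.Attribute.CLASS))) {
            inAuthorDiv = true;
            afterLink = false;
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t.equals(HTML.Tag.A) && inAuthorDiv) {
            afterLink = true;            // text from here on is no longer the user name
        } else if (t.equals(HTML.Tag.DIV)) {
            inAuthorDiv = false;
            afterLink = false;
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inAuthorDiv && afterLink) {
            // Assumption: the date is preceded by a "-" separator, as in the question's sample.
            String text = new String(data).trim();
            if (text.startsWith("-")) {
                text = text.substring(1).trim();
            }
            foundDates.add(text);
        }
    }

    List<String> getFoundDates() {
        return foundDates;
    }

    // Convenience helper (hypothetical): parse an HTML string and return the collected dates.
    static List<String> extractDates(String html) {
        AuthorDateCallback callback = new AuthorDateCallback();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.getFoundDates();
    }

    public static void main(String[] args) {
        String html = "<div class=\"author\"><a href=\"http://stackoverflow.com/user/1\" "
            + "title=\"View user profile.\">Apple</a> - October 22, 2009 - 01:07</div>";
        System.out.println(extractDates(html));
    }
}
```

Because the flags are reset on every closing div, the same callback instance can walk a page with many author divs and collect one entry per div.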

Answer 3

0 votes

camickr (137095 points)

import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class ParserCallbackDiv extends HTMLEditorKit.ParserCallback
{
    // True while the parser is inside a <div class="author"> element.
    private boolean isDivLink = false;
    // Most recent text run seen inside the current author div.
    private String divText;

    public void handleEndTag(HTML.Tag tag, int pos)
    {
        // Report only when an author div closes. By then divText holds the
        // last text run in the div, which is the text after </a>.
        if (tag.equals(HTML.Tag.DIV) && isDivLink)
        {
            System.out.println( divText );
            isDivLink = false;
        }
    }

    public void handleStartTag(HTML.Tag tag, MutableAttributeSet a, int pos)
    {
        if (tag.equals(HTML.Tag.DIV))
        {
            String divValue = (String)a.getAttribute(HTML.Attribute.CLASS);

            if ("author".equals(divValue))
                isDivLink = true;
        }
    }

    public void handleText(char[] data, int pos)
    {
        // Ignore text outside author divs; later runs overwrite earlier ones,
        // so "Apple" is replaced by the text that follows the link.
        if (isDivLink)
            divText = new String(data);
    }

    public static void main(String[] args)
    {
        // Note the space before title: adjacent attributes must be separated.
        String file = "<div class=\"author\"><a href=\"/user/1\" " +
            "title=\"View user profile.\">Apple</a> - October 22, 2009 - 01:07</div>";
        StringReader reader = new StringReader(file);

        ParserCallbackDiv parser = new ParserCallbackDiv();

        try
        {
            new ParserDelegator().parse(reader, parser, true);
        }
        catch (IOException e)
        {
            System.out.println(e);
        }
    }
}

Answered on November 5, 2009 by camickr (137095 points)
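
The text captured from the sample div still carries the "-" separator that follows the author link. If an actual date value is wanted rather than a string, the following sketch shows one possible way to clean and parse it with java.time. The class name AuthorDateParser and the format pattern are assumptions based only on the single sample in the question ("October 22, 2009 - 01:07", English month names, 24-hour time); real pages may differ.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper: turns the raw text captured from the div into a LocalDateTime.
class AuthorDateParser {
    // Assumed format, inferred from the sample "October 22, 2009 - 01:07".
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("MMMM d, yyyy - HH:mm", Locale.ENGLISH);

    static LocalDateTime parse(String raw) {
        // Drop surrounding whitespace and a leading "-" separator, if present.
        String cleaned = raw.trim().replaceFirst("^-\\s*", "");
        // Throws DateTimeParseException if the text does not match the assumed format.
        return LocalDateTime.parse(cleaned, FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parse(" - October 22, 2009 - 01:07"));
    }
}
```

Passing Locale.ENGLISH explicitly keeps the month name parsing independent of the default locale of the machine running the code.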
