How to use the HTML Agility Pack

Question

Asked on May 11, 2009
153496 views
5 answers
Resolved

My XHTML document is not completely valid, which is why I want to use the HTML Agility Pack. How do I use it in my project? My project is in C#.

Asked on May 11, 2009 by an unregistered user

Answer 1

5 Answers

Answer 2

365 votes

Ash (31541 points)

Download and build the HTMLAgilityPack solution.
In your application, add a reference to HTMLAgilityPack.dll in the HTMLAgilityPack\Debug (or Release)\bin folder.

Then, as an example:

    // Count() below is a LINQ extension method, so the file needs: using System.Linq;
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

    // There are various options, set as needed
    htmlDoc.OptionFixNestedTags = true;

    // filePath is a path to a file containing the HTML
    htmlDoc.Load(filePath);

    // Use htmlDoc.LoadHtml(xmlString) to load from a string (this was htmlDoc.LoadXML(xmlString))

    // ParseErrors is a collection containing any errors from the Load statement
    if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count() > 0)
    {
        // Handle any parse errors as required
    }
    else
    {
        if (htmlDoc.DocumentNode != null)
        {
            HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");

            if (bodyNode != null)
            {
                // Do something with bodyNode
            }
        }
    }

(NB: This code is only an example and not necessarily the best or only approach. Do not use it blindly in your own application.)

The HtmlDocument.Load() method also accepts a stream, which is very useful for integrating with other stream-oriented classes in the .NET Framework. HtmlEntity.DeEntitize() is another useful method for processing HTML entities correctly. (Thanks, Matthew.)
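As a minimal sketch of those two points, the helper below loads HTML from any readable stream and decodes the entities in the text it returns. The class name StreamLoadSketch and the method FirstParagraphText are made up for illustration; only HtmlDocument.Load, SelectSingleNode, InnerText and HtmlEntity.DeEntitize come from the library.

```csharp
using System.IO;
using System.Text;
using HtmlAgilityPack;

public static class StreamLoadSketch
{
    // Hypothetical helper (not part of the library): reads HTML from a stream
    // and returns the entity-decoded text of the first <p> element,
    // or null when the document contains no <p>.
    public static string FirstParagraphText(Stream input)
    {
        var doc = new HtmlDocument();

        // Load from the stream; the encoding is an assumption of this sketch.
        doc.Load(input, Encoding.UTF8);

        HtmlNode paragraph = doc.DocumentNode.SelectSingleNode("//p");
        if (paragraph == null)
        {
            return null;
        }

        // InnerText still contains entities such as &amp;, so decode them.
        return HtmlEntity.DeEntitize(paragraph.InnerText);
    }
}
```

Any Stream works here, for example a FileStream, a MemoryStream, or a response stream from a web request.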

HtmlDocument and HtmlNode are the classes you will use most. Similar to an XML parser, they provide the SelectSingleNode and SelectNodes methods, which accept XPath expressions.
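One thing worth knowing when using these methods: SelectNodes returns null, not an empty collection, when nothing matches, so it is safer to check before iterating. A small sketch follows; the class XPathSketch and the method LinkTargets are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathSketch
{
    // Hypothetical helper: returns the href value of every <a> element
    // that has an href attribute.
    public static List<string> LinkTargets(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // XPath: every <a> element, anywhere in the document, with an href.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            // No match: SelectNodes gave us null, so return the empty list.
            return result;
        }

        foreach (HtmlNode anchor in anchors)
        {
            // The second argument is the default used if the attribute is missing.
            result.Add(anchor.GetAttributeValue("href", string.Empty));
        }

        return result;
    }
}
```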

Pay attention to the HtmlDocument.Option* boolean properties (for example OptionFixNestedTags, set in the code above). They control how the Load and LoadHtml (formerly LoadXML) methods process your HTML/XHTML.
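To illustrate, here is a rough sketch that sets two of these options before loading and then writes the document back out. It assumes OptionOutputAsXml is the option you want for turning not-quite-valid markup into XML output; the class OptionsSketch and the method ToXml are hypothetical, and the exact shape of the output depends on the library version, so check the result against your own input.

```csharp
using System.IO;
using HtmlAgilityPack;

public static class OptionsSketch
{
    // Hypothetical helper: parses loosely formed HTML and returns it
    // re-serialized with the XML output option enabled.
    public static string ToXml(string html)
    {
        var doc = new HtmlDocument();

        // Options must be set before loading so they affect parsing.
        doc.OptionFixNestedTags = true;  // repair badly nested tags
        doc.OptionOutputAsXml = true;    // write the document out as XML

        doc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```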

There is also a compiled help file called HtmlAgilityPack.chm that contains a complete reference for each of the objects. It is normally in the base folder of the solution.

Answered on May 11, 2009 by Ash (31541 points)

Answer 3

167 votes

rtpHarry (5306 points)

I don't know if this will be of any help to you, but I have written a couple of articles that introduce the basics.

The next article is 95% complete; I just have to write up explanations of the last few parts of the code I have written. If you are interested, I will try to remember to post here when I publish it.

Answered on April 6, 2010 by rtpHarry (5306 points)

Answer 4

66 votes

Kent Munthe Caspersen (642 points)

HtmlAgilityPack uses XPath syntax, and although many argue that it is poorly documented, I had no trouble using it with the help of this XPath documentation: http://www.w3schools.com/xpath/xpath_syntax.asp

To parse this:

<h2>
  <a href="">Jack</a>
</h2>
<ul>
  <li class="tel">
    <a href="">81 75 53 60</a>
  </li>
</ul>
<h2>
  <a href="">Roy</a>
</h2>
<ul>
  <li class="tel">
    <a href="">44 52 16 87</a>
  </li>
</ul>

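I did this:

// Assumes doc is an HtmlDocument already loaded with the HTML above
// (List<T> requires: using System.Collections.Generic;)
var names = new List<string>();
var phones = new List<string>();
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//h2//a"))
{
  names.Add(node.ChildNodes[0].InnerHtml);
}
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//li[@class='tel']//a"))
{
  phones.Add(node.ChildNodes[0].InnerHtml);
}

Answered on July 8, 2013 by Kent Munthe Caspersen (642 points)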

Answer 5

6 votes

captainsac (468 points)

See the complete code at the following address: http://www.dotnetlines.com/Blogs/tabid/85/EntryId/38/Get-Facebook-like-Page-Title-and-Meta-Description-of-other-site-using-HTMLAgilityPack.aspx

The main code related to HTMLAgilityPack is as follows:

using System;
using System.Net;
using System.Web;
using System.Web.Services;
using System.Web.Script.Services;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

namespace GetMetaData
{
    /// <summary>
    /// Summary description for MetaDataWebService
    /// </summary>
    [WebService(Namespace = "http://tempuri.org/")]
    [WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)]
    [System.ComponentModel.ToolboxItem(false)]
    // To allow this Web Service to be called from script, using ASP.NET AJAX, uncomment the following line.
    [System.Web.Script.Services.ScriptService]
    public class MetaDataWebService : System.Web.Services.WebService
    {
        [WebMethod]
        [ScriptMethod(UseHttpGet = false)]
        public MetaData GetMetaData(string url)
        {
            MetaData objMetaData = new MetaData();

            //Get Title
            WebClient client = new WebClient();
            string sourceUrl = client.DownloadString(url);
            objMetaData.PageTitle = Regex.Match(sourceUrl, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>", RegexOptions.IgnoreCase).Groups["Title"].Value;

            //Method to get Meta Tags
            objMetaData.MetaDescription = GetMetaDescription(url);

            return objMetaData;
        }

        private string GetMetaDescription(string url)
        {
            string description = string.Empty;

            //Get Meta Tags
            var webGet = new HtmlWeb();
            var document = webGet.Load(url);
            var metaTags = document.DocumentNode.SelectNodes("//meta");

            if (metaTags != null)
            {
                foreach (var tag in metaTags)
                {
                    if (tag.Attributes["name"] != null && tag.Attributes["content"] != null && tag.Attributes["name"].Value.ToLower() == "description")
                    {
                        description = tag.Attributes["content"].Value;
                    }
                }
            }
            else
            {
                description = string.Empty;
            }

            return description;
        }
    }
}

See more at: http://www.dotnetlines.com/Blogs/tabid/85/EntryId/38/Get-Facebook-like-Page-Title-and-Meta-Description-of-other-site-using-HTMLAgilityPack.aspx#sthash.XoWtzJLb.dpuf

Answered on August 2, 2014 by captainsac (468 points)

Answer 6

5 votes

ibrahim ozboluk (67 points)

    // Returns the content attribute of the <meta> tag whose name equals key
    public string HtmlAgi(string url, string key)
    {
        var Webget = new HtmlWeb();
        var doc = Webget.Load(url);
        HtmlNode ourNode = doc.DocumentNode.SelectSingleNode(string.Format("//meta[@name='{0}']", key));

        if (ourNode != null)
        {
            return ourNode.GetAttributeValue("content", "");
        }
        else
        {
            return "not found";
        }
    }

Answered on December 6, 2013 by ibrahim ozboluk (67 points)
