Introduction

ElementParser is a lightweight framework that provides easy access to XML and HTML content. Rather than getting lost in the complexities of the HTML and XML specifications, it aims not to obscure their essential simplicity. It doesn’t do everything; it aims to do “just enough”.

Let’s begin with some examples.

DocumentRoot* document = [Element parseHTML: source];

The document is a special element that holds the top-level element(s) of your document (e.g. <html> or <rss>). You now have a tree of Element objects that you can walk using methods like firstChild, nextSybling and parent. You can also access the data each element contains with methods like tagName, attributes, contentsText and contentsOfChildren. A nice start, and sometimes this is enough. But let’s say you don’t want to walk the tree to find the data you need. How about:

Element* linkElement = [document selectElement: @"div.nextLink a"];

Here we’re using a CSS-style selector to locate and return the first matching element. Now we can parse a document and conveniently find the elements we care about. (Yes, there is a corresponding selectElements: method that returns all matches.)
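A selector like `div.nextLink a` reads right to left: the element itself must match `a`, and some ancestor further up must match `div.nextLink`. The sketch below illustrates that descendant-matching idea in plain C (which also compiles as Objective-C), assuming standard CSS semantics. `Node`, `matches_part` and `matches` are hypothetical names; each node carries a single class and a parent link, and only `tag`, `.class` and `tag.class` parts separated by spaces are handled. It is not ElementParser's own selector engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical element: one tag, at most one class, and a parent link. */
typedef struct Node {
    const char *tag;
    const char *cls;          /* NULL if the element has no class */
    struct Node *parent;
} Node;

/* Does `n` match one simple part: "tag", ".class" or "tag.class"?
   `part` points at `len` characters (not NUL-terminated). */
static bool matches_part(const Node *n, const char *part, size_t len) {
    const char *dot = memchr(part, '.', len);
    size_t tag_len = dot ? (size_t)(dot - part) : len;
    if (tag_len > 0 &&
        (strlen(n->tag) != tag_len || strncmp(n->tag, part, tag_len) != 0))
        return false;
    if (dot) {
        size_t cls_len = len - tag_len - 1;
        if (n->cls == NULL || strlen(n->cls) != cls_len ||
            strncmp(n->cls, dot + 1, cls_len) != 0)
            return false;
    }
    return true;
}

/* Match a selector of simple parts separated by spaces (descendant
   combinators), e.g. "div.nextLink a", against element `n`. */
bool matches(const Node *n, const char *selector) {
    const char *starts[8];
    size_t lens[8];
    int count = 0;
    const char *p = selector;
    while (*p) {
        while (*p == ' ') p++;              /* skip separators */
        if (!*p) break;
        const char *s = p;
        while (*p && *p != ' ') p++;
        assert(count < 8);                  /* sketch: at most 8 parts */
        starts[count] = s;
        lens[count] = (size_t)(p - s);
        count++;
    }
    /* the rightmost part must match the element itself */
    if (count == 0 || !matches_part(n, starts[count - 1], lens[count - 1]))
        return false;
    const Node *anc = n->parent;
    for (int i = count - 2; i >= 0; i--) {
        /* climb to the nearest ancestor matching part i */
        while (anc && !matches_part(anc, starts[i], lens[i]))
            anc = anc->parent;
        if (!anc) return false;
        anc = anc->parent;                  /* the next part must match further up */
    }
    return true;
}
```

Taking the nearest matching ancestor at each step is safe for chains of descendant combinators: any ancestor that would satisfy the remaining parts from a farther starting point is still reachable from the nearer one.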

Next, let’s bind your objects more closely to the world of elements. To do this, we’ll use ElementParser directly to register callbacks into your code that fire when a matching element is found (and its contents parsed).

ElementParser* parser = [[ElementParser alloc] initWithCallbacksDelegate: self];
[parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
documentRoot = [parser parseXML: source];

Your code could then look like this:

-(FeedItem*) gotFeedElement:(Element*)element{
    FeedItem* feedItem = [[[FeedItem alloc] init] autorelease];
    feedItem.title = [[element selectElement: @"title"] contentsText];
    feedItem.description = [[element selectElement: @"description"] contentsText];
    feedItem.enclosure = [[element selectElement: @"enclosure"] contentsText];
    [element setDomainObject: feedItem]; // optional: attach your object to the element
    return feedItem;
}
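The callback pattern above (one handler per element pattern, invoked once the element and its contents have been parsed) can be sketched in plain C, which also compiles unchanged as Objective-C, with function pointers standing in for selectors. `Registry`, `on_element`, `element_closed` and `count_element` are hypothetical names, and patterns are matched as bare tag names only. ElementParser itself accepts selectors and sends Objective-C messages to your delegate, so treat this as an illustration of the idea rather than its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives the tag and the element's text. */
typedef void (*ElementCallback)(const char *tag, const char *text, void *ctx);

typedef struct {
    const char *pattern[16];     /* bare tag names in this sketch */
    ElementCallback fn[16];
    void *ctx[16];
    size_t count;
} Registry;

/* Register fn to run for every element whose tag equals `pattern`. */
void on_element(Registry *r, const char *pattern, ElementCallback fn, void *ctx) {
    assert(r->count < 16);
    r->pattern[r->count] = pattern;
    r->fn[r->count] = fn;
    r->ctx[r->count] = ctx;
    r->count++;
}

/* To be called by a (hypothetical) parser when an element's closing tag is
   reached, i.e. once its contents are available. Returns how many
   callbacks fired. */
size_t element_closed(const Registry *r, const char *tag, const char *text) {
    size_t fired = 0;
    for (size_t i = 0; i < r->count; i++) {
        if (strcmp(r->pattern[i], tag) == 0) {
            r->fn[i](tag, text, r->ctx[i]);
            fired++;
        }
    }
    return fired;
}

/* Example callback: counts the elements it is called for. */
void count_element(const char *tag, const char *text, void *ctx) {
    (void)tag;
    (void)text;
    (*(int *)ctx)++;
}
```

Registering a second callback for the same tag simply makes both fire, in registration order.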

Finally, HTML and XML documents often live on the web. Wouldn’t it be nice to use the pattern above to process them incrementally, as the data arrives?

How about:

URLParser* parser = [[URLParser alloc] initWithCallbackDelegate: self];
[parser performSelector:@selector(gotChanElement:) forElementsMatching: @"channel"];
[parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
[parser parseURL: myURL];
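Incremental processing has to cope with network data arriving in arbitrary pieces, so a tag such as `<channel>` may be split across two chunks. The plain C sketch below (also valid Objective-C) shows just that concern: a hypothetical `TagScanner` buffers a partial tag between calls to `scanner_feed` and reports each complete start-tag name. It ignores text content, comments containing `>`, CDATA and `>` inside attribute values, and it is not how URLParser is actually implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef void (*TagHandler)(const char *name, void *ctx);

/* Hypothetical scanner state carried from one chunk to the next. */
typedef struct {
    char buf[256];     /* holds a tag that is split across chunks */
    size_t len;
    int in_tag;        /* 1 while between '<' and '>' */
} TagScanner;

/* Report the name of the start tag held in s->buf; end tags, comments and
   processing instructions are skipped. Resets the buffer either way. */
static void finish_tag(TagScanner *s, TagHandler on_start, void *ctx) {
    s->buf[s->len] = '\0';
    if (s->len > 0 && s->buf[0] != '/' && s->buf[0] != '!' && s->buf[0] != '?') {
        size_t n = 0;
        while (n < s->len && !isspace((unsigned char)s->buf[n]) && s->buf[n] != '/')
            n++;
        s->buf[n] = '\0';              /* keep just the tag name */
        on_start(s->buf, ctx);
    }
    s->len = 0;
    s->in_tag = 0;
}

/* Feed one chunk of bytes; may be called any number of times. */
void scanner_feed(TagScanner *s, const char *chunk, size_t n,
                  TagHandler on_start, void *ctx) {
    for (size_t i = 0; i < n; i++) {
        char c = chunk[i];
        if (!s->in_tag) {
            if (c == '<') { s->in_tag = 1; s->len = 0; }
            continue;                   /* text content is ignored here */
        }
        if (c == '>') {
            finish_tag(s, on_start, ctx);
        } else {
            assert(s->len < sizeof s->buf - 1);   /* sketch: bounded tag size */
            s->buf[s->len++] = c;
        }
    }
}

/* Example handler: appends each start-tag name plus a space to a buffer. */
void record_name(const char *name, void *ctx) {
    strcat((char *)ctx, name);
    strcat((char *)ctx, " ");
}
```

Feeding `"<rss><chan"` and then `"nel>"` reports `rss` after the first call and `channel` after the second.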

There is a lot more available under the covers, but this may be all you need. Hopefully it’s just enough. We’d love your feedback at feedback@touchtankapps.com.

 

Terms of Use

The ElementParser framework (and its source code) is free of charge for non-commercial use under a GPL license. For commercial use, the license fee is $100 per product. (That’s a couple of hours of your time, right?) Support plans are also available. Please contact sales@touchtankapps.com.

77 Responses to “Introduction”

  1. How to display the data from a HTTPResponse onto the user interface?? - iPhone Dev SDK Forum Says:

    […] If it is XML, then there's the built-in NSXMLParser that you can use. If it is HTML, well, then you'll probably need to figure out how to parse it yourself. I saw some people recommend this but I don't know how good it is. Element Parser Touch Tank […]

  2. hamdouch Says:

    I downloaded the ElementParser Xcode project, but Build and Go didn’t work and the iPhone Simulator didn’t appear.
    How do I use it?
    Is there an example that shows how it parses HTML content?

  3. hamdouch Says:

    Sorry, I’m a beginner in Xcode, so this is hard for me to understand. I have a string str with HTML content, and I want to parse it, keeping only a paragraph (the text between the <p> and </p> tags). How should I proceed?
    I hope I didn’t bother you 😉

    • touchtank Says:

      It’s not clear exactly what you are trying to accomplish. As an example, to extract the text content of the first ‘p’ tag in an HTML document whose source is in ‘sourceStr’, you could do the following:

      DocumentRoot* doc = [Element parseHTML: sourceStr];
      Element* firstParaElement = [doc selectElement: @"p"];
      return [firstParaElement contentsText];

  4. hamdouch Says:

    Can we display this text content in a web view, and if so, how?
    Thanks a lot for your help

  5. Musca Law Says:

    How do I reset my password?
    Thanks
    Musca Law

  6. PPGer Says:

    I’m new to Xcode: what are the proper settings for building ElementParser for iPhone 3.0 (i.e., do we build under the 3.0 SDK)?

    How do you link it into your iPhone project? I built it for iPhone 3.0 (Release) and added libElementParser.a to my iPhone test project as a framework, but I get the warning “libElementParser.a file is not of required architecture”.

  7. Mike Ferrier Says:

    Howdy,

    I’ve been trying out ElementParser for an iPhone project, and it seems to be exactly what I need. One issue, though, I seem to have uncovered a bug with entity parsing. Check out the files here to see what I mean: http://github.com/mferrier/elementparserbug

    If you scroll to near the end of the parser output (http://github.com/mferrier/elementparserbug/blob/master/elementparserbug-output.txt) you can see that the last chunk processed is “03 — sugar “, which appears right before
    ” & spice;” in the html source (http://github.com/mferrier/elementparserbug/blob/master/elementparserbug.html). I’m thinking maybe it’s mis-processing the ampersand and semicolon as an entity or something?

    Anyway, if I can get these bugs ironed out, I’d definitely be interested in licensing this library for this upcoming iPhone app.

    Send me an email or reply here if you’d like any further information.

    Thanks!

    Mike

  8. Reza Says:

    Hi,

    Thanks for this. It seems that selectElements returns an array of Elements that do not have the textContents property set to hold the actual contents of the Element. Is this by design? If so, how do I get the content of an element returned by selectElements?

    Thanks,

    Reza

  9. Reza Says:

    Sorry, the above may be untrue… the element had no contents, only attributes, since it was an img tag.

    Thanks!

  10. Pedro Says:

    Hello,

    I’m trying to use ElementParser, but I have some questions about its functionality.
    Let’s say I parse an HTML string and select an element. I want to delete this element and then regenerate the HTML without it. Is this possible?

    DocumentRoot* doc = [Element parseHTML: sourceStr];
    Element* tableElement = [doc selectElement: @"table"];
    //remove the tableElement from the doc
    NSString *result = // doc to HTML ?

    • touchtank Says:

      ElementParser does not implement a full DOM style interface or data model. As a result, you cannot emit a revised version of the document. It is focused on the problem of reading html/xml and easily accessing content in the document.

    • touchtank Says:

      Pedro,

      ElementParser does not include a serialization routine currently. It is focused on the reading use cases. Sorry.

    • Jeff Says:

      I have not tried this, but here’s an idea:
      1. Locate the “table” element you want to omit.
      2. Get the NSRange of the element.
      3. Using NSString methods, concatenate the parts of the document source before and after the element (excluding the element itself) into a new NSString.

      Sorry, but I don’t have example code. This is off the top of my head. But I could write something up if needed.

      HTH
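Jeff’s step 3 is a plain string splice. In Objective-C, NSString’s `stringByReplacingCharactersInRange:withString:` with an empty replacement string does it in one call. The sketch below shows the same operation in plain C (which also compiles as Objective-C) on a UTF-8 byte buffer, assuming you already know the element’s start offset and length in the source; `splice_out` is a hypothetical helper, and how (or whether) ElementParser exposes an element’s range is not shown here. Note that NSRange locations in an NSString count UTF-16 units, not bytes, so the two only line up for ASCII text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `src` with the `len` bytes starting at
   `start` removed, or NULL if the range is out of bounds or malloc fails.
   The caller frees the result. */
char *splice_out(const char *src, size_t start, size_t len) {
    assert(src != NULL);
    size_t total = strlen(src);
    if (start > total || len > total - start)
        return NULL;
    char *out = malloc(total - len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src, start);                  /* text before the element */
    memcpy(out + start, src + start + len,    /* text after it, plus '\0' */
           total - start - len + 1);
    return out;
}
```

For example, removing 3 bytes at offset 1 from "abcdef" yields "aef".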

  11. HTML parser for Mac and iPhone/iPod touch « JongAm’s blog Says:

    […] of a good parse I found is Element Parser and its source codes is host at GitHub. However there is no good explanation about how to use it. […]

  12. CrerryDef Says:

    Amazing, I hadn’t heard about this until now. Thanks!

  13. hop Says:

    Is it OK if I just add the “Classes” folder to my project? Is that enough to make it work? Thanks.

  14. swoc Says:

    Hey guys, I’m new to Xcode. I need to display the contents of an HTML table in part of my iPhone app. It’s just for reference, so I don’t need to parse it in any particular way. Is there a way to display the page just as the website shows it, straight in my app?

    • touchtank Says:

      Take a look at UIWebView for a full-fledged browser view. Or take a look at TTStyledText from Three20, a third-party library.

  15. Gavin Says:

    I am getting a memory leak every time I use selectElement.
    It says tagChunk and NSCFString are leaking 32 and 16 bytes respectively. Am I doing something wrong, or is there a fix for this if it’s a problem?

    • touchtank Says:

      Anyone else seeing this?

      • NEogene Says:

        Me too. The problem is linked to TxtChunk* text = [[TxtChunk alloc] initWithString:source range: NSMakeRange(0,0)];

        in NSString_HTML, line 393.

        If I’m not mistaken, it happens during the while loop following the allocation of the TxtChunk.

      • Neogene Says:

        LEAKS FIXED!

        The problem was due to a forgotten release for the property named “lastChunk”.

        Add the following line to the dealloc method of the ElementParser class:

        [lastChunk release];

        i.e.:

        -(void)dealloc{
            [tagStack release];
            [root release];

            [lastChunk release]; // the previously missing release

            if (callbackMethods){
                CFRelease(callbackMethods);
                [callbackMatchers release];
            }
            [super dealloc];
        }

  16. artful Says:

    This looks like exactly what I’m looking for, but I can’t get it to work. I’m trying to parse some HTML from the web and just can’t get it to work properly.

    Could you post example .h and .m files where this works? I’ve literally been searching the web for months to find something like this. Any help would be much appreciated. Thanks.

  17. artful Says:

    I’m just trying to figure out how to use URLParser. I’m implementing the following in my viewDidLoad method:

    URLParser* parser = [[URLParser alloc] initWithCallbackDelegate: self];
    [parser performSelector:@selector(gotChanElement:) forElementsMatching: @"channel"];
    [parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
    [parser parseURL:[NSURL URLWithString:@"http://www.google.ie/"]];

    Neither the page source nor the parse results show up when I run the app.

    I’ve only been coding for 6 months, so I wouldn’t be surprised if I’m doing something very simple very wrong! Any feedback would be appreciated.

  18. Adam Says:

    Hi TouchTank,

    First off, I just want to say what a great tool you’ve made in ElementParser. I was up and parsing in less than 5 minutes; it’s incredibly easy to use and has all the functionality I need.

    I want to buy a license from you, but I get a Mail Delivery error when I email sales@touchtankapps.com

    Please Advise.

    Thanks

  19. won Says:

    Thank you very much .

    Your classes have worked very well for me and helped me out.
    But I found a memory leak when calling [Element parseHTML: @"testStr"] in a loop in my program. I checked the source, added the following code to the dealloc method in ElementParser.m, and it seems fine now:
    [lastOpened release];
    [lastClosedBeforeOpen release];
    [lastChunk release];
    I’m very sorry for my poor English; I haven’t written in English for about 8 years, so I’m not sure I described it clearly.

  20. Aral Balkan Says:

    I also tried to get in touch to get a commercial license and could not get an answer.

    Is there any chance we could get this open-sourced?

  21. Patrick Says:

    This seems to be a good solution for a problem I have but I would just like to know if development is continuing. There don’t seem to have been any updates on GitHub since last year. Is there any news?

    • touchtank Says:

      Patrick,

      Development has slowed but it is still supported. I’d love to hear about your experiences with it…

      • Patrick Says:

        That’s great. I am just testing out Element Parser, I will let you know how it goes.

        Patrick

  22. Parvez Qureshi Says:

    How can we use ElementParser to read external CSS files referenced by a link tag in the HTML file?

    • touchtank Says:

      Parvez, the parser doesn’t follow link tags. You can read the href attribute yourself to obtain the file, but ElementParser won’t parse the CSS file… Sorry!

  23. charlie Says:

    Hi, I’m new to iPhone programming. Can anyone help me get HTML contents along with their tags? Right now I only get the contents as string objects in an array. Is it possible to get them as key/value pairs?

    • touchtank Says:

      Charlie, this is just what ElementParser does. Take a look at the docs and you should find examples of how to do what you want.

  24. Peter Says:

    This was so great!

    I got what I wanted with only two lines of code! Beats all of the Parsers I’ve tried in Objective C!

    DocumentRoot *document = [Element parseHTML:html];
    Element *element = [document selectElement: @"div#column2-3 div p"];

    Thank you so much from Sweden!

  25. Stephen Says:

    @Peter

    Can you post the HTML code you parsed?

  26. Peter Says:

    @Stephen

    It’s simple if you look at my code. It is a div with id="column2-3", inside that div there is another div, and inside the second div is a p element.

    It’s very easy to understand if you read the whole article.

  27. Norm Says:

    Great library! I got a “connection refused” error when I emailed sales@touchtankapps … I just left a voicemail at the number mentioned earlier in this post. Would love to purchase a commercial license!

  28. Gigi Says:

    Hi,
    I’m new to Objective-C, but I’m using this library to run some HTML parsing tests.
    I declared a string like this:
    string1 = “the text”;
    then:
    DocumentRoot* document = [Element parseHTML: string1];
    Element* theElement = [document selectElement: @"p"];

    The problem is that theElement’s contentsText is nil, while its contentsLength is 8.
    Any ideas?

    Thank you

  29. Gigi Says:

    The comment form treats HTML as markup, so my tags disappeared…
    string1 is: @"-html- -div- -p- the text -/div- -/p- -/html-";
    I used dashes instead of angle brackets.
    Hope this is clear. Sorry, guys.

  30. touchtank Says:

    Hmm, as written the tags don’t nest properly… was that a mistake in the comment or in the source?

    • Gigi Says:

      Problem solved, but I have another question for you.
      Say my HTML has 3 divs and I want a collection containing all 3 elements. Can you explain how to do that?

      Thanks

  31. Gigi Says:

    The mistake was only in my comment. The real string is:
    string1 is: @"-html- -div- -p- the text -/p- -/div- -/html-";

    Thanks

  32. sickboy Says:

    Hi,
    I have a problem with the special character “ (the curly quotation mark, not the normal straight quote ") in the text content of the div with class="news".
    This is my code:
    DocumentRoot* document = [Element parseHTML: result];
    NSArray* divs = [document selectElements: @"div.news"];

    It works fine until it reaches the “. If I remove it from the text, everything works fine.

    Does anyone else have the same problem?

  33. cdor Says:

    Hi,
    I’m new to Objective-C, but I’m trying to use ElementParser to obtain the contents of a very simple web page that looks like this:

    some data
    more data

    I only want to get the “some data” and “more data” and display it on my application. Can you help me to get started with this?

    Thank you!

  34. cdor Says:

    I’m sorry I don’t know why my previous post did not show correctly.
    This is what the HTML contents should look like:

    some data
    more data

    • cdor Says:

      OK, I’m not sure why the HTML tags are not displaying, but basically there is an html tag, then a head tag with a meta tag, then the data, and at the end the html tag is closed.
      I’m sorry for posting the same twice.

      • touchtank Says:

        I am going to assume your html looks something like this:

        <html>
        <head>
        <meta foo='some data'>more data</meta>
        </head>
        </html>

        That being the case, your code could look something like this:

        DocumentRoot* document = [Element parseHTML: htmlString];
        Element* meta = [document selectElement: @"meta"];
        NSString* fooAttr = [meta attribute: @"foo"];
        NSString* contents = [meta contentsText];

        I hope this helps.

  35. niejam Says:

    How do I select a certain row in a table, like this:
    /TABLE/TBODY/TR[2]/TD[2]

    Thanks.

    • touchtank Says:

      If there is a distinguishing class (or id) you could select the tds directly:

      NSArray* tds = [document selectElements: @"td.some_class"];

      If not, you can use the ‘+’ syntax:

      NSArray* tds = [document selectElements: @"table tbody tr+tr td+td"];

      Note the ‘+’ means immediately following…

      Note: the table and tbody may be superfluous.

      Hope this helps
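XPath’s TR[2]/TD[2] means “the second element child with that tag”. If `+` follows standard CSS adjacent-sibling rules, `tr+tr` instead matches every row that has a row immediately before it (rows 2, 3, 4, …), so the first match of `tr+tr td+td` is row 2, cell 2, but selectElements: returns cells 2 and up from every row after the first. The plain C sketch below (also valid Objective-C) contrasts the two on a hypothetical `Node` type with `first_child`/`next_sibling` links whose children are all elements; `nth_child` and `follows_same_tag` are illustrative names, not ElementParser calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node: every child is an element (no text nodes). */
typedef struct Node {
    const char *tag;
    struct Node *first_child;
    struct Node *next_sibling;
} Node;

/* XPath-style child step: the n-th (1-based) child of `parent` whose tag
   is `tag`, e.g. nth_child(tbody, "tr", 2) for TBODY/TR[2]. NULL if absent. */
Node *nth_child(const Node *parent, const char *tag, int n) {
    assert(n >= 1);
    for (Node *c = parent->first_child; c != NULL; c = c->next_sibling) {
        if (strcmp(c->tag, tag) == 0 && --n == 0)
            return c;
    }
    return NULL;
}

/* CSS-style "X+X" test: is `c` immediately preceded by a sibling with the
   same tag? Takes the parent because this node type has no back link. */
int follows_same_tag(const Node *parent, const Node *c) {
    const Node *prev = NULL;
    for (const Node *k = parent->first_child; k != NULL; k = k->next_sibling) {
        if (k == c)
            return prev != NULL && strcmp(prev->tag, c->tag) == 0;
        prev = k;
    }
    return 0;
}
```

In a 3×3 table, `follows_same_tag` is true for both the second and third rows, while `nth_child(tbody, "tr", 2)` picks out only the second.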

    • Antony Morozov Says:

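      A simple method:

      // Build the selectors "td", "td+td", "td+td+td", ... and keep the text of
      // the first match of each: for a regular table, cells 1 to 5 of the first row.
      NSMutableArray* cellTexts = [NSMutableArray array];
      NSString* selector = @"td";
      for (int i = 0; i < 5; i++) {
          NSString* text = [[document selectElement: selector] contentsText];
          [cellTexts addObject: (text ? text : @"")]; // addObject: must not be passed nil
          selector = [selector stringByAppendingString: @"+td"];
      }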

  36. zobertke Says:

    Can I use ElementParser for full-text search across a bunch of .htm files? I need to write an app that finds and lists all the HTML files (diary pages) containing a search string. We are talking about several thousand HTML files, and search time is critical. I was wondering whether I can skip an indexed-search solution.
    Thanks for your answer

  37. Issam Says:

    This worked great for me; however, I am having encoding issues: Arabic characters are being encoded as Unicode escapes (\uxxxx).

    How can I fix this?

    • Issam Says:

      It turns out the problem is with the way strings are being added to a dictionary…

      I am doing [dictionary setValue:[content contentsText] forKey:myKey]

      [content contentsText] is now being Unicode-escaped in the dictionary…

      So this is my problem, nothing to do with ElementParser.

  38. mark Says:

    Is there a way to get around this warning I get:

    “WARN [index: 257]: document left tag open “

  39. nishant Says:

    Can I set the value of a particular HTML element? For example, when I load a web page with Login and Password fields in my UIWebView, I want those fields to be filled in automatically. To do that, I would first need to find the text field elements by parsing the page and then set their values. How can I set the value? Or is ElementParser read-only?

  40. Sho Says:

    Hi,
    Thanks for the nice parser. How would I parse, say, a PHP file or any other web page on a domain? Using the demo, when I set the source to domain.com/index.php, it didn’t work.

    Any idea?

    • touchtank Says:

      You can either get the source of the page yourself and then hand it to the parser, or use URLParser to have it fetch it automatically. Not sure where you are going astray…

  41. Mike Zang Says:

    Could you update it to use ARC, or for iOS 6?

