Introduction

ElementParser is a lightweight framework that provides easy access to XML and HTML content. Rather than getting lost in the complexities of the HTML and XML specifications, it aspires not to obscure their essential simplicity. It doesn’t do everything; it aspires to do “just enough”.

Let’s begin with some examples.

document = [Element parseHTML: source];

The result is a DocumentRoot, a special element that holds the top-level element(s) (e.g. <html> or <rss>) of your document. You now have a tree of Element objects, which you can walk using methods like firstChild, nextSybling and parent. You can also access the data each contains with methods like tagName, attributes, contentsText and contentsOfChildren. Nice start. And sometimes this is enough. But let’s say you don’t want to walk the tree to find the data you need. How about:

linkElement = [element selectElement: @"div.nextLink a"];

Here we’re using a CSS-style selector to locate and return a matching element. Nice. Now we can parse a document and conveniently find elements of interest. (Yes, there is a corresponding selectElements: method that returns all matches.)

Next, let’s bind together your world of objects and the world of elements more closely. To do this, we’ll use the ElementParser directly to register callbacks into your code when an element is found (and its contents parsed).

ElementParser* parser = [[ElementParser alloc] initWithCallbacksDelegate: self];
[parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
documentRoot = [parser parseXML: source];

Your code could then look like this:

-(FeedItem*) gotFeedElement:(Element*)element{
    FeedItem* feedItem = [[[FeedItem alloc] init] autorelease];
    feedItem.title = [[element selectElement: @"title"] contentsText];
    feedItem.description = [[element selectElement: @"description"] contentsText];
    feedItem.enclosure = [[element selectElement: @"enclosure"] contentsText];
    [element setDomainObject: feedItem]; // optional
    return feedItem;
}

Finally, these HTML and XML documents often reside on the web. Wouldn’t it be nice if we could use the pattern above to process them incrementally, as soon as they arrive?

How about:

URLParser* parser = [[URLParser alloc] initWithCallbackDelegate: self];
[parser performSelector:@selector(gotChanElement:) forElementsMatching: @"channel"];
[parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
[parser parseURL: myURL];

There is a lot more available under the covers, but this may be all you need. Hopefully it’s just enough. We’d love your feedback at feedback@touchtankapps.com.

Terms of Use

The ElementParser framework (and its source code) is free of charge for noncommercial uses (via a GPL license). For commercial uses, the license fee is $100 per product. (That’s a couple of hours of your time, right?) Support plans are also available. Please contact sales@touchtankapps.com.

77 Comments

How to display the data from a HTTPResponse onto the user interface?? - iPhone Dev SDK Forum Says:
August 3, 2009 at 11:50 pm

[…] If it is XML, then there's the built-in NSXMLParser that you can use. If it is HTML, well, then you'll probably need to figure out how to parse it yourself. I saw some people recommend this but I don't know how good it is. Element Parser Touch Tank […]

hamdouch Says:
August 17, 2009 at 5:50 am

I downloaded the ElementParser Xcode project, but Build and Go didn’t work and the iPhone Simulator didn’t appear.
How do I use it?
Is there an example that shows how it parses HTML content?

touchtank Says:
August 20, 2009 at 8:50 am
It is not intended to be a stand alone project. Did you see the docs with usage examples?

- hamdouch Says:
  August 23, 2009 at 7:49 pm
  No, I didn’t see them.
  Can you please send me an example so I can see how to use it?
touchtank Says:
August 24, 2009 at 11:17 am
Take a look at:

https://touchtank.wordpress.com/element-parser/

Let me know if you still have questions…

hamdouch Says:
August 25, 2009 at 4:10 am

Sorry, I’m a beginner in Xcode, so it looks hard for me to understand. I have a string str with HTML content, and I want to parse it, keeping only a paragraph between … and …. How should I proceed?
I hope I didn’t bother you 😉

touchtank Says:
August 25, 2009 at 3:31 pm
It’s not clear exactly what you are trying to accomplish. As an example, to extract the text content of the first ‘p’ tag in an HTML document whose source is in ‘sourceStr’, you could do the following:

DocumentRoot* doc = [Element parseHTML: sourceStr];
Element* firstParaElement = [doc selectElement: @"p"];
return [firstParaElement contentsText];

hamdouch Says:
August 26, 2009 at 5:36 am

Can we display this text content in a web view, and if so, how?
Thanks a lot for your help

Musca Law Says:
September 6, 2009 at 4:34 am

How do I reset my password?
Thanks
Musca Law

touchtank Says:
September 9, 2009 at 10:32 am
You don’t add it as a framework… you just add the source files… framework support on the iPhone is still tricky…

- touchtank Says:
  September 9, 2009 at 10:33 am
  For SourceForge? I am not sure… anybody else know?

PPGer Says:
September 8, 2009 at 10:31 am

New to Xcode: what are the proper settings to build ElementParser for iPhone 3.0 (i.e. do we build under the 3.0 SDK)?

How do you link it into your iPhone project? I added it as a framework but get a warning “libElementParser.a file is not of required architecture”. I built for iPhone 3.0 – Release and added the file to my iPhone test project as a framework.

Mike Ferrier Says:
September 27, 2009 at 10:58 pm

Howdy,

I’ve been trying out ElementParser for an iPhone project, and it seems to be exactly what I need. One issue, though, I seem to have uncovered a bug with entity parsing. Check out the files here to see what I mean: http://github.com/mferrier/elementparserbug

If you scroll to near the end of the parser output (http://github.com/mferrier/elementparserbug/blob/master/elementparserbug-output.txt), you can see that the last chunk processed is “03 — sugar “, which appears right before ” & spice;” in the HTML source (http://github.com/mferrier/elementparserbug/blob/master/elementparserbug.html). I’m thinking maybe it’s mis-processing the ampersand and semicolon as an entity or something?

Anyway, if I can get these bugs ironed out, I’d definitely be interested in licensing this library for this upcoming iPhone app.

Send me an email or reply here if you’d like any further information.

Thanks!

Mike

touchtank Says:
October 4, 2009 at 10:40 am
Fix is now posted.

Reza Says:
October 1, 2009 at 7:50 am

Hi,

Thanks for this. It seems that selectElements returns an array of Elements that do not have the textContents property set to hold the actual contents of the Element. Is this by design? If so, how do I get the content of an element returned by selectElements?

Thanks,

Reza

Reza Says:
October 1, 2009 at 8:45 am

sorry, maybe the above is untrue… it had no contents but had attributes since it was an img tag.

Pedro Says:
October 4, 2009 at 3:20 pm

Hello,

I’m trying to use ElementParser, but I have some doubts about its functionality.
Let’s say I parse an HTML string and select an element. I want to delete this element and then recreate the HTML file without it. Is this possible?

DocumentRoot* doc = [Element parseHTML: sourceStr];
Element* tableElement = [doc selectElement: @"table"];
//remove the tableElement from the doc
NSString *result = // doc to HTML ?

touchtank Says:
October 5, 2009 at 9:28 am
ElementParser does not implement a full DOM style interface or data model. As a result, you cannot emit a revised version of the document. It is focused on the problem of reading html/xml and easily accessing content in the document.

touchtank Says:
May 28, 2010 at 8:25 am
Pedro,

ElementParser does not include a serialization routine currently. It is focused on the reading use cases. Sorry.

Jeff Says:
February 7, 2013 at 1:23 am
I have not tried this, but here’s an idea:
1. Locate the “table” element you want to omit.
2. Get the NSRange of the element.
3. Using the NSString methods, copy&merge the ranges (of the document source code) before and after (but excluding) the element into a new NSString.

Sorry, but I don’t have example code. This is off the top of my head. But I could write something up if needed.

HTH
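Jeff’s step 3 can be sketched with Foundation alone. This is a minimal sketch, not part of ElementParser: the helper name StringByRemovingRange is hypothetical, and it assumes you already know the element’s NSRange within the source string. This page does not show an ElementParser call that returns that range, so the usage note below locates the markup with rangeOfString: instead.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of ElementParser): returns a copy of
// `source` with the characters in `range` removed, i.e. the text before the
// range merged with the text after it. Returns nil if the range does not fit
// inside the string, which includes a failed search ({NSNotFound, 0}).
static NSString* StringByRemovingRange(NSString* source, NSRange range) {
    NSUInteger length = [source length];
    // Written as two comparisons so location + length cannot overflow.
    if (range.location > length || range.length > length - range.location) {
        return nil;
    }
    NSString* before = [source substringToIndex: range.location];
    NSString* after = [source substringFromIndex: NSMaxRange(range)];
    return [before stringByAppendingString: after];
}
```

For example, with the table’s markup in a hypothetical string tableMarkup, `NSRange r = [sourceStr rangeOfString: tableMarkup];` followed by `StringByRemovingRange(sourceStr, r)` returns the document without that table. rangeOfString: only finds the first occurrence, so this is reliable only when the markup you remove appears once in the source.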

HTML parser for Mac and iPhone/iPod touch « JongAm’s blog Says:
October 10, 2009 at 7:05 pm

[…] of a good parse I found is Element Parser and its source codes is host at GitHub. However there is no good explanation about how to use it. […]

CrerryDef Says:
November 25, 2009 at 3:13 pm

Amazing, I hadn’t heard about that until now. Thx!

hop Says:
January 13, 2010 at 4:55 pm

Is it OK if I add the “Classes” folder to my project? Is that enough for it to work? Thanks.

touchtank Says:
January 13, 2010 at 4:58 pm
that should be it.

swoc Says:
January 19, 2010 at 11:44 am

Hey guys, new to Xcode. I need to display the contents of an HTML table in part of my iPhone app. However, it is just for reference and I don’t really need to parse it in a particular way. Is there any way I can just display the web site as it is shown, straight in my app??

touchtank Says:
January 20, 2010 at 8:47 am
Take a look at UIWebView for a full-fledged browser view. Or take a look at TTStyledText from Three20, a third-party library.

Gavin Says:
February 9, 2010 at 6:40 am

I am getting a memory leak every time I use selectElement.
It says tagChunk and NSCFString are leaking 32 and 16 bytes respectively. Am I doing something wrong, or is there a fix for this if it’s a problem?

touchtank Says:
May 28, 2010 at 8:25 am
Anyone else seeing this?

- NEogene Says:
  September 8, 2010 at 5:30 pm
  Me too. The problem is linked to TxtChunk* text = [[TxtChunk alloc] initWithString:source range: NSMakeRange(0,0)];
  
  in NSString_HTML at line 393.
  
  If I’m not wrong, it happens during the while loop following the allocation of the TxtChunk.
- Neogene Says:
  September 8, 2010 at 5:42 pm
  LEAKS FIXED!
  
  The problem was due to a forgotten release for the property named “lastChunk”.
  
  Add the following line to the dealloc method of the ElementParser class:
  
  [lastChunk release];
  
  i.e.:
  
  -(void)dealloc{
      [tagStack release];
      [root release];
  
      [lastChunk release];
  
      if (callbackMethods){
          CFRelease(callbackMethods);
          [callbackMatchers release];
      }
      [super dealloc];
  }

artful Says:
March 3, 2010 at 5:45 am

This looks like exactly what I’m looking for; however, I can’t get it to work. I’m trying to parse some HTML from the web and just can’t get it to work properly.

Could you post an example of the .h and .m files of this working? I’ve literally been searching the web for months to find something like this. Any help would be much appreciated. Thanks.

touchtank Says:
March 3, 2010 at 8:36 am
I am confused… did you see the code samples at https://touchtank.wordpress.com/element-parser/ ? What errors are you getting?

artful Says:
March 3, 2010 at 10:24 am

I’m just trying to figure out how to use the URLParser, i.e. I’m implementing the code below in the viewDidLoad method:

URLParser* parser = [[URLParser alloc] initWithCallbackDelegate: self];
[parser performSelector:@selector(gotChanElement:) forElementsMatching: @"channel"];
[parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
[parser parseURL:[NSURL URLWithString:@"http://www.google.ie/"]];

The source or the result of the web page is not showing up when I run the app.

I’ve only been coding for 6 months, so I wouldn’t be surprised if I’m doing something very simple very wrong! Any feedback would be appreciated.

touchtank Says:
March 3, 2010 at 10:36 am
http://www.google.ie/ is not a feed url… it is a web page. So, it doesn’t have channel and item tags…

Adam Says:
March 18, 2010 at 5:26 pm

Hi TouchTank,

First off, I just want to say what a great tool you’ve made in ElementParser. I was up and parsing in less than 5 minutes; it’s incredibly easy to use and has all the functionality I need.

I want to buy a license from you, but I get a Mail Delivery error when I email sales@touchtankapps.com.

Please Advise.

Thanks

won Says:
April 7, 2010 at 8:05 am

Thank you very much.

I’ve used your great classes a lot, and they have helped me out.
But I found a memory leak when I call the ‘[Element parseHTML: @"testStr"]’ method in a loop in my program. I checked the sources and added the following code to the dealloc method in the ElementParser.m class file, and it seems to work well:
[lastOpened release];
[lastClosedBeforeOpen release];
[lastChunk release];
I’m very sorry for my poor English; I haven’t written in English for about 8 years, so I’m not sure I described it clearly.

touchtank Says:
April 7, 2010 at 11:25 pm
Thank you for your fixes. We will review and include as appropriate.

Aral Balkan Says:
April 7, 2010 at 12:41 pm

I also tried to get in touch to get a commercial license and could not get an answer.

Is there any chance we could get this open-sourced?

touchtank Says:
April 7, 2010 at 11:27 pm
Aral, I am sorry that you had trouble obtaining a license. Please contact me at 919-442-8252. Lee

Patrick Says:
July 9, 2010 at 8:43 am

This seems to be a good solution for a problem I have but I would just like to know if development is continuing. There don’t seem to have been any updates on GitHub since last year. Is there any news?

touchtank Says:
July 10, 2010 at 1:36 pm
Patrick,

Development has slowed but it is still supported. I’d love to hear about your experiences with it…

- Patrick Says:
  July 11, 2010 at 12:02 pm
  That’s great. I am just testing out Element Parser, I will let you know how it goes.
  
  Patrick

Parvez Qureshi Says:
August 23, 2010 at 6:41 am

How can we use ElementParser to be able to read external CSS files which are being referenced using link tag inside the html file?

touchtank Says:
August 23, 2010 at 7:46 am
Parvez, the parser doesn’t process such link tags. You can read the HREF attribute yourself to obtain the file, but ElementParser won’t parse the CSS file… Sorry!
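Reading the HREF yourself usually means resolving it against the page’s URL first, since hrefs are often relative. A minimal Foundation-only sketch: the helper name ResolveLinkHref is hypothetical, and it assumes you have already collected the href strings, for example with `[document selectElements: @"link"]` and `[link attribute: @"href"]` as in the other snippets on this page. Downloading the stylesheet (e.g. with `+[NSString stringWithContentsOfURL:encoding:error:]`) and parsing it is still up to you, since ElementParser does not parse CSS.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: resolves the href of a <link> tag against the URL of
// the page it came from, so "css/site.css" found on
// http://example.com/dir/page.html becomes http://example.com/dir/css/site.css.
// Returns nil for a missing href or page URL, or an href NSURL rejects.
static NSURL* ResolveLinkHref(NSString* href, NSURL* pageURL) {
    if (href == nil || pageURL == nil) {
        return nil;
    }
    // URLWithString:relativeToURL: applies standard relative-URL resolution;
    // an href that is already absolute ignores the base. absoluteURL
    // flattens the result into a standalone URL (messaging nil yields nil).
    return [[NSURL URLWithString: href relativeToURL: pageURL] absoluteURL];
}
```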

charlie Says:
September 1, 2010 at 2:08 am

Hi, I am new to iPhone programming. Can anyone help me get HTML contents with their tags? Right now I am getting only the contents, in an array of string objects. Is it possible to get the array contents as key/value pairs?

touchtank Says:
September 3, 2010 at 6:46 am
Charlie, this is just what ElementParser does. Take a look at the docs and you should find examples of how to do what you want.

- charlie Says:
  September 3, 2010 at 2:26 pm
  Hi ,
  
  I found a solution for this: the parser gives an array of Element objects, and from those we can take out whatever we want.
  
  Thanks.

Peter Says:
November 6, 2010 at 6:44 pm

This was so great!

I got what I wanted with only two lines of code! Beats all of the parsers I’ve tried in Objective-C!

DocumentRoot *document = [Element parseHTML:html];
Element *element = [document selectElement: @"div#column2-3 div p"];

Thank you so much from Sweden!

touchtank Says:
November 9, 2010 at 7:56 am
Thanks, glad it worked. Please spread the news.

Stephen Says:
November 12, 2010 at 9:40 am

@Peter

Can you post the HTML code you parsed?

Peter Says:
November 12, 2010 at 11:00 am

@Stephen

It’s simple if you look at my code. It is a div with id="column2-3"; inside that div there is another div, and inside the second div is a p element.

It’s very easy to understand if you read the whole article.

Norm Says:
November 17, 2010 at 3:35 pm

Great library! I got a “connection refused” error when I emailed sales@touchtankapps … just left a voicemail at the number mentioned earlier in this post. Would love to purchase a commercial license!

touchtank Says:
December 1, 2010 at 5:15 pm
Sorry about the bad sales link. Now fixed.

Gigi Says:
December 22, 2010 at 7:03 pm

Hi,
I’m new to Objective-C, but I’m using this library to perform some tests on HTML parsing.
I declared a string this way:
string1 = “the text”;
then:
DocumentRoot* document = [Element parseHTML: string1];
Element* theElement = [document selectElement: @"p"];

The problem is that theElement’s contentsText is nil, while the contentsLenght is 8.
Any ideas??

Thank you

Gigi Says:
December 22, 2010 at 7:05 pm

Comments accept HTML code…
the string1 is: @"-html- -div- -p- the text -/div- -/p- -/html";
I used - instead of angle brackets.
Hope this is clear. Sorry, guys

touchtank Says:
December 22, 2010 at 7:28 pm

hmmm, as written the tags don’t nest… was that a mistake in the comment or the source?

Gigi Says:
December 23, 2010 at 8:04 pm
Problem solved, but I have another question for you.
For example, my HTML code has 3 divs, and I want to get a collection containing the 3 elements. Can you explain how I should do that?

Thanks

- touchtank Says:
  December 24, 2010 at 8:53 am
  NSArray* theDivs = [document selectElements: @"div"];

Gigi Says:
December 22, 2010 at 8:47 pm

Only in my comments.
the string1 is: @"-html- -div- -p- the text -/p- -/div- -/html";

sickboy Says:
January 13, 2011 at 12:32 pm

Hi,
I have a problem with the special character “ (it’s not the normal double quote ") in the content text of the div class="news".
This is my code:
DocumentRoot* document = [Element parseHTML: result];
NSArray* divs = [document selectElements: @"div.news"];

It works fine until it finds the “. If I remove it from the text, everything works fine.

Do you have the same problem?

cdor Says:
January 30, 2011 at 1:04 pm

Hi,
I’m new to Objective-C, but I’m trying to use ElementParser to obtain the contents of a very simple web page that looks like this:

some data
more data

I only want to get the “some data” and “more data” and display it on my application. Can you help me to get started with this?

Thank you!

cdor Says:
January 30, 2011 at 1:24 pm

I’m sorry, I don’t know why my previous post did not show correctly.
This is what the HTML contents look like:

cdor Says:
January 30, 2011 at 1:27 pm
OK, I’m not sure why the HTML tags are not displaying, but basically there is an html tag, then a head tag with a meta tag, then comes the data, and at the end the html tag is closed.
I’m sorry for posting the same thing twice.

- touchtank Says:
  January 30, 2011 at 6:27 pm
  I am going to assume your html looks something like this:
  
  <html> <head> <meta foo='some data'>more data</meta> </head> </html>
  That being the case, your code could look something like this:
  
  DocumentRoot* document = [Element parseHTML: htmlString];
  Element* meta = [document selectElement: @"meta"];
  NSString* fooAttr = [meta attribute: @"foo"];
  NSString* contents = [meta contentsText];
  
  I hope this helps.

niejam Says:
April 24, 2011 at 12:49 pm

how to select certain row in table such this:
/TABLE/TBODY/TR[2]/TD[2]

Thanks.

touchtank Says:
April 24, 2011 at 4:39 pm
If there is a distinguishing class (or id) you could select the tds directly:

NSArray* tds = [document selectElements: @"td.some_class"];

If not, you can use the ‘+’ syntax:

NSArray* tds = [document selectElements: @"table tbody tr+tr td+td"];

Note the ‘+’ means immediately following…

Note: the table and tbody may be superfluous.

Hope this helps
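The ‘+’ approach generalizes to any row and column. The helper below is a hypothetical Foundation-only sketch (CellSelector is not an ElementParser API), and it assumes ElementParser gives ‘+’ the usual CSS meaning described above (immediately following sibling). Because tr+tr matches every row after the first, not just the second, the string is meant for selectElement:, which presumably returns the first match in document order; selectElements: would also return the cells further down and to the right.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a selector for the cell at a 1-based
// (row, column) position by chaining the '+' combinator, e.g.
// row 2, column 2 -> @"tr+tr td+td". Values below 1 are treated as 1.
static NSString* CellSelector(NSUInteger row, NSUInteger column) {
    NSMutableString* selector = [NSMutableString stringWithString: @"tr"];
    for (NSUInteger i = 1; i < row; i++) {
        [selector appendString: @"+tr"];   // one step down per extra row
    }
    [selector appendString: @" td"];       // descend into that row's cells
    for (NSUInteger i = 1; i < column; i++) {
        [selector appendString: @"+td"];   // one step right per extra column
    }
    return selector;
}
```

Usage would then be `Element* cell = [document selectElement: CellSelector(2, 2)];`, prefixing the string with "table tbody " if the page needs it to disambiguate.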

Antony Morozov Says:
July 12, 2011 at 8:25 am
A simpler method:

NSString* tdTexts[5];
NSString* selector = @"td";

for (int i = 0; i < 5; i++) {
    // "td", "td+td", "td+td+td", ...: each "+td" steps one cell to the right
    tdTexts[i] = [[document selectElement: selector] contentsText];
    selector = [selector stringByAppendingString: @"+td"];
}

zobertke Says:
April 28, 2011 at 2:10 pm

Can I use ElementParser as a full-text parser for a bunch of HTML files? I need to write an app that can find and display the list of all the HTML files (diary pages) in which a searched string is present. We are talking about several thousand HTML files, and search time is critical. I was wondering if I can skip a solution with indexed search.
Thanks for your answer

touchtank Says:
April 28, 2011 at 5:18 pm
Probably not a great fit.

Issam Says:
May 5, 2011 at 6:35 am

This worked great for me; however, I am having encoding issues, as Arabic characters are being encoded as Unicode escapes (\uxxxx).

How can I get this fixed?

Issam Says:
May 5, 2011 at 11:10 am
It turns out the problem is with the way strings are being added to a dictionary…

I am doing [dictionary setValue:[content contentsText] forKey:myKey]

[content contentsText] is now being Unicode-escaped in the dictionary…

This is my problem; nothing to do with ElementParser.

mark Says:
June 5, 2011 at 2:46 pm

Is there a way to get around this warning I get:

“WARN [index: 257]: document left tag open “

nishant Says:
August 4, 2011 at 6:32 pm

Can I set the value of a particular HTML element? For example, when I load a web page with Login and Password fields into my UIWebView, I want these fields to be filled in when the page loads. For that I need to get the text field elements first, by parsing the whole page, and then set their values. How can I set the value? Or is it read-only?

touchtank Says:
August 4, 2011 at 9:23 pm
sorry, it is read only…

Sho Says:
October 5, 2011 at 8:05 am

Hi,
Thanks for the nice parser. How would I parse, say, a PHP file or a web page on a domain? Using the demo, when I put in the source as domain.com/index.php, it didn’t work.

Any idea?

touchtank Says:
October 5, 2011 at 11:10 am
You can either get the source of the page yourself and then hand it to the parser, or use URLParser to have it fetch it automatically. Not sure where you are going astray…

Mike Zang Says:
December 4, 2012 at 9:30 pm

Can you update it to use ARC, or for iOS 6?
