ElementParser is a lightweight framework that provides easy access to XML and HTML content. Rather than getting lost in the complexities of the HTML and XML specifications, it aims not to obscure their essential simplicity. It doesn’t do everything; it aspires to do “just enough”.
Let’s begin with some examples.
document = [Element parseHTML: source];
The returned document is a DocumentRoot, a special element that holds the top-level element(s) of your document (e.g. <html> or <rss>). You now have a tree of Element objects that you can walk using methods like firstChild, nextSybling and parent. You can also access the data each element contains with methods like tagName, attributes, contentsText and contentsOfChildren. A nice start, and sometimes this is enough. But let’s say you don’t want to walk the tree to find the data you need. How about:
linkElement = [document selectElement: @"div.nextLink a"];
Here we’re using a CSS-style selector to locate and return a matching element. Now we can parse a document and conveniently find elements of interest. (Yes, there is a corresponding selectElements: method that returns all matches.)
Next, let’s bind your world of objects more closely to the world of elements. To do this, we’ll use ElementParser directly to register callbacks into your code that fire when an element is found (and its contents parsed).
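Both access styles described above, walking the tree and selecting with a CSS-style selector, can be sketched in a few lines. This is a minimal sketch, not code from the ElementParser distribution: the helper names CollectTagNames and HrefsMatching are hypothetical, the header name ElementParser.h is an assumption (adjust it to the files in your copy), nextSybling is spelled as in the method list above, and it assumes attribute: returns nil when an attribute is missing.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h" // assumed header name; adjust to your copy of the sources

// 1. Walking the tree: depth-first, in document order, using
//    firstChild, nextSybling (spelled as in the text) and tagName.
static void CollectTagNames(Element* element, NSMutableArray* names) {
    for (Element* e = element; e != nil; e = [e nextSybling]) {
        if ([e tagName] != nil) { // the document root may have no tag name
            [names addObject: [e tagName]];
        }
        CollectTagNames([e firstChild], names); // descend before moving to the next sibling
    }
}

// 2. Selecting: gather the href of every element matching a selector.
//    selectElements: returns all matches; attribute: reads one attribute
//    (assumed to return nil when the attribute is absent).
static NSArray* HrefsMatching(Element* root, NSString* selector) {
    NSMutableArray* hrefs = [NSMutableArray array];
    for (Element* link in [root selectElements: selector]) {
        NSString* href = [link attribute: @"href"];
        if (href != nil) {
            [hrefs addObject: href];
        }
    }
    return hrefs;
}
```

Usage: CollectTagNames(document, names) on the result of parseHTML: fills names with every tag name in document order, and HrefsMatching(document, @"div.nextLink a") returns the link targets matched by the selector from the example above.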
ElementParser* parser = [[ElementParser alloc] initWithCallbacksDelegate: self];
[parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
documentRoot = [parser parseXML: source];
Your code could then look like this:
-(FeedItem*) gotFeedElement:(Element*)element{
FeedItem* feedItem = [[[FeedItem alloc] init] autorelease];
feedItem.title = [[element selectElement: @"title"] contentsText];
feedItem.description = [[element selectElement: @"description"] contentsText];
// <enclosure> is normally an empty tag; its URL lives in the url attribute
feedItem.enclosure = [[element selectElement: @"enclosure"] attribute: @"url"];
[element setDomainObject: feedItem]; //optional
return feedItem;
}
Finally, HTML and XML documents often live on the web. Wouldn’t it be nice to use the pattern above to process them incrementally, as soon as they arrive?
How about:
URLParser* parser = [[URLParser alloc] initWithCallbackDelegate: self];
[parser performSelector:@selector(gotChanElement:) forElementsMatching: @"channel"];
[parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
[parser parseURL: myURL];
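The gotChanElement: callback registered above isn’t shown on this page, so here is a minimal sketch of what such a delegate might look like. ChannelReader and its properties are hypothetical names, the header names are assumptions, and it is written for manual reference counting like the rest of the examples. The page doesn’t say what ElementParser does with a callback’s return value, so this one simply returns nil.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h" // assumed header names; add the ElementParser
#import "URLParser.h"     // source files to your project first

// Hypothetical delegate. Only the callback method matters to the parser;
// the properties just record what the callback saw.
@interface ChannelReader : NSObject {
    NSString* channelTitle;
    NSUInteger itemCount;
}
@property (nonatomic, copy) NSString* channelTitle;
@property (nonatomic, assign) NSUInteger itemCount;
@end

@implementation ChannelReader
@synthesize channelTitle, itemCount;

// Called once a <channel> element and its contents have been parsed.
-(id) gotChanElement:(Element*)element{
    // In RSS 2.0 the channel's own <title> normally precedes its <item>s,
    // so the first "title" match is the channel title.
    self.channelTitle = [[element selectElement: @"title"] contentsText];
    self.itemCount = [[element selectElements: @"item"] count];
    NSLog(@"%@: %lu items", self.channelTitle, (unsigned long)self.itemCount);
    return nil;
}

-(void) dealloc{
    [channelTitle release]; // manual reference counting
    [super dealloc];
}
@end
```

To try the callback without a network connection, register it on a plain ElementParser (as in the earlier example) and feed it a string with parseXML:; with URLParser and a real feed URL the same method is invoked as the channel arrives.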
There is a lot more available under the covers, but this may be all you need. Hopefully it’s just enough. We’d love your feedback at feedback@touchtankapps.com.
Terms of Use
The ElementParser framework (and its source code) is free of charge for non-commercial use (under the GPL). For commercial use, the license fee is $100 per product. (That’s a couple of hours of your time, right?) Support plans are also available. Please contact sales@touchtankapps.com.
August 3, 2009 at 11:50 pm
[…] If it is XML, then there's the built-in NSXMLParser that you can use. If it is HTML, well, then you'll probably need to figure out how to parse it yourself. I saw some people recommend this but I don't know how good it is. Element Parser Touch Tank […]
August 17, 2009 at 5:50 am
I downloaded the ElementParser Xcode project, but Build and Go didn’t work and the iPhone Simulator didn’t appear.
How do I use it?
Is there an example that shows how it parses HTML content?
August 20, 2009 at 8:50 am
It is not intended to be a stand alone project. Did you see the docs with usage examples?
August 23, 2009 at 7:49 pm
No, I didn’t see them.
Can you please send me an example so I can see how to use it?
August 24, 2009 at 11:17 am
Take a look at:
https://touchtank.wordpress.com/element-parser/
Let me know if you still have questions…
August 25, 2009 at 4:10 am
Sorry, I’m a beginner in Xcode, so this is hard for me to understand. I have a string str with HTML content, and I want to parse it, keeping only the paragraph between <p> and </p>. How do I proceed?
I hope I didn’t bother you 😉
August 25, 2009 at 3:31 pm
It’s not clear exactly what you are trying to accomplish. As an example, to extract the text content of the first ‘p’ tag in an HTML document whose source is in ‘sourceStr’, you could do the following:
DocumentRoot* doc = [Element parseHTML: sourceStr];
Element* firstParaElement = [doc selectElement: @"p"];
return [firstParaElement contentsText];
August 26, 2009 at 5:36 am
Can we display this text content in a web view, and if so, how?
Thanks a lot for your help
September 6, 2009 at 4:34 am
How do I reset my password?
Thanks
Musca Law
September 9, 2009 at 10:32 am
You don’t add it as a framework… you just add the source files… framework support on the iPhone is still tricky…
September 9, 2009 at 10:33 am
For SourceForge? I am not sure… anybody else know?
September 8, 2009 at 10:31 am
New to Xcode: what are the proper settings to build ElementParser for iPhone 3.0 (i.e. do we build under the 3.0 SDK)?
How do you link it into your iPhone project? I added it as a framework but get a warning “libElementParser.a file is not of required architecture”. I built for iPhone 3.0 (Release) and added the file to my iPhone test project as a framework.
September 27, 2009 at 10:58 pm
Howdy,
I’ve been trying out ElementParser for an iPhone project, and it seems to be exactly what I need. One issue, though, I seem to have uncovered a bug with entity parsing. Check out the files here to see what I mean: http://github.com/mferrier/elementparserbug
If you scroll to near the end of the parser output (http://github.com/mferrier/elementparserbug/blob/master/elementparserbug-output.txt), you can see that the last chunk processed is “03 — sugar “, which appears right before ” & spice;” in the HTML source (http://github.com/mferrier/elementparserbug/blob/master/elementparserbug.html). I’m thinking maybe it’s mis-processing the ampersand and semicolon as an entity or something?
Anyway, if I can get these bugs ironed out, I’d definitely be interested in licensing this library for this upcoming iPhone app.
Send me an email or reply here if you’d like any further information.
Thanks!
Mike
October 4, 2009 at 10:40 am
Fix is now posted.
October 1, 2009 at 7:50 am
Hi,
Thanks for this. It seems that selectElements returns an array of Elements whose contentsText is not set to the actual contents of the element. Is this by design? If so, how do I get the content of an element returned by selectElements?
Thanks,
Reza
October 1, 2009 at 8:45 am
sorry, maybe the above is untrue… it had no contents but had attributes since it was an img tag.
Thanks!
October 4, 2009 at 3:20 pm
Hello,
I’m trying to use ElementParser, but I have some questions about its functionality.
Let’s say I parse an HTML string and select an element. I want to delete this element and then regenerate the HTML without it. Is this possible?
DocumentRoot* doc = [Element parseHTML: sourceStr];
Element* tableElement = [doc selectElement: @"table"];
//remove the tableElement from the doc
NSString *result = // doc to HTML ?
October 5, 2009 at 9:28 am
ElementParser does not implement a full DOM style interface or data model. As a result, you cannot emit a revised version of the document. It is focused on the problem of reading html/xml and easily accessing content in the document.
May 28, 2010 at 8:25 am
Pedro,
ElementParser does not include a serialization routine currently. It is focused on the reading use cases. Sorry.
February 7, 2013 at 1:23 am
I have not tried this, but here’s an idea:
1. Locate the “table” element you want to omit.
2. Get the NSRange of the element.
3. Using the NSString methods, copy&merge the ranges (of the document source code) before and after (but excluding) the element into a new NSString.
Sorry, but I don’t have example code. This is off the top of my head. But I could write something up if needed.
HTH
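The steps above can be sketched with plain Foundation calls. This assumes you already have the NSRange covering the whole element, from its opening tag through its closing tag; this page doesn’t show how (or whether) ElementParser exposes that range, so treat obtaining it as an open assumption. StringByRemovingRange is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Return a copy of source with the characters in range removed, i.e. the
// text before the range joined to the text after it. Returns nil for an
// out-of-bounds range instead of letting NSString raise an exception.
static NSString* StringByRemovingRange(NSString* source, NSRange range) {
    NSUInteger length = [source length];
    if (range.location > length || range.length > length - range.location) {
        return nil;
    }
    return [source stringByReplacingCharactersInRange: range withString: @""];
}
```

In a quick experiment, a rangeOfString: search for the table’s markup can stand in for the element’s range; with real documents the range should come from the parse, since the same markup may appear more than once.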
October 10, 2009 at 7:05 pm
[…] of a good parser I found is Element Parser, and its source code is hosted at GitHub. However, there is no good explanation of how to use it. […]
November 25, 2009 at 3:13 pm
Amazing, I hadn’t heard about this until now. Thx!
January 13, 2010 at 4:55 pm
Is it OK if I add the “Classes” folder to my project? Is that enough to make it work? Thanks.
January 13, 2010 at 4:58 pm
that should be it.
January 19, 2010 at 11:44 am
Hey guys, new to Xcode. I need to display the content of an HTML table in part of my iPhone app. It is just for reference, so I don’t really need to parse it in a particular way. Is there any way I can display the web site as-is, straight in my app?
January 20, 2010 at 8:47 am
Take a look at UIWebView for a full-fledged browser view. Or take a look at TTStyledText from Three20, a third-party library.
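For the simplest case, showing a page as-is, a minimal UIWebView sketch follows. MakeWebViewForURL is a hypothetical helper name, and it is written for manual reference counting like the rest of this page. If you already have HTML in a string (for example, markup you pulled out with ElementParser), call loadHTMLString:baseURL: on the web view instead of loadRequest:. Note that UIWebView has since been deprecated (iOS 12) in favor of WKWebView.

```objc
#import <UIKit/UIKit.h>

// Create a web view that loads a page directly. The view is returned
// autoreleased; adding it to a superview retains it.
static UIWebView* MakeWebViewForURL(CGRect frame, NSURL* url) {
    UIWebView* webView = [[[UIWebView alloc] initWithFrame: frame] autorelease];
    webView.scalesPageToFit = YES; // scale the page to fit the view and allow pinch-zoom
    [webView loadRequest: [NSURLRequest requestWithURL: url]];
    return webView;
}
```

Usage, e.g. in viewDidLoad: [self.view addSubview: MakeWebViewForURL(self.view.bounds, [NSURL URLWithString: @"http://www.example.com/"])];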
February 9, 2010 at 6:40 am
I am getting a memory leak every time I use selectElement.
It says tagChunk and NSCFString are leaking 32 and 16 bytes respectively. Am I doing something wrong, or is there a fix for this if it’s a problem?
May 28, 2010 at 8:25 am
Anyone else seeing this?
September 8, 2010 at 5:30 pm
Me too. The problem is linked to this line in NSString_HTML, at line 393:
TxtChunk* text = [[TxtChunk alloc] initWithString:source range: NSMakeRange(0,0)];
If I’m not wrong, it happens during the while loop following the allocation of the TxtChunk.
September 8, 2010 at 5:42 pm
LEAKS FIXED!
The problem was due to a forgotten release for the property named “lastChunk”.
Add the following line to the dealloc method of the ElementParser class:
[lastChunk release];
That is:
-(void)dealloc{
[tagStack release];
[root release];
[lastChunk release];
if (callbackMethods){
CFRelease(callbackMethods);
[callbackMatchers release];
}
[super dealloc];
}
March 3, 2010 at 5:45 am
This looks like exactly what I’m looking for, but I can’t get it to work. I’m trying to parse some HTML from the web and just can’t get it to work properly.
Could you post an example .h and .m file showing this working? I’ve literally been searching the web for months to find something like this. Any help would be much appreciated. Thanks.
March 3, 2010 at 8:36 am
I am confused… did you see the code samples at https://touchtank.wordpress.com/element-parser/ ? What errors are you getting?
March 3, 2010 at 10:24 am
I’m just trying to figure out how to use URLParser. I’m implementing the following in my viewDidLoad method:
URLParser* parser = [[URLParser alloc] initWithCallbackDelegate: self];
[parser performSelector:@selector(gotChanElement:) forElementsMatching: @"channel"];
[parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
[parser parseURL:[NSURL URLWithString:@"http://www.google.ie/"]];
The source or the result of the web page is not showing up when I run the app.
I’ve only been coding for 6 months, so I wouldn’t be surprised if I’m doing something very simple very wrong! Any feedback would be appreciated.
March 3, 2010 at 10:36 am
http://www.google.ie/ is not a feed URL… it is a web page. So it doesn’t have <channel> and <item> tags…
March 18, 2010 at 5:26 pm
Hi TouchTank,
First off, just want to say what a great tool you’ve made in ElementParser. I was up and parsing in less than 5 minutes; it’s incredibly easy to use and has all the functionality I need.
I want to buy a license from you, but I get a Mail Delivery error when I email sales@touchtankapps.com
Please advise.
Thanks
April 7, 2010 at 8:05 am
Thank you very much .
I’ve been using your great classes, and they’ve helped me out a lot.
But I found a memory leak when calling [Element parseHTML: @"testStr"] in a loop in my program. I checked the sources and added the following code to the dealloc method in ElementParser.m, and it seems to work.
[lastOpened release];
[lastClosedBeforeOpen release];
[lastChunk release];
Sorry for my poor English; I haven’t written in English for about 8 years, so I’m not sure I described it clearly.
April 7, 2010 at 11:25 pm
Thank you for your fixes. We will review and include as appropriate.
April 7, 2010 at 12:41 pm
I also tried to get in touch to get a commercial license and could not get an answer.
Is there any chance we could get this open-sourced?
April 7, 2010 at 11:27 pm
Aral, i am sorry that you had trouble obtaining a license. Please contact me at 919-442-8252. Lee
July 9, 2010 at 8:43 am
This seems to be a good solution for a problem I have but I would just like to know if development is continuing. There don’t seem to have been any updates on GitHub since last year. Is there any news?
July 10, 2010 at 1:36 pm
Patrick,
Development has slowed but it is still supported. I’d love to hear about your experiences with it…
July 11, 2010 at 12:02 pm
That’s great. I am just testing out Element Parser, I will let you know how it goes.
Patrick
August 23, 2010 at 6:41 am
How can we use ElementParser to read external CSS files that are referenced via a link tag inside the HTML file?
August 23, 2010 at 7:46 am
Parvez, the parser doesn’t process link tags as such. You can read the href attribute yourself to obtain the file, but ElementParser won’t parse the CSS file… Sorry!
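A minimal sketch of that approach: ElementParser locates the link tags and you fetch the stylesheets yourself. FetchStylesheets is a hypothetical helper; the header name is an assumption, it assumes UTF-8 stylesheets, and it downloads synchronously, so in a real app call it off the main thread. The CSS comes back as plain text; nothing here parses it.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h" // assumed header name

// Find <link rel="stylesheet" href="..."> elements and download each file.
static NSArray* FetchStylesheets(Element* document, NSURL* pageURL) {
    NSMutableArray* sheets = [NSMutableArray array];
    for (Element* link in [document selectElements: @"link"]) {
        NSString* rel = [link attribute: @"rel"];
        NSString* href = [link attribute: @"href"];
        if (rel == nil || href == nil ||
            [rel rangeOfString: @"stylesheet" options: NSCaseInsensitiveSearch].location == NSNotFound) {
            continue; // not a stylesheet link
        }
        // Resolve relative hrefs such as "css/site.css" against the page URL.
        NSURL* cssURL = [NSURL URLWithString: href relativeToURL: pageURL];
        NSError* error = nil;
        NSString* css = [NSString stringWithContentsOfURL: cssURL
                                                 encoding: NSUTF8StringEncoding
                                                    error: &error];
        if (css != nil) {
            [sheets addObject: css];
        } else {
            NSLog(@"Could not load %@: %@", cssURL, error);
        }
    }
    return sheets;
}
```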
September 1, 2010 at 2:08 am
Hi, I am new to iPhone programming. Can anyone help me get HTML contents along with their tags? Right now I am only getting the contents in an array as string objects. Is it possible to get the array contents as key/value pairs?
September 3, 2010 at 6:46 am
Charlie, this is just what ElementParser does. Take a look at the docs and you should find examples of how to do what you want.
September 3, 2010 at 2:26 pm
Hi ,
I found a solution for this: the parser gives you an array of Element objects, and from those you can take out whatever you want.
Thanks.
November 6, 2010 at 6:44 pm
This was so great!
I got what I wanted with only two lines of code! Beats all the parsers I’ve tried in Objective-C!
DocumentRoot *document = [Element parseHTML:html];
Element *element = [document selectElement: @"div#column2-3 div p"];
Thank you so much from Sweden!
November 9, 2010 at 7:56 am
Thanks, glad it worked. Please spread the news.
November 12, 2010 at 9:40 am
@Peter
Can you post the html code your parsed?
November 12, 2010 at 11:00 am
@Stephen
It’s simple if you look at my code. It is a div with id="column2-3"; inside that div there is another div, and inside the second div is a p element.
It’s very easy to understand if you read the whole article.
November 17, 2010 at 3:35 pm
Great library! I got a “connection refused” error when i emailed sales@touchtankapps … just left a VM at the # mentioned earlier in this post. Would love to purchase a commercial license!
December 1, 2010 at 5:15 pm
Sorry about the bad sales link. Now fixed.
December 22, 2010 at 7:03 pm
Hi,
I’m new to Objective-C, and I’m using this library to run some tests on HTML parsing.
I declared a string this way:
string1 = @"the text";
then:
DocumentRoot* document = [Element parseHTML: string1];
Element* theElement = [document selectElement: @"p"];
The problem is that theElement’s contentsText is nil, while its contentsLength is 8.
Any ideas?
Thank you
December 22, 2010 at 7:05 pm
Comments interpret HTML code…
the string1 is: @"-html- -div- -p- the text -/div- -/p- -/html";
I used - instead of < and >.
Hope this is clear. Sorry guys
December 22, 2010 at 7:28 pm
hmmm, as written the tags don’t nest… was that a mistake in the comment or the source?
December 23, 2010 at 8:04 pm
Problem solved, but I have another question for you.
For example, my HTML code contains 3 divs, and I want to get a collection containing those 3 elements. Can you explain how I should do that?
Thanks
December 24, 2010 at 8:53 am
NSArray* theDivs = [document selectElements: @"div"];
December 22, 2010 at 8:47 pm
Only in my comments.
the string1 is: @"-html- -div- -p- the text -/p- -/div- -/html";
Thanks
January 13, 2011 at 12:32 pm
Hi,
I have a problem with the special character “ (the curly quotation mark, not the normal straight quote ") in the text content of div class="news".
This is my code:
DocumentRoot* document = [Element parseHTML: result];
NSArray* divs = [document selectElements: @"div.news"];
It works fine until it hits the “. If I remove it from the text, everything works fine.
Does anyone else have the same problem?
January 30, 2011 at 1:04 pm
Hi,
I’m new to Objective-C, but I’m trying to use ElementParser to obtain the contents of a very simple web page that looks like this:
some data
more data
I only want to get the “some data” and “more data” and display it on my application. Can you help me to get started with this?
Thank you!
January 30, 2011 at 1:24 pm
I’m sorry I don’t know why my previous post did not show correctly.
This is how the HTML content should look:
some data
more data
January 30, 2011 at 1:27 pm
OK, I’m not sure why the HTML tags are not displaying, but basically there is an html tag, then a head tag containing a meta tag, then the data, and at the end the closing html tag.
I’m sorry for posting the same twice.
January 30, 2011 at 6:27 pm
I am going to assume your html looks something like this:
<html>
<head>
<meta foo='some data'>more data</meta>
</head>
</html>
That being the case, your code could look something like this:
DocumentRoot* document = [Element parseHTML: htmlString];
Element* meta = [document selectElement: @"meta"];
NSString* fooAttr = [meta attribute: @"foo"];
NSString* contents = [meta contentsText];
I hope this helps.
April 24, 2011 at 12:49 pm
How do I select a particular cell in a table, like this XPath:
/TABLE/TBODY/TR[2]/TD[2]
Thanks.
April 24, 2011 at 4:39 pm
If there is a distinguishing class (or id) you could select the tds directly:
NSArray* tds = [document selectElements: @"td.some_class"];
If not, you can use the ‘+’ syntax:
NSArray* tds = [document selectElements: @"table tbody tr+tr td+td"];
Note the ‘+’ means immediately following…
Note: the table and tbody may be superfluous.
Hope this helps
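If the rows and cells have no distinguishing class or id, another option is to index into the arrays that selectElements: returns. CellAt is a hypothetical helper and a rough equivalent of the XPath above (XPath counts from 1, NSArray from 0). It assumes selectElements: can be called on any element, as selectElement: is in the callback example earlier on this page, and the header name is an assumption.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h" // assumed header name

// Return the cell at (row, column), counting from 0, or nil if the table
// has too few rows or cells. "tr" and "td" match descendants, so rows and
// cells of any nested tables would be counted too.
static Element* CellAt(Element* table, NSUInteger row, NSUInteger column) {
    NSArray* rows = [table selectElements: @"tr"];
    if (row >= [rows count]) {
        return nil;
    }
    NSArray* cells = [[rows objectAtIndex: row] selectElements: @"td"];
    if (column >= [cells count]) {
        return nil;
    }
    return [cells objectAtIndex: column];
}
```

Usage for /TABLE/TBODY/TR[2]/TD[2]: [CellAt([document selectElement: @"table"], 1, 1) contentsText].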
July 12, 2011 at 8:25 am
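A simpler method:
// Collect the text of the first five cells using "td", "td+td", "td+td+td", ...
// selectElement: returns the first match, and each extra "+td" steps to the
// immediately following cell.
NSMutableArray* cellTexts = [NSMutableArray arrayWithCapacity: 5];
NSString* selector = @"td";
for (int i = 0; i < 5; i++) {
NSString* text = [[document selectElement: selector] contentsText];
[cellTexts addObject: (text != nil ? text : @"")]; // keep positions even if a cell is missing
selector = [selector stringByAppendingString: @"+td"];
}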
April 28, 2011 at 2:10 pm
Can I use ElementParser for full-text search over a bunch of HTML files? I need to write an app that can find and display a list of all the HTML files (diary pages) that contain a searched string. We are talking about several thousand HTML files. Search time is critical. I was wondering whether I can avoid an indexed-search solution.
Thanks for your answer
April 28, 2011 at 5:18 pm
Probably not a great fit.
May 5, 2011 at 6:35 am
This worked great for me; however, I am having encoding issues: Arabic characters are being encoded as Unicode escapes (\uxxxx).
How do I fix this?
May 5, 2011 at 11:10 am
It turns out the problem is the way strings are being added to a dictionary…
I am doing [dictionary setValue:[content contentsText] forKey:myKey]
[content contentsText] is being Unicode-escaped in the dictionary…
This is my problem, nothing to do with elementparser
June 5, 2011 at 2:46 pm
Is there a way to get around this warning I get:
“WARN [index: 257]: document left tag open “
August 4, 2011 at 6:32 pm
Can I set the value of a particular HTML element? For example, when I load a web page with Login and Password fields in my UIWebView, I want those fields filled in when the page loads. For that I need to find the text field elements by parsing the whole page and then set their values. How can I set the value? Or is it read-only?
August 4, 2011 at 9:23 pm
sorry, it is read only…
October 5, 2011 at 8:05 am
Hi,
Thanks for the nice parser. How would I parse, say, a PHP file or a web page on a domain? Using the demo, when I put in the source as domain.com/index.php, it didn’t work.
Any ideas?
October 5, 2011 at 11:10 am
You can either get the source of the page yourself and then hand it to the parser, or use URLParser to have it fetch it automatically. Not sure where you are going astray…
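A minimal sketch of the first option, fetching the page yourself and handing the source to the parser. ParsePageAtURL is a hypothetical helper; it downloads synchronously (call it off the main thread in a real app), assumes the page is UTF-8, and the header name is an assumption. For a PHP page, the server runs the PHP and what you receive and parse is the HTML it outputs. One thing worth checking in the case above: "domain.com/index.php" has no scheme, whereas NSURL needs a full URL such as http://domain.com/index.php.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h" // assumed header name

// Download a page, then parse its HTML. Returns nil (and fills *error,
// if error is non-NULL) when the page can't be fetched or decoded.
static DocumentRoot* ParsePageAtURL(NSURL* url, NSError** error) {
    NSString* source = [NSString stringWithContentsOfURL: url
                                                encoding: NSUTF8StringEncoding
                                                   error: error];
    if (source == nil) {
        return nil;
    }
    return [Element parseHTML: source];
}
```

Usage: DocumentRoot* doc = ParsePageAtURL([NSURL URLWithString: @"http://domain.com/index.php"], NULL); then select elements from doc as usual.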
December 4, 2012 at 9:30 pm
Could you update it to use ARC, or for iOS 6?