Example: How To Parse XML Files With Ruby LibXML

14 Dec

I’ve spent the last two days trying to figure out how to parse XML files using Ruby LibXML. I got to the part where the XML file is parsed, but I couldn’t figure out how to properly get to the “node” after getting the XPath Object. For some reason, it kept returning “nil”, even though the XML file was definitely there.

So, after a bunch of frustration, I came up with a simple hack…


In the example below, my goal was to get the Facebook like count for a specific webpage. Here is the XML file I was trying to parse. After getting the XML file to print out in the Terminal, I simply converted the parsed document into a string and wrote a simple regular expression to find the “like_count” tag. This is not the “proper” way to do it, but it’s a very simple and easy hack 🙂

#The Facebook Like counter returns the number of 
#likes from the specified webpage

require 'libxml'
require 'net/http'

siteURL = "holler.com"
apiURL = "http://api.facebook.com/restserver.php?method=links.getStats&urls=#{siteURL}"
xml_data = Net::HTTP.get_response(URI.parse(apiURL)).body
doc = LibXML::XML::Parser.string(xml_data).parse #parses the XML data

# here is a hack if you can't figure out where to go from here

content = doc.to_s #convert the XML data into a string
res = content.match(/like_count>([^<]+)</)[1] #a regular expression to find the tag
like_count = res.to_i #convert the string into an integer if necessary
puts "Like count is #{like_count}"

And if you have multiple XML objects on a page, you can do a similar hack:

#The Facebook Like counter returns the number of 
#likes from the specified webpage

require 'libxml'
require 'net/http'

siteURL = "holler.com,natashatherobot.wordpress.com,psychworld.com"
apiURL = "http://api.facebook.com/restserver.php?method=links.getStats&urls=#{siteURL}"
xml_data = Net::HTTP.get_response(URI.parse(apiURL)).body
doc = LibXML::XML::Parser.string(xml_data).parse #parses the XML data

# here is a hack if you can't figure out where to go from here

content = doc.to_s #convert the XML results into a string
content = content.split(/<link_stat>/) #split the string into an array of strings for individual sites
content.delete_at(0) #delete the header of the XML file
    
#get the website and total like count and store as a hash
likeCount = Hash.new 
    
content.each { |site|
 url = site.match(/url>([^<]+)</)[1]
 like_count = site.match(/like_count>([^<]+)</)[1]
 likeCount.store(url, like_count)
}

puts likeCount.inspect #prints the hash of url => like count (a "return" only makes sense if this code is wrapped in a method)
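
One weak spot in both hacks is that `match(...)[1]` raises a NoMethodError whenever a tag is missing, because `match` returns nil in that case. A slightly more defensive version of the same regular-expression idea could look like the sketch below. The helper name `like_counts_from` and the sample XML are made up for illustration; only the `link_stat`, `url` and `like_count` tag names come from the code above.

```ruby
# Hypothetical helper (not part of any library): pulls url => like_count
# pairs out of raw XML text using regular expressions only.
def like_counts_from(xml_text)
  counts = {}
  # Grab the text between each <link_stat> ... </link_stat> pair
  xml_text.scan(%r{<link_stat>(.*?)</link_stat>}m).flatten.each do |block|
    # String#[] with a regex and a group index returns nil when nothing matches
    url   = block[%r{<url>([^<]+)</url>}, 1]
    count = block[%r{<like_count>([^<]+)</like_count>}, 1]
    next if url.nil? || count.nil? # skip incomplete entries instead of crashing
    counts[url] = count.to_i
  end
  counts
end

# Made-up sample data; the second entry has no like_count on purpose
sample = <<~XML
  <links_getStats_response xmlns="http://api.facebook.com/1.0/">
    <link_stat><url>holler.com</url><like_count>12</like_count></link_stat>
    <link_stat><url>psychworld.com</url></link_stat>
  </links_getStats_response>
XML

counts = like_counts_from(sample)
puts counts.inspect
```

The incomplete entry is simply skipped, and the counts come back as integers rather than strings.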

If you can get the LibXML parser to work “properly” for the above code, please let me know in the comments!


One Response to “Example: How To Parse XML Files With Ruby LibXML”

  1. Shravani October 9, 2012 at 2:41 am #

    I hit the same sort of issue that you faced. The problem is the default namespace xmlns="http://api.facebook.com/1.0/" in your input XML. LibXML requires you to register this default namespace under a prefix before running any XPath queries on the XML document. Take a look at this post, which fixed my parsing errors:

    http://badpopcorn.com/blog/2007/08/25/libxml-default-namespace/

    doc = XML::Document.file('http://badpopcorn.com/file.xml')
    dn = 'dn:http://www.badpopcorn.com/solutions'
    doc.find('//dn:solutions', dn).each do |node|
      r = node.find('dn:solution', dn).first.content
    end

    Notice the usage of ‘dn’ in every query on the document.
