Notes on using REXML, with an example

REXML is a library for parsing XML in Ruby.
It ships with Ruby (as a bundled gem since Ruby 3.0).

Quick notes on how to use it (this is my own approach, so some of it may not be the proper way...)

Suppose we have an XML document like this (tmp.xml).

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xml:lang="ja">
  <channel>
    <item>
      <title>ラッキー☆ちゃんねる </title>
      <link>http://www.lucky-ch.com/weekly/070827.html</link>
      <description>かがみんは俺の嫁</description>
      <pubDate>Sun, 17 Feb 2008 11:07:22 +0900</pubDate>
      <author>utadaq</author>
      <dc:subject>lucky-star</dc:subject>
      <dc:subject>kagamin</dc:subject>
      <dc:subject>京アニ</dc:subject>
      <dc:subject>next</dc:subject>
    </item>
    <item>
      <title>viプラグイン - EclipseWiki</title>
      <link>http://eclipsewiki.net/eclipse/index.php?vi%A5%D7%A5%E9%A5%B0%A5%A4%A5%F3</link>
      <description></description>
      <pubDate>Sun, 13 Jan 2008 01:00:34 +0900</pubDate>
      <author>utadaq</author>
      <dc:subject>vi</dc:subject>
      <dc:subject>eclipse</dc:subject>
    </item>
  </channel>
</rss>

We parse it with the following Ruby script.

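#!/usr/bin/env ruby
# $KCODE = 'u' was needed on Ruby 1.8; Ruby 1.9 and later ignore it, so it is omitted.

require 'rexml/document'

# Read the whole file; File.read also closes the file handle.
rss = File.read('tmp.xml')
doc = REXML::Document.new(rss)
# Collect every <item> element under /rss/channel.
elem = REXML::XPath.match(doc,'/rss/channel//item')

elem.each_with_index do |e,i|
  p "------------- #{i} -------------"
  # each_element takes an XPath relative to the current <item>.
  e.each_element('title') do |title|
    p "title:#{title.text}"
  end
  # dc:subject uses the dc prefix declared on the root <rss> element.
  e.each_element('dc:subject') do |tag|
    p "tag  :#{tag.text}"
  end
end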

Output

"------------- 0 -------------"
"title:ラッキー☆ちゃんねる "
"tag  :lucky-star"
"tag  :kagamin"
"tag  :京アニ"
"tag  :next"
"------------- 1 -------------"
"title:viプラグイン - EclipseWiki"
"tag  :vi"
"tag  :eclipse"
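
If you want to reuse the parsed values instead of just printing them, one option is to collect each item into a hash. The following is only a rough sketch: the inline sample data, the SAMPLE_RSS constant, and the extract_items helper are made up for illustration and are not part of REXML or of the script above.

```ruby
require 'rexml/document'

# Hypothetical inline sample so the sketch runs without tmp.xml.
SAMPLE_RSS = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
      <item>
        <title>first entry</title>
        <dc:subject>vi</dc:subject>
        <dc:subject>eclipse</dc:subject>
      </item>
      <item>
        <title>second entry</title>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: returns one hash per <item>, holding its title and tags.
def extract_items(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '/rss/channel/item').map do |item|
    title = item.elements['title'] # first <title> child, or nil if missing
    {
      title: title && title.text,
      tags: item.get_elements('dc:subject').map(&:text)
    }
  end
end

extract_items(SAMPLE_RSS).each do |entry|
  puts "#{entry[:title]}: #{entry[:tags].join(', ')}"
end
```

An item without any dc:subject simply ends up with an empty tags array, so the caller does not need a special case for it.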

For how to write the XPath expressions, I relied on the following site.
http://www.nextindex.net/java/XML/XPath.html
(The programs there are in Java, but the explanation of what XPath is reads clearly.)
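
To get a feel for a few XPath forms in REXML itself, here is a small sketch. The sample data, the XPATH_SAMPLE and DC_NS constants, and the xpath_texts helper are all invented for this illustration; the namespace hash is passed explicitly as the third argument of REXML::XPath.match, which I assume is the safer way to resolve the dc prefix.

```ruby
require 'rexml/document'

# Hypothetical sample data for trying out XPath expressions.
XPATH_SAMPLE = <<~XML
  <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
      <item>
        <title>first entry</title>
        <author>alice</author>
        <dc:subject>vi</dc:subject>
      </item>
      <item>
        <title>second entry</title>
        <author>bob</author>
        <dc:subject>eclipse</dc:subject>
      </item>
    </channel>
  </rss>
XML

# Prefix-to-URI mapping handed to REXML::XPath.match.
DC_NS = { 'dc' => 'http://purl.org/dc/elements/1.1/' }.freeze

# Hypothetical helper: returns the text of every element matched by path.
def xpath_texts(xml, path, namespaces = nil)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, path, namespaces).map(&:text)
end

# Absolute path: every <title> under /rss/channel/item.
p xpath_texts(XPATH_SAMPLE, '/rss/channel/item/title')
# Positional predicate: XPath positions start at 1, not 0.
p xpath_texts(XPATH_SAMPLE, '/rss/channel/item[1]/title')
# Value predicate: only items whose <author> child is 'bob'.
p xpath_texts(XPATH_SAMPLE, "/rss/channel/item[author='bob']/title")
# Descendant search (//) for a namespaced element.
p xpath_texts(XPATH_SAMPLE, '//dc:subject', DC_NS)
```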