An XML package for the S language

Last Release: 3.98-1 (Sun Oct 4 17:00:33 PDT 2015)

The latest version (3.99-0) introduces the ability to define XPath functions for use in the getNodeSet() and xpathApply() R functions. One can use R functions and C routines to implement new XPath functions. Additionally, several XPath 2.0 functions are implemented by default.
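
As a point of reference for the two functions named above, here is a minimal sketch of ordinary XPath queries with getNodeSet() and xpathApply(). The document text, node names, and attribute values are invented for illustration, and the sketch deliberately does not show how user-defined XPath functions are registered, since that interface is not described here.

```r
library(XML)

# Illustrative document; the element and attribute names are assumptions.
txt <- '<books>
  <book year="2008"><title>First</title></book>
  <book year="2015"><title>Second</title></book>
</books>'
doc <- xmlParse(txt, asText = TRUE)

# getNodeSet() returns the nodes matching an XPath expression.
recent <- getNodeSet(doc, "//book[@year > 2010]")

# xpathApply() applies an R function to each matching node and returns a list.
titles <- unlist(xpathApply(doc, "//book/title", xmlValue))

# Extra arguments after the function are passed on to it for each node.
years <- as.integer(unlist(xpathApply(doc, "//book", xmlGetAttr, "year")))
```

Here `recent` holds the single book published after 2010, while `titles` and `years` are plain R vectors that can be combined into a data frame.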

Some people have encountered memory leaks with this package. As far as I am aware, these occur only on Windows. I think this is because the binary versions of the package were built without a required compiler flag.

This package provides facilities for the S language to

It is an interface to the libxml2 library. It can be combined with the RCurl package to parse documents that require more involved HTTP requests to fetch.
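
The following sketch shows one way the parsing step might look once the document text is in hand. In practice the string could come from an RCurl call such as getURL() with whatever options the request needs; to keep the example self-contained, a literal string stands in for the fetched content, and its element names are made up.

```r
library(XML)

# Stand-in for text fetched separately (for example via RCurl::getURL()).
# The content is an assumption made for illustration.
txt <- '<?xml version="1.0"?>
<config>
  <item id="a">1</item>
  <item id="b">2</item>
</config>'

# asText = TRUE tells xmlParse() that the argument is XML content, not a file name.
doc <- xmlParse(txt, asText = TRUE)
root <- xmlRoot(doc)

rootName <- xmlName(root)           # name of the top-level element
nItems <- xmlSize(root)             # number of child nodes
vals <- xmlSApply(root, xmlValue)   # text content of each child
```

Separating the fetch from the parse in this way means the HTTP details (authentication, cookies, redirects) stay in the RCurl call, and xmlParse() only ever sees a character string.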

Download

The source for the S package can be downloaded as XML_3.98-1.tar.gz.

There is also a Windows version available from the Omegahat repository. Use

install.packages("XML", repos = "http://www.omegahat.org/R")

Documentation

  • Best practices for using the XML package
    PDF version.
  • A short overview: HTML, PDF
  • A brief introduction to parsing XML in R: HTML, PDF
  • A reasonably detailed overview of the package and what we might use XML for.
  • A manual in and a quick guide to the package (PDF).
  • A short overview of the package.
  • Brief and incomplete Notes on generating XML within S
  • FAQ for the package.
  • Changes to the package (by release).
  • Examples of Reading Generic XML files

  • XML form of plist (property list) files (e.g. property lists on OS X, old iTunes databases)
    keyValueDB.R
    library(XML)
    source(url("http://www.omegahat.org/RSXML/keyValueDB.R"))
    o = readKeyValueDB("http://www.omegahat.org/RSXML/plist.xml")      
    
  • XML "solr" files that are similar to JSON and name-value pairs with nodes of the form
    <lst name="info">
          <str name="ABC">A string</str>
          <int name="xyz">103</int>
          <long name="big">1000012310303</long>
          <bool>true</bool>
          <date name="lastModified">2011-02-10T11:29:03Z</date>
    </lst>
    
    solrDocs.R
    library(XML)
    source(url("http://www.omegahat.org/RSXML/solrDocs.R"))
    o = readSolrDoc("http://www.omegahat.org/RSXML/solr.xml")      
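
To make the structure of the solr nodes shown above concrete, here is a rough sketch of converting such a <lst> node into an R list by hand. It is not the code in solrDocs.R; the helper names convertSolrNode and readSolrList are hypothetical, and the type mapping (int, long, bool, date) is an assumption based only on the sample node.

```r
library(XML)

txt <- '<lst name="info">
  <str name="ABC">A string</str>
  <int name="xyz">103</int>
  <long name="big">1000012310303</long>
  <bool>true</bool>
  <date name="lastModified">2011-02-10T11:29:03Z</date>
</lst>'

# Hypothetical helper: convert one node according to its element name.
convertSolrNode <- function(node) {
  val <- xmlValue(node)
  switch(xmlName(node),
         int  = as.integer(val),
         long = as.numeric(val),      # too large for an R integer
         bool = identical(val, "true"),
         date = as.POSIXct(val, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"),
         lst  = readSolrList(node),   # nested lists are handled recursively
         val)                         # str and anything else stay as text
}

# Hypothetical helper: convert the element children of a <lst> node,
# using each child's name attribute (or "" if absent) as the list name.
readSolrList <- function(node) {
  kids <- getNodeSet(node, "./*")
  vals <- lapply(kids, convertSolrNode)
  names(vals) <- sapply(kids, function(k) xmlGetAttr(k, "name", ""))
  vals
}

doc <- xmlParse(txt, asText = TRUE)
info <- readSolrList(xmlRoot(doc))
```

The result is a named list, so `info$xyz` is an integer and `info$ABC` a string; the unnamed <bool> node ends up as the fourth element with an empty name.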
    

  • Duncan Temple Lang <duncan@wald.ucdavis.edu>
    Last modified: Sun Dec 25 09:52:10 PST 2011