Scala for the Intrigued

XML as First Class Citizen

by Venkat Subramaniam

  Scala handles well-formed XML syntax as regular language syntax. But it goes much farther than that in its support of XML as a first class language citizen.  

Scala is a nice language, even for something as sinister as XML processing. Scala makes dealing with XML easy and fun. Let’s explore the XML-related facilities that Scala offers in this eighth part of the series.

We’ll start by creating a small XML fragment in code.

 val greetings = "<greet>hello</greet>"
 println(greetings) //<greet>hello</greet>

The code prints the XML content, but on closer look you discover we wrapped the XML in a String. That’s familiar: most languages contain XML in strings. No wonder XML behaves strangely. How would we behave if constrained that way?

Scala treats XML with respect and gives it first class citizen status. So in Scala you can write the above as:

 val greetings = <greet>hello</greet>
 println(greetings) //<greet>hello</greet>

That’s it, no shackles. Scala handles well-formed XML syntax as regular language syntax. We can initialize and assign variables with raw XML content much as we’d assign integers or double values.

If you’re curious, ask Scala what it thought of the XML value you assigned. It politely replies with the type it inferred for the variable.

 println(greetings.getClass) //class scala.xml.Elem

The Elem class represents an XML element. You can query (or interrogate) this instance for more details about the document.

 println(greetings.label) //greet
 println(greetings.text) //hello

You can even use the pattern matching facility you saw in the previous article in this series.

 greetings match {
   case <greet>{msg}</greet> =>
     println("found " + msg) //found hello
 }

The pattern matching syntax is quite helpful for extracting contents out of XML documents. This built-in capability of the language minimizes the dependency on third party parsers when working with XML in Scala.

In this example, if the greetings object holds an XML element whose root is <greet></greet> and that root has exactly one child node, the match binds the msg variable to that child node (here, the text hello).
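A match like this usually needs a fallback for documents that do not have the expected shape. The following is a minimal sketch, assuming the scala.xml classes are on the classpath; describe is a hypothetical helper name introduced only for this illustration.

```scala
import scala.xml.Elem

// describe is a hypothetical helper, not part of scala.xml.
def describe(doc: Elem): String = doc match {
  // Matches a <greet> element that has exactly one child node.
  case <greet>{msg}</greet> => "found " + msg.text
  // Any other element ends up here instead of raising a MatchError.
  case other => "no match for <" + other.label + ">"
}

println(describe(<greet>hello</greet>)) // found hello
println(describe(<bye>later</bye>))     // no match for <bye>
```

Without the second case, a document with a different root element would fail at run time with a MatchError.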

We can also use XPath-style queries to access the contents of the document. In XPath you generally use a forward slash to get a matching child and a double slash to get all matching descendants. But in Scala the forward slash is the division operator and the double slash starts a comment. Scala instead uses a backslash and a double backslash, respectively, for these two XPath operators, as in the next example.

 val message = <message priority="urgent">avoid ceremony</message>
 println("Received " + message \ "@priority" +
  " message: " + message.text)
 //Received urgent message: avoid ceremony

To access the value of the priority attribute of the element we used the backslash operator and prefixed the attribute name with an @ symbol. The @ prefix is used to indicate attribute names in XPath queries. Drop that symbol if you’d like to refer to a child element instead of an attribute.
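To see the attribute form next to the child element form, here is a small sketch. The order document and the value names are invented for this illustration, and the code assumes the scala.xml classes are on the classpath.

```scala
import scala.xml.Elem

// A hypothetical document, made up for this example.
val order: Elem =
  <order id="42">
    <item>keyboard</item>
    <item>mouse</item>
    <note><item>nested</item></note>
  </order>

val id          = (order \ "@id").text           // "@" prefix: attribute
val directItems = (order \ "item").map(_.text)    // immediate children only
val allItems    = (order \\ "item").map(_.text)   // children and deeper descendants

println(id) // 42
println(directItems)
println(allItems)
```

The single backslash finds only the two item elements directly under order, while the double backslash also picks up the one nested inside note.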

The XML documents we used so far had only one element. Let’s see how to parse a document with more elements. Let’s work with an XML document that has stock prices and tickers:

 val stocksAndPrices =
   <stocks>
     <stock symbol="AAPL">302.15</stock>
     <stock symbol="GOOG">606.20</stock>
   </stocks>

Let’s extract the symbols and their prices.

 stocksAndPrices match {
   case <stocks>{symbolAndPrices @ _*}</stocks> =>
     println("Found " + symbolAndPrices.size + " children")
     //Found 5 children
 }

The stocks element has more than one child, and we want to gather all of them. Instead of the simple {msg} we used before, we use _* to gobble up all the content between <stocks> and </stocks>. With symbolAndPrices @ _*, we ask Scala to bind the child nodes of the element to the value symbolAndPrices.

The symbolAndPrices is bound to an ArrayBuffer that holds the children of the element. Scala reported seeing five children; that includes the two stock children plus the text nodes before, after, and in between these two elements.
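A quick way to see where the five children come from is to look at the child nodes directly. This is only a sketch: the names stocks, all, and elems are introduced here for illustration, and collect is one possible way to drop the text nodes, not the approach the article takes next.

```scala
import scala.xml.Elem

val stocks: Elem =
  <stocks>
    <stock symbol="AAPL">302.15</stock>
    <stock symbol="GOOG">606.20</stock>
  </stocks>

// Every direct child, including the whitespace-only text nodes
// that come from the line breaks and indentation in the literal.
val all = stocks.child

// Keep only the element children; the text nodes are dropped.
val elems = all.collect { case e: Elem => e }

println(all.size)   // 5
println(elems.size) // 2
```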

We don’t care about those whitespace-only text nodes. Our focus is on getting the ticker symbols and prices. We can loop through this collection and at the same time extract the content.

 stocksAndPrices match {
   case <stocks>{symbolAndPrices @ _*}</stocks> =>
     for(symbolAndPrice @ <stock>{price @ _}</stock>
       <- symbolAndPrices) {
       println(symbolAndPrice \ "@symbol" + " : " + price)
     }
 }
 //AAPL : 302.15
 //GOOG : 606.20

In the for expression we asked Scala to iterate over the elements of the symbolAndPrices ArrayBuffer and only give us the stock elements. We indicate with the {price @ _} that we’d like to match any content between the start and closing tags and bind it to the price value.

Instead of using a match and then the for expression, you can directly use the XPath query to extract all stock children from the document and make the code even more concise.

 stocksAndPrices \\ "stock" foreach { symbolAndPrice =>
   println(symbolAndPrice \ "@symbol" + " : " + symbolAndPrice.text)
 }
 //AAPL : 302.15
 //GOOG : 606.20

A combination of XPath queries, \ and \\, and accessing the text property of Elem took care of the processing.

If you know the structure of the document, then by using a combination of pattern matching and XPath queries you can easily extract the contents of the document. If you want to navigate arbitrary document structure, then use the classes provided in the scala.xml package to which the Elem class belongs.
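For a document whose structure is not known in advance, one option is a recursive walk over the Node and Elem classes. The sketch below assumes the scala.xml classes are on the classpath; outline is a hypothetical helper written for this illustration, not something the package provides.

```scala
import scala.xml.{Elem, Node}

// outline is a hypothetical helper: it walks a document depth-first and
// returns one indented line per element, skipping text and other nodes.
def outline(node: Node, depth: Int = 0): List[String] = node match {
  case e: Elem =>
    val attrs = e.attributes.asAttrMap
      .map { case (k, v) => k + "=" + v }
      .mkString(" ")
    val line = ("  " * depth) + e.label +
      (if (attrs.isEmpty) "" else " [" + attrs + "]")
    line :: e.child.toList.flatMap(c => outline(c, depth + 1))
  case _ => Nil
}

val doc =
  <stocks><stock symbol="AAPL">302.15</stock><stock symbol="GOOG">606.20</stock></stocks>

outline(doc).foreach(println)
// stocks
//   stock [symbol=AAPL]
//   stock [symbol=GOOG]
```

The walk relies only on label, attributes, and child, so it works for any element regardless of its tag names.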

We’ve now seen how to get contents out of an XML document. Scala also makes it easy to create XML documents. You don’t have to deal with unwieldy print statements to produce the desired output.

Given a list of ticker symbols, let’s see how we can create an XML document of the tickers and their prices. Let’s define a sample list of tickers and a function to go out to Yahoo and fetch the price for each stock. (You’ve seen the code to fetch the price in an earlier article in this series.)

 val tickers = List("AAPL", "AMD", "CSCO", "GOOG", "HPQ",
   "INTC", "MSFT", "ORCL", "QCOM", "XRX")
 case class StockPrice(ticker : String, price : Double) {
   def print = println("Top stock is " + ticker +
     " at price $" + price)
 }
 def getPrice(ticker : String) = {
   val url = "" + ticker
   val data = io.Source.fromURL(url).mkString
   val price = data.split("\n")(1).split(",")(4).toDouble
   StockPrice(ticker, price)
 }

The getPrice function takes a ticker symbol and returns an instance of the StockPrice class with the ticker name and the price obtained from the Yahoo web service.
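The parsing step in getPrice can be tried without a network call. The sketch below runs the same split logic on a made-up response; the column names and values in sample are assumptions inferred only from the indexes the function uses, not the real service output.

```scala
// Hypothetical response text: a header line followed by one data row.
// The layout is an assumption; only "line 1, field 4" comes from getPrice.
val sample =
  "Date,Open,High,Low,Close,Volume\n" +
  "2000-01-03,10.0,12.0,9.0,11.5,1000\n"

// Line 0 is the header, so line 1 is the first data row;
// field 4 of that row is taken as the price.
val fields = sample.split("\n")(1).split(",")
val price = fields(4).toDouble

println(price) // 11.5
```

If the response has fewer than two lines or fewer than five fields, the indexing throws an exception, so production code would want to check for that.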

Let’s see how to create an XML document of this information. In much the same way you can place XML into Scala code, Scala also allows you to intermix Scala code within XML. You must ensure that the code you place within the {} block produces well-formed XML fragments for this to be meaningful.

 def stockXMLFragment(ticker : String) =
  <stock ticker={ticker}>{getPrice(ticker).price}</stock>

The stockXMLFragment function takes a ticker name as parameter and produces an XML element of the format <stock ticker="symbol">price</stock>. The price is obtained with a call to the Yahoo webservice via the getPrice function.

It was easy to place the data value directly into the XML element using the {} syntax. In the case of the price, Scala will place the value as a simple text child of the element. For the ticker name, Scala will automatically wrap the value between double quotes for the attribute.
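One consequence of this is that a plain String placed in a {} block is treated as text, not as markup, so special characters are escaped when the element is serialized. A small sketch, assuming the scala.xml classes are on the classpath; note, author, and body are hypothetical names used only here.

```scala
import scala.xml.Elem

// note is a hypothetical helper built the same way as stockXMLFragment.
def note(author: String, body: String): Elem =
  <note by={author}>{body}</note>

val n = note("A & B", "1 < 2")

println(n)                // serialized form, with & and < escaped
println(n.text)           // 1 < 2
println((n \ "@by").text) // A & B
```

Reading the values back through text returns the original, unescaped strings.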

We now know how to create an XML element fragment for a given ticker symbol. Let’s use this method to create the entire document for all the ticker symbols on hand.

 val stockDocument = <stocks>{tickers map stockXMLFragment}</stocks>

Scala’s conciseness and expressiveness shine yet again here. When you call tickers map stockXMLFragment, Scala passes each ticker symbol in the tickers collection to the stockXMLFragment function and gathers the resulting XML elements. The collected elements become the children of the root element <stocks></stocks>, and the whole document is bound to the value stockDocument. Let’s print the XML document produced by the above code.

 println(stockDocument)
 //<stocks><stock ticker="AAPL">599.34</stock><stock ticker="AMD">
 //8.03</stock>...<stock ticker="QCOM">66.29</stock>
 //<stock ticker="XRX">8.21</stock></stocks>

The output you see when you run the code will depend on the market conditions at the time of the call, but the structure of the XML document is set in our Scala code.
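To experiment with the document-building step offline, the web service call can be replaced with fixed data. This is a sketch under that assumption: fixedPrices, fragment, document, and roundTrip are hypothetical names, and the tickers and prices are invented.

```scala
import scala.xml.Elem

// Invented tickers and prices standing in for the web service call.
val fixedPrices = List("AAA" -> 10.5, "BBB" -> 20.25)

def fragment(ticker: String, price: Double): Elem =
  <stock ticker={ticker}>{price}</stock>

// Build the document: each pair becomes one <stock> child of <stocks>.
val document: Elem =
  <stocks>{fixedPrices.map { case (t, p) => fragment(t, p) }}</stocks>

// Read the values back with the XPath-style operators.
val roundTrip = (document \\ "stock").map { s =>
  ((s \ "@ticker").text, s.text.toDouble)
}

println(document)
println(roundTrip == fixedPrices) // true
```

Building the document and querying it again returns the same pairs, which makes a convenient sanity check for the generated structure.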

You learned about the XML processing capabilities of Scala in this article. This is a good opportunity to apply Scala in your projects. If you have a Java project that does significant XML processing, you can replace just the parts dealing with XML using the above techniques.

Dr. Venkat Subramaniam is an award-winning author, founder of Agile Developer, Inc., and an adjunct faculty member at the University of Houston.

He has trained and mentored thousands of software developers in the US, Canada, Europe, and Asia, and is a regularly invited speaker at several international conferences. Venkat helps his clients effectively apply and succeed with agile practices on their software projects.

Venkat is the author of .NET Gotchas, the coauthor of the 2007 Jolt Productivity Award-winning Practices of an Agile Developer, and the author of Programming Groovy: Dynamic Productivity for the Java Developer and Programming Scala: Tackle Multi-Core Complexity on the Java Virtual Machine. His latest book is Programming Concurrency on the JVM: Mastering Synchronization, STM, and Actors.
