The author of the definitive reference to the Ruby language leads a tour of some of the best features of the latest release.

At the risk of inflaming Apple’s lawyers, Ruby 1.9.2 is the best Ruby ever. It’s fast: things run noticeably quicker than under 1.8. It has some killer new features (in this article we’ll briefly look at the new regular expression engine, some new classes and methods, and the new encoding support). And, despite growing fairly significantly, it still has that playful feeling to it. Programming Ruby is still fun.

Early versions of 1.9 had problems with third-party libraries. Although the language is basically the same, there were some incompatibilities (mostly with indexing into strings and block variable scoping, along with some low-level changes for C extension writers). However, those problems are now in the past. I’ve been using Ruby 1.9 for a year now, and everything I’ve needed has just worked.

So, if you’re still using Ruby 1.8, let’s see if I can convince you to upgrade with a few tasty nibbles of what’s new.

Not-So Regular Expressions

Ruby 1.9 uses a brand new regular expression engine, giving it unparalleled regular expression support. Let’s look at just one feature—named matches.

The pattern /(?<hour>\d\d):(?<min>\d\d):(?<sec>\d\d)/ matches a time. It contains three groups in parentheses. Unlike Ruby 1.8, though, I can give each of these groups a name (I used hour, min, and sec). If my pattern matches a string, I can then refer to those named groups.

 pattern = /(?<hour>\d\d):(?<min>\d\d):(?<sec>\d\d)/
 string = "It is 12:34:56 precisely"
 if match = pattern.match(string)
  puts "Hour = #{match[:hour]}"
  puts "Min = #{match[:min]}"
  puts "Sec = #{match[:sec]}"
 end

produces:

 Hour = 12
 Min = 34
 Sec = 56
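
When the group names aren’t known in advance, the match object can also report them. The following is a minimal sketch using the same pattern; the variable name time_parts is made up for illustration.

```ruby
pattern = /(?<hour>\d\d):(?<min>\d\d):(?<sec>\d\d)/
match   = pattern.match("It is 12:34:56 precisely")

# names lists the group names in pattern order and captures lists the
# matched text in the same order, so zipping them pairs name with value.
time_parts = Hash[match.names.zip(match.captures)]

time_parts.each { |name, value| puts "#{name} = #{value}" }
```

Note that the names come back as strings, not symbols, even though match[:hour] accepts either.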

There’s even a shortcut (although it’s a little tacky). If the regexp is a literal, it will set local variables to the contents of named groups:

 string = "It is 12:34:56 precisely"
 if /(?<hour>\d\d):(?<min>\d\d):(?<sec>\d\d)/ =~ string
  puts "Hour = #{hour}"
  puts "Min = #{min}"
  puts "Sec = #{sec}"
 end

You can use named groups a little like subroutines—you can invoke them many times inside a pattern. You can even call them recursively: the following pattern matches text between braces, handling nested braces correctly (so it will match "{cat}" and "{dog {brown}}").

 pattern = /(?<brace>
    {                # literal brace character
    (   [^{}]        # any non-brace
      | \g<brace>    # or a nested brace expression
    )*               # any number of times
    }                # until a closing brace
   )/x
 
 puts $1 if pattern =~ "I need a {cat}, quick"
 puts $1 if pattern =~ "and a {dog {brown}}"

produces:

 {cat}
 {dog {brown}}

The \g<brace> part of the expression means invoke the part of the pattern named brace at this point. Because this happens recursively inside the group, it handles arbitrarily nested braces. (Note also the use of the x option, which lets me lay out the regular expression neatly and include comments.)
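
The subroutine-style use mentioned earlier doesn’t have to be recursive. As a small sketch (the group name two_digits is hypothetical), the time pattern from the start of this section could define its digit pair once and then invoke it twice more:

```ruby
# Define the group once, then call it again with \g<two_digits>.
# The anchors \A and \z make the whole string have to be a time.
time = /\A(?<two_digits>\d\d):\g<two_digits>:\g<two_digits>\z/

puts "12:34:56 looks like a time"  if time =~ "12:34:56"
puts "12:3:56 does not"            unless time =~ "12:3:56"
```

The trade-off is that the three fields no longer have separate names, so this form suits validation better than extraction.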

Related to \g, the \k construct means match what was previously matched by a named group. It is similar to the existing \1, \2,… feature in Ruby 1.8. But, unlike the old backreferences, \k knows about multiple matches made by the same group, and you can use this knowledge to say match the text that was matched by this group at a certain nesting level. Here’s a regexp that matches palindromes—words written the same forwards as backwards:

 palindrome_matcher = /
 \A
   (?<palindrome>              # palindrome is:
                               #   nothing, or
    | \w                       #   a single character, or
    | (?:                      #   x <palindrome> x
        (?<some_letter>\w)     #   ^              ^
        \g<palindrome>         #   | ^^^^^^^^^^^^ |
        \k<some_letter+0>      #   '--------------'
      )
   )
 \z
 /x
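
To see the pattern at work, here is a short sketch that repeats it in compact form and tries it on a few sample words (the word list is made up for illustration):

```ruby
palindrome_matcher = /
  \A
  (?<palindrome>
    | \w
    | (?: (?<some_letter>\w) \g<palindrome> \k<some_letter+0> )
  )
  \z
/x

%w{ madam abba adam }.each do |word|
  verdict = (palindrome_matcher =~ word) ? "is" : "is not"
  puts "#{word} #{verdict} a palindrome"
end
```

The +0 in \k<some_letter+0> asks for the letter captured at the same nesting level, which is what lets each recursive call check its own outer pair of characters.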

The new regexp engine also offers better control over backtracking, positive and negative look-behind, and full internationalization support. On its own, it’s a reason to switch to Ruby 1.9.
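
As a quick illustration of look-behind, the sketch below pulls numbers out of a sentence depending on what precedes them. The sample text and variable names are invented for the example.

```ruby
text = "Cost: $42 for 7 items"

# Positive look-behind: digits that directly follow a dollar sign.
price = text[/(?<=\$)\d+/]

# Negative look-behind: digits preceded by neither a dollar sign nor
# another digit (the latter stops us matching the "2" in "$42").
quantity = text[/(?<![\$\d])\d+/]

puts "price = #{price}, quantity = #{quantity}"
```

In both cases the look-behind only tests the preceding text; it is not included in the matched result.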

New Classes and Methods

Ruby 1.9.2 has almost 400 more built-in methods and 13 more built-in classes and modules than Ruby 1.8.7. That’s a boatload of extra goodness, baked right in. Many of the new methods were introduced when the Rational and Complex classes were merged into the core, but even allowing for those, there are literally hundreds of new methods to explore. Let’s look at a few here—all these are somehow related to collections and enumeration.

Let’s say you need to deal a deck of cards to four people. Each card is represented as a suit (C, D, H, and S) and a rank (2 to 10, J, Q, K, and A). We can use some of Ruby’s spiffy new methods to do this easily.

 suits = %w{ C D H S }
 ranks = [ *'2'..'10', *%w{ J Q K A } ]
 deck = suits
  .product(ranks)
  .map(&:join)
 
 hands = deck
  .shuffle
  .each_slice(13)
 
 puts hands.first.join(", ")

produces:

 C9, H3, H9, S6, D10, SQ, CJ, D4, H4, C8, C4, HQ, S9

Pretty much every line in this little example has some 1.9 goodness. First, we can now use splats (*) inside array literals. Next, notice that we can now put the period that appears between an object and the message we send it at the start of the line (in 1.8, it had to be at the end of the line or Ruby would think the statement was terminated). It’s a small thing, but it makes it easier to chain calls together, as the last line in the chain is no longer a special case.

The product method returns all the combinations of its receiver and its parameter(s). In this case, it combines the suits and ranks into an array of 52 subarrays. The next line uses the fact that Symbol#to_proc is now built into Ruby: &:join calls join on each of these subarrays, converting ["C", "2"] into "C2", the two of clubs.
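
On a smaller scale, the two steps look like this. It’s a minimal sketch with a cut-down deck so the intermediate values are easy to read; the variable names are ours.

```ruby
# product pairs every element of the receiver with every element
# of the argument, in order.
pairs = %w{ C D }.product(%w{ 2 3 })
puts pairs.inspect

# &:join is shorthand for a block that sends join to each element.
cards = pairs.map(&:join)
same  = pairs.map { |pair| pair.join }

puts cards.inspect
```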

We use Array’s new shuffle method to shuffle the deck, and then split it into four sets of 13 cards. There’s something subtle here—normally each_slice would take a block, passing each chunk to it as a parameter. Because we didn’t provide one, it instead returned a new 1.9 Enumerator object.
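
The block-less call can be inspected directly. This sketch rebuilds the deck and then looks at what each_slice hands back; slices and hands are just illustrative names.

```ruby
suits = %w{ C D H S }
ranks = [ *'2'..'10', *%w{ J Q K A } ]
deck  = suits.product(ranks).map(&:join)

# No block given, so each_slice returns an Enumerator instead of iterating.
slices = deck.shuffle.each_slice(13)
puts slices.class

# Nothing has been sliced yet; to_a forces the enumeration.
hands = slices.to_a
puts hands.size
puts hands.map(&:size).inspect
```

Because the Enumerator mixes in Enumerable, calls such as first, map, or to_a work on it just as they would on an array.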

An Enumerator is basically an enumeration wrapped into an object. Enumerators make it easy to take something that used to be expressed in code and instead represent it as an object, allowing it to be passed around and manipulated. Enumerators are typically lazy: they don’t do the work of evaluating the thing they wrap until you need it. And the cool thing is, they’re pretty much part of the core of the language. Built-in methods such as each that in 1.8 expected a block will now return an Enumerator if you leave the block off. Here’s an example using the new Prime standard library. The each method will normally call its block forever, passing in successive prime numbers. Here, we don’t give it a block, so it returns an Enumerator object. We then call take_while on that object to print out the primes less than 20.

 require 'prime'
 puts Prime
  .each
  .take_while {|p| p < 20}
  .join(" ")

produces:

 2 3 5 7 11 13 17 19
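
You can also build an Enumerator of your own with Enumerator.new. The following is a sketch, not anything from the standard library: the fibonacci name and the choice of sequence are ours.

```ruby
# The block describes an endless sequence; each value is handed to the
# yielder, and the block only runs as far as the consumer asks it to.
fibonacci = Enumerator.new do |yielder|
  a, b = 0, 1
  loop do
    yielder << a
    a, b = b, a + b
  end
end

puts fibonacci.take(8).join(" ")
puts fibonacci.take_while { |n| n < 50 }.join(" ")
```

Even though the loop never terminates on its own, take and take_while stop pulling values once they have what they need.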

Enumerators don’t sound like much, but over time you’ll find they subtly change the way you program; I now can’t live without them. And this just barely scratches the surface of 1.9’s new functionality.

Multinationalization

Ruby is a citizen of the world, and the world speaks many languages and uses many different sets of characters doing it. Older Rubies ignored this; to them, strings were just sequences of 8-bit bytes. Ruby 1.9 changes this. Saying that Ruby 1.9 is encoding aware is a bit like saying that Google has some servers. Many languages say they support international character sets because they have Unicode support. Well, so does Ruby. But Ruby also has support for 94 other encodings, from plain old ASCII, through SJIS, to KOI8, to old favorites like 8859-1.

What does it mean to support these encodings? Well, first it means that strings, regular expressions, and symbols are suddenly a lot smarter. Rather than being sequences of 8-bit bytes, they’re now sequences of characters. For example, in UTF-8, the string ∂og has three characters, but is represented as five bytes internally. If we run the following program with Ruby 1.8:

 str = "∂og"
 puts str.length
 puts str[0]
 puts str.reverse

We see the following:

 5
 226
 go???

Notice that the length is the number of bytes in the string, the first character is returned as an integer, and the reversed string is mangled. But run it with Ruby 1.9, and you see something very different:

 3
 ∂
 go∂

But, to make this work, I had to do one extra thing. Remember that Ruby supports almost 100 encodings. How did it know what encoding I’d used for the source code of this program? I had to tell it. In 1.9, every source file in your program can potentially have its own encoding. If you use anything other than 7-bit ASCII in a file, you have to tell Ruby that file’s encoding using a comment on the first line of the file (or the second line if the first line is a shebang). The actual program I ran looked like this:

 # encoding: utf-8
 str = "∂og"
 puts str.length
 puts str[0]
 puts str.reverse

This per-file encoding is very cool—it means that you can knit together code written using different encodings by people working all over the world, and Ruby will just do the right thing. To my knowledge, that’s unique among programming languages.
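
The difference between characters and bytes can be checked from inside the program. In this sketch the first character is written as the escape \u2202, a stand-in three-byte UTF-8 character, so the string is UTF-8 no matter how the source file itself is encoded.

```ruby
str = "\u2202og"

puts str.encoding              # the encoding the string is tagged with
puts str.length                # counted in characters
puts str.bytesize              # counted in bytes
puts str.bytes.to_a.inspect    # the raw bytes; the first is 226

# reverse works on whole characters, so the multibyte one stays intact.
puts str.reverse == "go\u2202"
```

The leading byte 226 is the same value that Ruby 1.8 printed for str[0] in the earlier output.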

But encoding support doesn’t just stop with program source code. When you open a file or other external data source, you can tell Ruby the encoding to use. All data read from that source will be tagged with that encoding. The same applies to data you write out. Behind the scenes, Ruby works hard to make sure that when you work with this data, you’re doing things that make sense—it’ll raise an exception, for instance, if you try to match a SJIS string using a UTF-8 regular expression.
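
Here is a rough sketch of both behaviors. The scratch file, its contents, and the variable names are all invented for the example; only the mode string "r:iso-8859-1" and the encoding names come from Ruby itself.

```ruby
require 'tempfile'

# Write the four bytes of "café" as ISO-8859-1 (0xE9 is é) to a scratch file.
file = Tempfile.new("latin1")
file.binmode
file.write([0x63, 0x61, 0x66, 0xE9].pack("C*"))
file.close

# The "r:iso-8859-1" mode tags everything read from the file with that encoding.
text = File.open(file.path, "r:iso-8859-1") { |f| f.read }
puts text.encoding
puts text.length

# Converting is explicit: encode returns a transcoded copy.
utf8 = text.encode("utf-8")
puts utf8.bytesize

# A regexp containing a UTF-8 character cannot be matched against
# non-ASCII Shift_JIS data; Ruby raises instead of guessing.
sjis  = [0x82, 0xA0].pack("C*").force_encoding("Shift_JIS")
error = begin
  /\u2202/ =~ sjis
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.class
```

Strings that contain only 7-bit ASCII are generally compatible across these encodings, so the exception appears only when non-ASCII data is involved.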

All this great support means that Ruby is incredibly well suited for writing true international applications.

Prime Time

Ruby 1.9.2 is more than just another point release—it’s the next Ruby in a chain that started over 15 years ago. It’s stable and production-ready. It’s fast and it’s huge. It’s comfortably familiar and it has lots of challenging new features.

But, most of all, it’s still fun.

Dave Thomas has been writing Ruby all his life, and was relieved in 1999 to find a language that let him do it for real.
