Ruby 101 for .NET Developers: .map and .collect

Posted by jeff Friday, March 02, 2007 16:16:00 GMT


If you've been using Ruby for at least a week, you've come across the Enumerable mixin. Classes such as Array and Hash "mix in" the Enumerable module to provide methods that are common to data containers.
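
To make the "mix in" idea concrete, here is a minimal sketch. The CustomerList class and its sample names are made up for illustration; the one thing Enumerable asks of a class is an each method that yields its elements.

```ruby
# Hypothetical container class, invented for this illustration.
class CustomerList
  include Enumerable   # mixes in collect, select, and friends

  def initialize(*names)
    @names = names
  end

  # Enumerable builds its methods on top of each.
  def each
    @names.each { |name| yield name }
  end
end

list = CustomerList.new("Ann", "Bob")
upcased = list.collect { |name| name.upcase }
p upcased                       # prints ["ANN", "BOB"]
p Array.include?(Enumerable)    # prints true
p Hash.include?(Enumerable)     # prints true
```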

One of the more useful methods is collect. It is sometimes known as map - both methods do exactly the same thing.

I don't think there's a good .NET equivalent to collect, so I'm going to make up an example. I have a list of customers, but I want just the names of the customers in an array. In C# I would write code like this:


// C# Code
public string[] GetNames(Customer[] customers)
{
    List<string> names = new List<string>();

    foreach (Customer cust in customers)
    {
        names.Add(cust.Name);
    }

    return names.ToArray();
}

Looks like normal C# code, right? I think this is how most of us would solve this problem in C#. In fact, this is a common pattern you'll see all over .NET code: create a container for the results, start a loop, pull out the results you need into the new container, then return the new container.

This is in fact such a common programming pattern that Ruby turns it into an idiom and makes it a lot easier. The collect method iterates over your collection for you, and for each element it finds in the collection, it yields to a block that you provide.

Here's the key: each return value from your block is added to a new collection automatically, and this new collection becomes the return value from the collect call:


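def get_names(customers)
  names = customers.collect { |cust| cust.name }
  return names
end

In actual practice I would not write the extraneous return statement either, since it's not necessary in Ruby (the result of the last expression becomes the return value automatically). Note that the block itself must not use return: a return inside a block exits the enclosing method, so get_names would hand back only the first customer's name.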

def get_names(customers)
  customers.collect { |cust| cust.name }
end
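
If it helps to see the mechanics, the behavior described above can be approximated by hand. This is only a sketch: my_collect is a made-up name, and Ruby's real collect is built in rather than written like this.

```ruby
# Hypothetical helper that does roughly what collect does for an array.
def my_collect(items)
  results = []                 # the new collection
  items.each do |item|
    results << yield(item)     # each block return value is added to it
  end
  results                      # the new collection is the return value
end

lengths = my_collect(["ann", "robert"]) { |name| name.length }
p lengths                                            # prints [3, 6]
p ["ann", "robert"].collect { |name| name.length }   # prints [3, 6]
```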

Now what's cool is when you start putting different Ruby pieces together. Suppose I only want the names of customers who have purchased more than 1000 widgets. In C#, I'd just put an if statement in there somewhere:

// C# Code
public string[] GetNames(Customer[] customers)
{
    List<string> names = new List<string>();

    foreach (Customer cust in customers)
    {
        if (cust.PurchasedQty > 1000)
        {
            names.Add(cust.Name);
        }
    }

    return names.ToArray();
}

In Ruby, I find it's much better to utilize another Enumerable method, select:


def get_names(customers)
  important = customers.select { |cust| cust.purchased_qty > 1000 }
  return important.collect { |cust| cust.name }
end

or again, simplified:


def get_names(customers)
  customers.select { |cust| cust.purchased_qty > 1000 }.collect { |cust| cust.name }
end
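
To try the chained version yourself, here is a small runnable sketch. The Customer Struct and the sample data are stand-ins I invented; any object that responds to name and purchased_qty would do.

```ruby
# Customer is a stand-in Struct; the sample data is invented.
Customer = Struct.new(:name, :purchased_qty)

def get_names(customers)
  customers.select { |cust| cust.purchased_qty > 1000 }.collect { |cust| cust.name }
end

customers = [
  Customer.new("Acme", 5000),
  Customer.new("Bolt", 12),
  Customer.new("Cogs", 1001)
]

big_spenders = get_names(customers)
p big_spenders    # prints ["Acme", "Cogs"]
```

Note that select keeps whole Customer objects, and only the collect step reduces them to names.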

Some people prefer to think of the "collecting" of the names as a way to "map" the elements of the original collection (customers) to a new collection (names). That's why you can use .map instead of .collect, if you want to:


def get_names(customers)
  customers.select { |cust| cust.purchased_qty > 1000 }.map { |cust| cust.name }
end
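
If you want to convince yourself that the two names behave identically, a quick check along these lines should do it (the numbers are arbitrary sample values):

```ruby
quantities = [12, 5000, 1001]   # arbitrary sample data

with_collect = quantities.collect { |qty| qty * 2 }
with_map     = quantities.map { |qty| qty * 2 }

p with_collect               # prints [24, 10000, 2002]
p with_collect == with_map   # prints true
```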

Any questions?

Comments

  1. Sam Smoot   March 02, 2007 @ 08:51 PM

    It's always great to see a fellow former c# nut see the light. :)

    One tweak to the Ruby example that I personally use all the time is Symbol#to_proc. It's a simple coercion, just like the implicit operator overload you'd use in C# during casts.

    class Symbol
      def to_proc
        lambda { |x| x.send(self) }
      end
    end

    The great thing about this is you can use an even shorter notation for Enumerable#map. By preceding a method parameter with the & symbol, you're coercing it into a block, so now the map example becomes:

    customers.map &:name

    It took a while, but now that's the notation I use regularly for returning a list of single attributes from a list of objects.

  2. Steve   March 02, 2007 @ 10:21 PM

    If you're using .NET 2.0 and your customers are in a generic List you can use the ConvertAll method.

    List<Customer> customers = ... // get customers
    List<string> names = customers.ConvertAll<string>(delegate(Customer c) { return c.Name; });

    My syntax might be off a bit since your comment box doesn't appear to have a C# compiler built in :(

    I definitely prefer the Ruby syntax but .NET is getting closer by the day. With C# 3.0 the above will be able to become:

    var names = customers.Select(c => c.Name);

    It's nice to be getting some Ruby-ish syntax in the .NET world.

  3. Shawn Oster   March 03, 2007 @ 05:59 PM

    Great little example, should help others out there. For the record you can do similar things in JavaScript if you include the Ruby-inspired Prototype library.

    The only thing that irks me is when people talk about "seeing the light" in Ruby. Ruby is great, I love it, but I use C# by day, Ruby at night and honestly I spend just as much time pulling my hair out with issues on both sides of the fence, at least for complicated projects. They just have different issues.

    Another very small point to quibble over is that using a select THEN a collect iterates twice (granted the second time is much smaller) while the boring, loop-and-compare method only loops a single time. Usually this doesn't matter in the least but I have seen a few Rails sites where they had to go back through and rework the way they stacked Enumerable calls over huge collections.

    I do agree with Steve, Microsoft is making some huge improvements in .NET to work with dynamic languages. It'll be nice to have my day and night jobs get closer at least in syntax.

  4. Zaphop   March 05, 2007 @ 07:28 AM

    Why the extra lines in C# and not in Ruby? To make it look more verbose?

    Anyhow, the C# equivalent would be

    public IEnumerable<string> GetNames(Customer[] customers) {
        foreach (Customer c in customers) yield return c.Name;
    }

    and to filter:

    public IEnumerable<string> GetNames(Customer[] customers) {
        foreach (Customer c in customers)
            if (c.PurchasedQty > 1000) yield return c.Name;
    }

    You are showing a naive use of arrays as enumerators. This is a practice common in other languages, but in the .NET world you should think IEnumerable before locking yourself into an array or a list. Incidentally, a list (and other collection types) can easily be constructed from something enumerable, as one of the constructors takes an IEnumerable.

    The yield return is actually a quite nice improvement over list monging. Apart from keeping the syntax brief, it also lets you iterate through a collection incrementally producing the elements. Hence, if you "step out" (break the enumeration) the remaining elements will simply not have been generated.

    As mentioned by Steve there's also ConvertAll, which matches the Ruby collect or map more closely in that it takes a "delegate" (closure; proc; whatever) to perform the conversion.

    Ruby is a more compact language. Some of it is because the language itself is more dense, but a big part of it in your samples is due to the cultures as well. In C# it's good style to spell the methods and parameters out using full (non-abbreviated) words. It's also good practice not to try to do too much on one line; not so much because of the line length but because each line should be simple to understand.

    Just because there are fewer and shorter keywords and the variable names are kept shorter does not mean it's smart to cram everything into one line.

  5. Jeff   March 07, 2007 @ 01:12 AM

    Good comments everyone, thanks.

    @Zaphop: I commend you on your use of C# iterators, but keep in mind I'm not trying to convince you that Ruby is better than C#. I'm writing these articles to document how I do things in Ruby that I used to do in C# - no more, no less. But I must ask, do you seriously believe that most C# programmers know how to use the yield statement in .NET? They don't. In fact a lot of what C# 2.0 brought to the table is largely ignored by most ASP.NET developers out there. And I avoid blogging about techniques that I may have used, but I'm pretty sure 95% of our readers have not. (If I get 50 comments to this post saying that they use the C# yield statement, then believe me, I'll be more than happy to change my approach.)

  6. cubicle67   March 09, 2007 @ 04:45 AM

    I use a slight variation of this:

    customers.map {|cust| cust.purchased_qty > 1000 ? cust.name : nil}.compact

  7. Thibaut Barrère   March 14, 2007 @ 10:01 PM

    Hi!

    If you're under Rails, you can use the following shortcut notation:

    customers.select { |cust| cust.purchased_qty > 1000 }.map { |cust| cust.name }

    becomes

    customers.select { |cust| cust.purchased_qty > 1000 }.map(&:name)

    (This situation is very common - hence the shortcut)

    regards, and keep on posting! Your blog is especially great for all C# programmers.

  8. tobyn   March 15, 2007 @ 02:51 PM

    "Another very small point to quibble over is that using a select THEN a collect iterates twice (granted the second time is much smaller) while the boring, loop-and-compare method only loops a single time. Usually this doesn't matter in the least but I have seen a few Rails sites where they had to go back through and rework the way they stacked Enumerable calls over huge collections."

    You can eliminate the need to loop twice with something like:

    customers.inject([]) { |m,cust| m << cust.name if cust.purchased_qty > 1000; m }

    It's not nearly as pretty, but I'd probably go with something like this for a large result set. It's also about the same length as the original.

  9. Andrey Shchekin   March 29, 2007 @ 06:25 AM

    public IEnumerable<string> GetImportantNames(List<Customer> customers)
    {
        return from c in customers where c.PurchasedQty > 1000 select c.Name;
    }

  10. mike kidder   April 17, 2007 @ 04:11 PM

    I would have to admit I am one of those 95% that never realized that "yield" even existed in C#. Believe me, I have read tons of documentation and use .NET Reflector religiously. With that said, thanks to Jeff for the article and to Zaphop for his comments.

  14. Brian   June 26, 2008 @ 10:18 PM

    Thanks I was trying to figure this out today!!
