Choropleths in R (yes, "choropleths")

November 12, 2009 in Data, Math

This morning, I was excited to see two of my interests collide as Nathan from FlowingData posted a tutorial for creating a choropleth: a map that uses color to convey values (I didn't know that's what they're called either). He used county-level unemployment statistics to generate the following image:

However, the process appears quite involved, requiring some Python scripts and mucking around inside an SVG file. I half-heartedly wondered whether there was a simpler way to create the image. And just then, along came David from Revolutions to throw down the gauntlet: could anyone come up with a way to replicate Nathan's map in R?

David's post pointed me toward R's maps package, and off I went to start downloading the tools...

It took some time to coerce the BLS data into a compatible form. R's maps package doesn't understand the FIPS county identifiers, so I had to jump through some hoops to get the strings to match: BLS uses state abbreviations, while R wants full names; BLS appends the words "County", "Parish" or "Borough", which R doesn't expect; the BLS has a "Miami-Dade" county in Florida, where R recognizes only "Dade"; and so on. Ultimately, I used the following code to format the strings:

#load data
stateAbbr <- ...
rownames(stateAbbr) <- ...
unemp_data <- ...

#get county names in correct format
countyNames <- ...
counties <- ...
states <- ...
states <- ...

#concatenate states and counties

#parse out county titles & specifics
unemp_data$counties <- ...
unemp_data$counties <- ...
unemp_data$counties <- ...
unemp_data$counties <- ...

#define color buckets
colors = c("#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043")
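The string clean-up described above (abbreviation to full state name, dropping "County"/"Parish"/"Borough", and the Miami-Dade special case) could be sketched roughly as follows. This is not the original formatting code: `normalize_county()` and the `overrides` table are hypothetical names, the input format "Name County, ST" is an assumption about how the BLS labels look, and only base R is used.

```r
# Sketch: turn BLS-style labels such as "Autauga County, AL" into the
# "state,county" keys that the maps package uses (e.g. "alabama,autauga").
# normalize_county() and overrides are hypothetical names for illustration.
normalize_county <- function(label) {
  # split "County Name, ST" at the last comma
  county <- trimws(sub(",[^,]*$", "", label))
  abbr   <- trimws(sub("^.*,", "", label))

  # state.abb and state.name ship with base R (datasets package)
  state <- tolower(state.name[match(abbr, state.abb)])

  # drop the title words that BLS appends to the county name
  county <- sub(" +(County|Parish|Borough)$", "", county, ignore.case = TRUE)
  county <- tolower(county)

  # one-off renames; only the Miami-Dade case comes from the post
  overrides <- c("miami-dade" = "dade")
  hit <- county %in% names(overrides)
  county[hit] <- unname(overrides[county[hit]])

  paste(state, county, sep = ",")
}

normalize_county(c("Autauga County, AL",
                   "Orleans Parish, LA",
                   "Miami-Dade County, FL"))
```

Labels without a two-letter state suffix, or counties whose punctuation or spacing differs between the two sources, would come back as non-matching keys and need their own entries in `overrides`.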

With the data in the correct format, I aligned a color vector with R's list of counties and plotted the result:


#align data with map definitions

#draw map
map("county", col = colors[unemp_data$colorBuckets[match(mapnames, unemp_data$counties)]],
    fill = TRUE, resolution = 0, lty = 0, projection = "polyconic")
map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 1, projection = "polyconic")
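The alignment step hinges on `match()`: for each polygon name the map knows about, look up the row of the data with the same key, then turn that row's bucket into a color. Below is a minimal sketch with toy data in base R. The break points passed to `cut()` are an assumption (the post does not state them), and the toy `mapnames` stand in for what something like `map("county", plot = FALSE)$names` would return. Some of those names appear to carry a suffix such as ":main", so the sketch strips it before matching; treat that as an assumption to verify against your version of the maps package.

```r
# Toy stand-ins for the map's polygon names and the unemployment table
mapnames <- c("alabama,autauga", "florida,okaloosa:main",
              "florida,okaloosa:spit", "texas,loving")

unemp <- data.frame(
  counties = c("florida,okaloosa", "alabama,autauga"),
  rate     = c(7.1, 10.4),
  stringsAsFactors = FALSE
)

# six bins for six colors; these break points are an assumption
unemp$colorBuckets <- as.numeric(cut(unemp$rate, c(0, 2, 4, 6, 8, 10, 100)))

colors <- c("#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043")

# drop any ":suffix" so multi-part counties share one key
polykeys <- sub(":.*$", "", mapnames)

# position of each polygon's key in the data (NA when there is no match)
idx <- match(polykeys, unemp$counties)

# one color per polygon, in the same order as mapnames
polycolors <- colors[unemp$colorBuckets[idx]]

# polygons that found no data; these would be drawn without a fill
unmatched <- mapnames[is.na(idx)]
```

Printing `unmatched` is a quick way to find out which counties still need a naming fix, since any polygon with an `NA` color shows up as a hole in the map.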

It came out like this:

maps package result

Not too bad, I think. It's a little rough around the edges and a couple of counties are missing - I assume they are the ones with odd naming conventions (you'll notice I manually adjusted Miami-Dade in my code). Also, I'm not sure how to bring Hawaii and Alaska into the picture. Moreover, the image doesn't look too good in R itself. For example, I had given up on getting the county borders to show up as faint lines (I could only get them to be completely opaque) - imagine my surprise when I exported the chart and could see the borders just fine!

In any case, I wasn't satisfied with this result. I've been experimenting with ggplot2 and remembered it had some mapping functions, so off I went to recreate the image with yet another library. The ggplot2 package is an excellent general-purpose graphics library, and the maps package feels positively last-gen by comparison. ggplot2 is much more extensible and has many more parameters to experiment with; it's hard to believe it's not the standard graphics package that ships with R (which itself is another last-gen experience).

Anyway, I kept the data formatted as above - which may have added an extra line or two to the ggplot2 code, but makes it simpler to jump back and forth - and used the following script to draw a new version of the map:


#extract reference data
library(ggplot2)
mapcounties <- map_data("county")
mapstates <- map_data("state")
#merge data with ggplot county coordinates
mapcounties$county <- with(mapcounties, paste(region, subregion, sep = ","))
mergedata <- merge(mapcounties, unemp_data, by.x = "county", by.y = "counties")
mergedata <- mergedata[order(mergedata$order), ]
#draw map
map <- ggplot(mergedata, aes(long, lat, group = group)) + geom_polygon(aes(fill = colorBuckets))
map <- map + scale_fill_brewer(palette = "PuRd") +
    coord_map(projection = "globular") +
    theme(legend.position = "none")
#add state borders
map <- map + geom_path(data = mapstates, colour = "white", size = .75)
#add county borders
map <- map + geom_path(data = mapcounties, colour = "white", size = .5, alpha = .1)
#display the result
map
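One detail in the script above is easy to miss: the rows are re-sorted by the `order` column right after `merge()`. By default `merge()` sorts its result by the key column, so the polygon vertices can come back in a different sequence than they were drawn in, and the outlines would then zigzag. The toy example below, in base R with made-up coordinates, illustrates the effect; the data frames only mimic the shape of what `map_data()` returns.

```r
# Toy polygon table: one row per vertex, `order` records the drawing sequence
poly <- data.frame(
  county = c("b,two", "b,two", "b,two", "a,one", "a,one", "a,one"),
  long   = c(0, 1, 1, 5, 6, 6),
  lat    = c(0, 0, 1, 5, 5, 6),
  order  = 1:6,
  stringsAsFactors = FALSE
)

# Toy values to attach to each county
values <- data.frame(
  counties     = c("a,one", "b,two"),
  colorBuckets = c(2, 5),
  stringsAsFactors = FALSE
)

merged <- merge(poly, values, by.x = "county", by.y = "counties")

# merge() sorted by key, so "a,one" now comes first and the
# original drawing sequence is no longer intact
before <- merged$order

# restore the drawing sequence before handing the data to a plotting function
merged <- merged[order(merged$order), ]
after <- merged$order
```

Comparing `before` and `after` shows the difference; every vertex keeps its attached `colorBuckets` value throughout.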

And the resulting image:

ggplot2 package result

Again, a couple of drawbacks: Alaska and Hawaii are nowhere to be seen and the borders are slightly aliased. The aliasing does make a difference, especially when compared to the maps output, but the ease with which I put together this second graph and the frustration I experienced with the maps package, in my mind, more than erase that perceived shortcoming.

On the whole, I'd still take Nathan's map over these as a finished product. However, I don't think R can be beat for ease of use and all-in-one packageability: if I wanted, I could run regressions on the data, overlay my chart with more colors or new metrics, explode out certain counties or states... the possibilities are endless. With just a couple of lines of code, I could overlay states that voted for Obama in blue, or highlight counties starting with the letter "C". The static SVG method doesn't allow any of that flexibility. Also, I'm completely confident that if I had any experience with these mapping packages, rather than using them for the first time tonight, I could mimic Nathan's image perfectly.
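As a rough illustration of the "couple of lines" claim, flagging counties whose name starts with "c" comes down to a regular expression on the "state,county" keys. The keys and the two highlight colors below are toy values chosen for the sketch, not data from the post.

```r
# Toy "state,county" keys in the format used for the maps above
keys <- c("alabama,autauga", "alabama,calhoun",
          "california,alameda", "texas,cameron")

# keep only the county part, so "california,alameda" is not a false hit
county_part <- sub("^[^,]*,", "", keys)

# TRUE for counties whose name starts with "c"
starts_with_c <- grepl("^c", county_part)

# one color per key: dark for highlighted counties, pale otherwise
highlight <- ifelse(starts_with_c, "#980043", "#F1EEF6")
```

A vector like `highlight` could then be passed as the `col` argument to `map()`, or merged into the data frame used by ggplot2, in place of the unemployment buckets.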

The ggplot2 package, in particular, is fantastically powerful. I really wish I had discovered it sooner. As a matter of fact, Josh Reich runs a monthly meetup for R users in the New York area and the next topic happens to be ggplot2. It'll be my first time attending, so I can't really say what to expect, but I'm definitely looking forward to it.
