Now it’s time for the fun part of this series – taking our data into Exploratory and finding out about the Top 500 beers according to RateBeer users. What patterns will we see? Which styles will dominate the ratings? Will high ABV styles rule the day? We’ll be able to answer these and other questions using the R-based power of Exploratory.
To begin, let’s run some simple charts on the .json data file we imported previously. These will give us some basic insights into the user ratings. Let’s start with a look at which styles make up the Top 500 beers:
Wow! 252 of the 500 come from a single style, Imperial Stout. These high octane brews are clearly the dominant style in the Top 500, with Double IPAs a distant second.
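For readers who want to reproduce this kind of style count outside Exploratory, here is a minimal Python sketch. It assumes each beer is a record with a `style` field; that field name and the handful of sample rows are placeholders for illustration, not the actual RateBeer data.

```python
from collections import Counter

# Hypothetical sample rows; the real file has 500 records and its
# field names may differ from the "style" key assumed here.
beers = [
    {"name": "Beer A", "style": "Imperial Stout"},
    {"name": "Beer B", "style": "Imperial Stout"},
    {"name": "Beer C", "style": "Double IPA"},
    {"name": "Beer D", "style": "Imperial Stout"},
    {"name": "Beer E", "style": "Mead"},
]


def count_by_style(records):
    """Return (style, count) pairs, most common style first."""
    return Counter(r["style"] for r in records).most_common()


style_counts = count_by_style(beers)
for style, n in style_counts:
    print(f"{style}: {n}")
```

Sorting by count is what makes the dominant style jump out, just as it does in the bar chart.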
What about country of origin? One might suspect that the US will be dominant, given the popularity of craft beer in the States, but let’s verify that assumption.
Sure enough, the US accounts for more than 70% of the Top 500 beers. Belgium is a clear second, with Denmark, Canada, and Sweden vying for third place. A couple of interesting omissions can be noted – both Germany and the Czech Republic are missing, even though both have glorious brewing histories. However, they both tend to brew lower-alcohol styles. Pilsners, weissbiers, and lagers, among others, are not well represented in the Top 500 rankings, regardless of origin.
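The percentage view is a small step beyond a raw count. As a rough sketch, assuming a `country` field on each record (an assumed name, with made-up sample rows), the share per country could be computed like this:

```python
from collections import Counter

# Hypothetical sample rows; "country" is an assumed field name.
beers = [
    {"name": "Beer A", "country": "United States"},
    {"name": "Beer B", "country": "United States"},
    {"name": "Beer C", "country": "Belgium"},
    {"name": "Beer D", "country": "United States"},
]


def country_share(records):
    """Map each country to its percentage of all records."""
    counts = Counter(r["country"] for r in records)
    total = sum(counts.values())
    return {country: 100 * n / total for country, n in counts.items()}


shares = country_share(beers)
for country, pct in shares.items():
    print(f"{country}: {pct:.1f}%")
```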
Let’s move on to viewing the data using boxplots, a chart type that is very effective at showing distributions; in this case, we wish to plot average user ratings. First, by style:
This chart tells an interesting story. While we previously noted the overall dominance of the Imperial Stout style, this chart shows us that some other categories have higher median scores. Of course, this makes sense given how many Imperial Stouts are represented; with roughly half of the list, their median score is likely to sit close to the median for all 500 beers. The two styles that jump out in this chart are the Quadrupel/Abt and Mead categories. Each has a very high median score – about 4.25 for the Quadrupel, and 4.19 for the Mead. This tells us that while there may be only a small number of each in the 500 beers, the ones that made the list tend to sit in the upper region of the ratings. More to come on this, although not in this post.
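The median line in each box is simply the middle value of that group's ratings. Here is a minimal sketch of the per-style median, assuming `style` and `avg_rating` fields (both assumed names) and made-up ratings rather than the real ones:

```python
from collections import defaultdict
from statistics import median

# Hypothetical ratings; "style" and "avg_rating" are assumed field names.
beers = [
    {"style": "Imperial Stout", "avg_rating": 4.10},
    {"style": "Imperial Stout", "avg_rating": 4.20},
    {"style": "Imperial Stout", "avg_rating": 4.40},
    {"style": "Mead", "avg_rating": 4.25},
    {"style": "Mead", "avg_rating": 4.30},
    {"style": "Mead", "avg_rating": 4.35},
]


def median_by(records, key, value):
    """Group records by `key` and return the median of `value` per group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[value])
    return {group: median(vals) for group, vals in groups.items()}


medians = median_by(beers, "style", "avg_rating")
for style, m in medians.items():
    print(f"{style}: median {m:.2f}")
```

The same function works for the country view by passing a country field as the grouping key.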
What about the country ratings viewed through a boxplot chart?
Here we have a similar pattern to what we just saw with the styles. Since the US accounts for more than 70% of all the beers, its median score will reflect the overall average ratings. We do see a very high bar for the United States, telling us that there are one or more beers at the very top of the user ratings. The one data point that stands out to me here is Sweden – the median line (inside the box) is higher than that of any other country. This would indicate that the Swedish beers in the Top 500 are likely clustered toward the top of the rankings.
Alright, now that we have a basic understanding of the summary data, it’s time to start going into the details a bit deeper. Since we’ve already seen the dominance of the Imperial Stout style, let’s begin there. We’re going to use Exploratory’s bubble chart option to view the data at a more granular level. Before we do, however, we need to reduce the number of beers from the original 252. We’ll start by specifying the beer style, and then run a filter based on the number of user ratings:
Now that we have only Imperial Stouts, let’s reduce our set by looking for beers with 250 or more ratings:
This will give us a manageable data set to view using bubble charts. Let’s have a first look, with beers on the x-axis and average ratings on the y-axis:
We’ve also sized the bubbles by the number of user ratings, which could serve as an indicator for availability, popularity, or simply how long a beer has been produced. Some examples with larger bubbles include Alesmith Speedway Stout, Founders CBS, and Goose Island Bourbon County Stout, all perennial favorites with strong reputations.
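The two filters described above (one style, then 250 or more ratings) and the three values each bubble needs can be sketched in plain Python. The field names `style`, `ratings_count`, and `avg_rating` are assumptions, and the rows are invented for illustration:

```python
# Hypothetical rows; all field names here are assumed, not taken from the real file.
beers = [
    {"name": "Beer A", "style": "Imperial Stout", "ratings_count": 1200, "avg_rating": 4.3},
    {"name": "Beer B", "style": "Imperial Stout", "ratings_count": 144, "avg_rating": 4.5},
    {"name": "Beer C", "style": "Double IPA", "ratings_count": 900, "avg_rating": 4.2},
    {"name": "Beer D", "style": "Imperial Stout", "ratings_count": 250, "avg_rating": 4.1},
]

MIN_RATINGS = 250  # the threshold used in this post


def filter_beers(records, style, min_ratings):
    """Keep records of one style with at least `min_ratings` user ratings."""
    return [
        r for r in records
        if r["style"] == style and r["ratings_count"] >= min_ratings
    ]


stouts = filter_beers(beers, "Imperial Stout", MIN_RATINGS)

# Each tuple is (x label, y value, bubble size) for a bubble chart.
bubbles = [(r["name"], r["avg_rating"], r["ratings_count"]) for r in stouts]
for label, rating, size in bubbles:
    print(label, rating, size)
```

Note that the comparison is `>=`, so a beer with exactly 250 ratings stays in the set.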
Exploratory enables us to do some fun things within the chart – we can hover over a bubble to see summary details, and even click to see further details from the data set. Here’s an example:
Let’s shift gears a bit, and do some similar views for IPAs. For this example, we are combining multiple IPA categories, easily done by updating our beer style filter:
Applying this filter will update our bubble chart to reflect IPA beers and their ratings. One useful note – in Exploratory you can easily duplicate charts, so you don’t have to overwrite the Imperial Stout view. Just click on the chart tab, select duplicate, and then make your modifications. It’s simple, and a great way to create identical views based on the same filters. Here’s our IPA result (after some trimming and zooming). In this case, we were curious about the top-rated brew, and learned that it is Pliny the Younger, from Russian River Brewing in California.
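Combining several categories amounts to testing membership in a set of style labels. A small sketch follows, where the style labels, field names, and rows are all placeholders rather than the exact RateBeer category names:

```python
# Hypothetical rows; field names and style labels are placeholders.
beers = [
    {"name": "Beer A", "style": "IPA", "avg_rating": 4.1},
    {"name": "Beer B", "style": "Double IPA", "avg_rating": 4.4},
    {"name": "Beer C", "style": "Imperial Stout", "avg_rating": 4.5},
    {"name": "Beer D", "style": "Double IPA", "avg_rating": 4.2},
]

IPA_STYLES = {"IPA", "Double IPA"}  # assumed labels for the combined IPA group


def filter_styles(records, styles):
    """Keep records whose style is any member of `styles`."""
    return [r for r in records if r["style"] in styles]


ipas = filter_styles(beers, IPA_STYLES)

# The top-rated beer within the combined group.
top_ipa = max(ipas, key=lambda r: r["avg_rating"])
print(top_ipa["name"], top_ipa["avg_rating"])
```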
There are some Michigan beers here as well, so with a nod to my home state, here’s a popular seasonal release from Bell’s Brewing:
A few final charts for you. First, we’re finally going to have a look at the top-rated beers, regardless of style. We will, however, use style to color each rating bar, adding one more layer of insight to the overall analysis. This will be followed by two more charts, with a higher threshold for the number of user ratings. Drumroll, please… and the top-rated beer is…
It’s Toppling Goliath Kentucky Brunch for the win, with an average rating of 4.53! Notice the dominance once again of Imperial Stouts (the brown bars), with 12 of the top 25 brews, a rate consistent with their 50% representation in the Top 500. Meads are heavily represented here, with seven entries, followed by a mix of Quadrupels, Double IPAs, Ciders, and Saisons.
One note, if we want to refine things a bit: recall our earlier look at the Imperial Stout category, where we set a threshold of 250 ratings to make our chart more manageable. Where was Toppling Goliath Kentucky Brunch in that view? It was missing; it turns out we have only 144 ratings for this beer. So what happens if we once again apply that filter? To make this simple, we have created a new field that splits our beers into two groups: 250+ ratings and < 250 ratings. How does the chart look when we color by this new criterion?
Now, as expected, Toppling Goliath Kentucky Brunch is colored blue, as part of the group with < 250 user ratings. Each of the orange bars in the chart meets or exceeds the threshold. Let’s refine this by also filtering on our new field:
And the new winner is Närke Kaggen Stormaktsporter from Sweden! We now have perhaps a better view of beers that are either more widely available (hence, more user reviews) or have been around longer, thus generating more reviews.
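The derived field and the follow-up filter can be expressed in a few lines of Python. This is only a sketch of the idea: the field names (`ratings_count`, `avg_rating`, `ratings_group`) and the sample rows are assumptions, and the actual field in Exploratory may have been built differently.

```python
# Hypothetical rows; field names and values are for illustration only.
beers = [
    {"name": "Beer A", "avg_rating": 4.5, "ratings_count": 144},
    {"name": "Beer B", "avg_rating": 4.4, "ratings_count": 800},
    {"name": "Beer C", "avg_rating": 4.3, "ratings_count": 300},
]


def ratings_group(count, threshold=250):
    """Label a beer by whether it has at least `threshold` user ratings."""
    return f"{threshold}+ ratings" if count >= threshold else f"< {threshold} ratings"


# Add the derived field without modifying the original records.
labeled = [{**r, "ratings_group": ratings_group(r["ratings_count"])} for r in beers]

# Top-rated beer overall, then top-rated among the well-reviewed group only.
top_overall = max(labeled, key=lambda r: r["avg_rating"])
established = [r for r in labeled if r["ratings_group"] == "250+ ratings"]
top_established = max(established, key=lambda r: r["avg_rating"])

print(top_overall["name"], top_established["name"])
```

Keeping the label as its own field means the same chart can be colored by it first and filtered by it afterward, which mirrors the two steps above.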
Well, I hope you had fun reading this; I certainly had fun creating it. There is so much more that can be done with this data that it’s almost daunting, but I’ll persevere as time permits. The next entry in the series will take a more geospatial view of the data, looking at where the breweries producing these Top 500 beers are located. As always, thanks for reading!