Hi, we’re back with some visual analysis of beer ratings gathered from the RateBeer API. This time around, we’re shifting our Michigan-centric focus from Bell’s to Founders to see what the user ratings tell us about Founders beers. Once again we’ll employ the powerful Exploratory platform to do our analysis.
Let’s begin by using GraphQL, through the GraphiQL in-browser query interface, to gather the data from the RateBeer API. Here’s a look at our query criteria, with an explanation of what we’re doing:
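For readers who prefer scripting to the GraphiQL interface, the same kind of request can be sent as an HTTP POST whose JSON body carries a `query` string and its `variables`, which is the usual GraphQL-over-HTTP convention. The sketch below only builds the request. The operation name `beerSearch`, the field names, the endpoint URL and the `x-api-key` header are all hypothetical placeholders, not confirmed RateBeer schema details, so check them against the API's own documentation.

```python
import json
import urllib.request

# Hypothetical query: "beerSearch" and the item fields are placeholders,
# not confirmed RateBeer schema names.
QUERY = """
query FoundersBeers($term: String!) {
  beerSearch(query: $term) {
    items { name abv averageRating ratingCount styleScore }
  }
}
"""


def build_graphql_request(endpoint: str, api_key: str, term: str) -> urllib.request.Request:
    """Package the GraphQL query and its variables as a JSON POST request.

    Nothing is sent here; pass the result to urllib.request.urlopen() to run it.
    The endpoint and the "x-api-key" header name are assumptions.
    """
    body = json.dumps({"query": QUERY, "variables": {"term": term}}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

Usage would look like `build_graphql_request("https://example.com/graphql", "YOUR_KEY", "Founders")`, with a real endpoint and key substituted.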
Now that we have the raw .json data, the next step is a simple copy and paste to a text editor. We’ll use Brackets, and save our data as a .json file.
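If you'd rather skip the copy-and-paste step, a few lines of Python can validate the raw response and save it as a .json file. This is a minimal sketch; the file name in the usage note is only an example.

```python
import json
from pathlib import Path


def save_json(raw_text: str, path: str) -> Path:
    """Validate a raw JSON response and write it, pretty-printed, to disk."""
    data = json.loads(raw_text)  # raises json.JSONDecodeError if the text is truncated or malformed
    out = Path(path)
    out.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return out
```

For example, `save_json(response_text, "founders.json")`. Parsing before writing means a partial paste fails loudly instead of producing a broken file.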
With the data in hand, we’ll now turn to Exploratory for our visualization and analysis work. As we noted in previous posts, Exploratory harnesses the power of R, making it easy to use the vast array of R packages to perform virtually any type of analysis. Using the import .json option, we point Exploratory to our data file, and see the results:
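Turning nested JSON into rows and columns is the main job of a JSON import. As a rough illustration of the idea (not Exploratory's actual implementation), nested objects can be flattened into dotted column names. The record in the usage note is made up.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted column names.

    {"style": {"name": "Porter"}} becomes {"style.name": "Porter"}.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))  # recurse into nested objects
        else:
            flat[name] = value
    return flat
```

Applied to a made-up record such as `{"name": "Example Beer", "abv": 6.5, "style": {"name": "Porter"}}`, this yields the columns `name`, `abv` and `style.name`; a list of such records becomes a table with `[flatten(item) for item in items]`.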
We now have the data needed to perform our analysis, so let’s begin. My first step is to filter the data by ratings count. Many of the 463 beers have zero ratings or only a handful. To make the analysis more robust, I chose to set a filter of >= 20 ratings. This drops our beer count to a more manageable 90, which should make our visual displays easier to interpret.
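The ratings filter is a one-line rule. Here is a minimal sketch, assuming each beer is a dict with a hypothetical `ratingCount` field.

```python
def filter_by_ratings(beers, min_ratings=20):
    """Keep only beers with at least `min_ratings` ratings (the post uses >= 20).

    Assumes a hypothetical "ratingCount" key; a missing or null count is treated as 0.
    """
    return [b for b in beers if (b.get("ratingCount") or 0) >= min_ratings]
```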
The next step is to create a simple bar chart showing the ratings counts in descending order. This will give us a quick feel for which Founders beers are reviewed most often. Before we produce this chart, let’s add one more element: an ABV categorization. Exploratory offers several binning methods. Since 90 beers split evenly into five groups, we’ll choose the Equal Frequency option, which ranks the beers by alcohol by volume (ABV) and places them into five bins of 18 beers each:
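Equal-frequency binning simply ranks the values and cuts the ranking into equal-sized groups. A minimal sketch of the idea follows; Exploratory's own implementation may handle ties differently.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin number 1..n_bins so every bin holds (nearly) the same count.

    Bin 1 holds the lowest values; ties are broken by original position.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])  # indices from lowest to highest
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values) + 1
    return bins
```

With 90 ABV values and `n_bins=5`, each bin receives exactly 18 beers, matching the split described above.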
Now we can use our bins to color the bars in a bar chart, and see if users are more likely to review higher ABV beers. Here’s our result, using an additional filter to show only beers with 100 or more reviews (so we can actually see all the beer names). This is a great feature in Exploratory: we can filter directly on this chart while keeping our base of 90 beers for the ongoing analysis.
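The chart-level filter and descending sort can be sketched as follows. The keys `name`, `ratingCount` and `abv_bin` are hypothetical; the point is that the chart works on a filtered copy while the 90-beer base list stays intact.

```python
def chart_rows(beers, min_ratings=100):
    """Rows for a descending bar chart: (name, ratingCount, abv_bin), most-rated first.

    Builds a new list, so the underlying list of beers is left unchanged.
    """
    shown = [b for b in beers if b["ratingCount"] >= min_ratings]
    shown.sort(key=lambda b: b["ratingCount"], reverse=True)
    return [(b["name"], b["ratingCount"], b["abv_bin"]) for b in shown]
```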
Contrary to what we may have expected, we don’t see many darker bars (higher ABV) among the most frequently rated beers. This reflects the presence of some of Founders’ most popular beers (All Day IPA, Centennial IPA, Breakfast Stout) in the relatively lower ABV range.
Now let’s move on to the relationship between ratings and ABV. Recall the strong correlation we saw in our Bell’s analysis: about 0.73. What about Founders? Let’s take a look using a scatter plot:
Once again we see an unmistakable pattern, with higher ABV beers generally earning higher ratings. Users apparently find higher ABV beers more complex and worthy of higher scores. In this case, our correlation is a more modest 0.60, which still indicates a fairly strong positive relationship.
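The post doesn't say which correlation coefficient Exploratory reports; assuming it is the usual Pearson coefficient, the calculation behind numbers like 0.73 and 0.60 looks like this.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation around the means
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # divides by zero if either sequence is constant
```

Passing the ABV list and the average-rating list (or, later, the style-score list) returns a value between -1 and 1. On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same coefficient.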
The folks at RateBeer have a way of adjusting for this user bias toward high ABV beers – they produce a style score, where beers are rated within their specific style category. In other words, stouts versus other stouts, IPAs versus other IPAs, and so on. What happens when we swap out the user ratings and substitute the style scores?
Well, this certainly changed things! Our correlation is now a very modest 0.27; higher ABV levels are still associated with higher style scores, but it is now a rather weak relationship. Lower ABV beers like All Day IPA, Centennial IPA, and Breakfast Stout are all rated at or above 95 points – outstanding relative to their competition.
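To see why scoring within a style weakens the ABV effect, here is an illustrative within-style percentile: each beer is scored by the share of same-style beers it out-rates. This is a simplified stand-in to show the concept, not RateBeer's actual style-score formula, and the 50-point value for a style with a single beer is a hypothetical convention.

```python
from collections import defaultdict


def within_style_percentile(beers):
    """Score each beer 0-100 by the share of same-style beers it out-rates.

    Illustrative only: RateBeer's real style-score calculation is not reproduced here.
    """
    by_style = defaultdict(list)
    for b in beers:
        by_style[b["style"]].append(b["rating"])
    scores = {}
    for b in beers:
        peers = by_style[b["style"]]
        if len(peers) == 1:
            scores[b["name"]] = 50.0  # hypothetical convention: no peers, neutral midpoint
        else:
            below = sum(r < b["rating"] for r in peers)  # the beer itself is never counted
            scores[b["name"]] = 100.0 * below / (len(peers) - 1)
    return scores
```

With made-up ratings, a 3.4 IPA that tops its IPA peers scores 100, while a 3.5 stout sitting in the middle of its stout peers scores only 50, even though its raw rating is higher.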
Let’s now undertake some statistical analysis, starting with clustering using the K-means method. This approach groups beers with similar characteristics together, based on our choice of input variables. In this case, we’ll use ABV, Average Rating, and Style Score. Exploratory then produces several plots that help us understand which beers are grouped together. First up is our scatter plot, based on 5 clusters:
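For a look under the hood, here is a minimal K-means sketch: standardize each input variable, then alternate between assigning beers to the nearest center and moving each center to the mean of its members (Lloyd's algorithm). The deterministic farthest-point initialization is a simplification chosen for reproducibility; R's built-in `kmeans()` uses random starting centers and defaults to the Hartigan-Wong algorithm, so real results can differ.

```python
import math


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its sample standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) or 1.0  # guard constant columns
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [points[0]]
    while len(centers) < k:
        # next center: the point farthest from its nearest existing center
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Each row would be `[abv, avg_rating, style_score]` for one beer; `kmeans(standardize(rows), 5)` returns a cluster label (0 to 4) per beer. Cluster numbering is arbitrary, so it won't necessarily match the numbers Exploratory assigns.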
We can see some relatively tight groupings with minimal overlap (a good thing), although the cluster on the far left is quite spread out. Let’s learn more by switching to the box plot view:
What does this plot tell us? First off, we need to understand that the data is normalized, so variables on very different scales (ABV, average rating, and style score) can be compared directly. Think of 0 as the baseline, roughly the overall average for each variable. Using this approach, we can deduce the following about each cluster:
| Cluster | ABV | Avg Rating | Style Score |
|---|---|---|---|
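A cluster profile like the one summarized in the table can be computed from the standardized data: take the median of each variable within each cluster, which is roughly the center line of each box in the box plot. A positive value means the cluster sits above the overall average for that variable, a negative value below it. This is a sketch, not Exploratory's own summary code.

```python
from statistics import median


def cluster_profiles(z_rows, labels, names=("ABV", "Avg Rating", "Style Score")):
    """Median standardized value of each variable within each cluster."""
    profiles = {}
    for cluster in sorted(set(labels)):
        members = [row for row, lab in zip(z_rows, labels) if lab == cluster]
        # zip(*members) regroups the member rows into one column per variable
        profiles[cluster] = {name: median(col) for name, col in zip(names, zip(*members))}
    return profiles
```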
So we have 5 distinct clusters, each with its own pattern across our 3 input variables. This picture will become more granular when we view the stacked chart option in our next step.
Now we see some delineation, especially for Cluster 3, which has just 3 styles represented: wheat ales, lagers, and fruit beers. These are beers at the lower end of the ABV spectrum, just as the table above indicated. Let’s use a table to summarize the general patterns:
| Cluster | Representative Beer Types |
|---|---|
| 1 | imperial stouts, double IPAs, barleywines |
| 2 | porters, scotch ales, strong ales |
| 3 | wheat ales, lagers, fruit beers |
| 4 | IPAs, double IPAs |
| 5 | session IPAs, brown ales, pale ales |
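The style breakdown behind the stacked chart and the table above amounts to a cross-tabulation of cluster against style. A minimal sketch, assuming parallel lists of style names and cluster labels:

```python
from collections import Counter


def style_mix(styles, labels):
    """Count beers of each style within each cluster (the numbers behind a stacked bar chart)."""
    mix = {}
    for style, lab in zip(styles, labels):
        mix.setdefault(lab, Counter())[style] += 1
    return mix
```

`style_mix(styles, labels)[c].most_common(3)` then lists the three most frequent styles in cluster `c`, which is essentially how a "representative beer types" row can be read off.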
There is certainly crossover among beer types, due to the variance in ABV, style scores, and user ratings within each type, but this gives us some insight into each cluster. Exploratory provides additional statistical methods and charts that could help us dig even deeper into this data; we may explore those in an upcoming post.
Well, that’s it for now. There’s a good chance I’ll follow up with a deeper dive into text analysis of Founders user reviews in a future post, but until then, be well and enjoy the holiday season. Thanks for reading!