My previous post looked at how to take data generated by the RateBeer API, and ingest, process, and visualize it using Exploratory, a powerful R-based analytics and data science framework. In this post, I’m going to take a look at data from Bell’s Brewery, one of the two largest in Michigan (Founders is the other), and one of the top microbreweries in the US.
Since Bell’s has been around for more than 30 years, it has produced a lot of different beers – 362 in total, according to the RateBeer data. This gives us a lot of data to work with, but we’ll want to do some filtering in order to visualize the data in a meaningful way. As in the last post, my starting point is a .json file imported into Exploratory.
We’ll start with a bubble plot, as this is an effective way to see general patterns in the data. Once again, we’ll place ABV on the y-axis, and average user ratings on the x-axis:
As you can see, there’s some noise in the data, particularly from a handful of beers with a 0 rating. We’ll want to filter those beers out, but even then about 350 beers remain, many of them with very few user ratings. To see whether there’s a correlation between ABV and ratings for Bell’s beers, we need to narrow the data to the beers that have enough ratings to be meaningful. Fortunately, Exploratory makes this process very simple.
We simply use the percent_rank function as follows:
This will create a new column (Pct Rank) we can use to reduce our data. We can then choose to view the top 10%, the top 20%, or any other cutoff to manage our chart display. Let’s set our filter to grab the top 10%, as shown below:
This will reduce our data set to the 37 most frequently rated beers from Bell’s. In this case, they will each have more than 300 ratings, with many beers well over 1,000 individual ratings. This will yield a robust data set for further analysis.
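For readers who want to see the idea outside of Exploratory, here is a minimal Python sketch of the same kind of filter. It assumes percent_rank behaves like the dplyr function of that name, rescaling minimum ranks to the range 0 to 1 as (min_rank - 1) / (n - 1). The field name `rate_count`, the sample records, and the 0.9 cutoff are all illustrative assumptions, not the actual RateBeer schema or the exact filter used above.

```python
def percent_rank(values):
    """Rescale minimum ranks to [0, 1] as (min_rank - 1) / (n - 1).

    Ties share the lowest rank in their group.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    # Map each distinct value to its first position in sorted order,
    # which equals (min_rank - 1).
    first_pos = {}
    for i, v in enumerate(sorted(values)):
        first_pos.setdefault(v, i)
    return [first_pos[v] / (n - 1) for v in values]


def keep_top(records, key, min_pct):
    """Keep records whose percent rank on `key` is at least `min_pct`."""
    ranks = percent_rank([r[key] for r in records])
    return [r for r, p in zip(records, ranks) if p >= min_pct]


# Hypothetical sample rows; names and numbers are made up for illustration.
beers = [
    {"name": "Beer A", "rate_count": 12},
    {"name": "Beer B", "rate_count": 340},
    {"name": "Beer C", "rate_count": 1500},
    {"name": "Beer D", "rate_count": 3},
    {"name": "Beer E", "rate_count": 95},
]

# With only five rows, a 0.75 cutoff keeps the two most-rated beers.
most_rated = keep_top(beers, "rate_count", 0.75)
print([b["name"] for b in most_rated])
```

Under this definition, a data set of 362 rows with no ties near the cutoff keeps 37 rows at a threshold of 0.9, which lines up with the count reported above.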
Remember the chart we created a moment ago? We can now use it again, this time on top of our newly reduced data set. Exploratory allows users to simply move the chart to reflect the changes made in our filtering steps:
Now the chart displays our reduced set of 37 beers:
Wow – it certainly looks like there’s a strong correlation between ABV and ratings for the most popular Bell’s beers! We can test that assumption using the Analytics tab, by running a correlation matrix:
Our correlation value of 0.73 gives us an R-squared value of 0.53 (0.73 squared), an indication that the ABV of a Bell’s beer accounts for about 53% of the variance in average ratings across these 37 beers. This is a rather strong relationship! Our finding is that ABV is indeed a strong predictor of average ratings for Bell’s most popular beers.
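To make the arithmetic concrete, here is a small Python sketch of how a Pearson correlation and the corresponding R-squared can be computed by hand. This is not how Exploratory computes its correlation matrix internally (that is not shown here); the helper name `pearson_r` and the sample numbers are assumptions for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up ABV and rating values, only to show the call.
abv = [4.5, 5.8, 7.0, 8.5, 10.2]
rating = [3.1, 3.4, 3.6, 3.7, 4.0]
print(round(pearson_r(abv, rating), 2))

# The figure reported in the post: squaring r gives the share of
# variance in one variable accounted for by the other.
r = 0.73
r_squared = r ** 2
print(round(r_squared, 2))  # 0.53
```

Squaring the correlation is only a shortcut for R-squared in the simple two-variable case used here; with more predictors it would need a fitted regression model.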
I hope you found this interesting, and thanks again for reading.