Recent posts have used RateBeer API data to analyze ratings and reviews for Bell’s Two Hearted Ale, one of the benchmarks for the IPA (India Pale Ale) category. Our analyses have been done using Exploratory, a powerful front-end tool built on R, the open-source statistical analysis language. We’ll continue this series today with an illustration of how to perform a simple sentiment analysis using Exploratory. Let’s begin with an overview of the steps we used in our last post:
- Gather the data from the RateBeer API
- Save the data to a .json file format
- Read the data into Exploratory
- Tokenize the text into individual terms
- Filter the text to remove stopwords
- Group the data by the tokens
- Run summarize functions for counts and average scores
- Filter the tokens for the most frequently used terms
- Create charts showing our results
For this analysis, we’re going to focus on Steps 4 and 6; we won’t need Steps 7-9, as they were focused on aggregating and filtering the data for building charts. In Step 4 we’ll add some additional calculations to the dataset, which will allow us to do our sentiment analysis. The dataset we have at Step 6 is already grouped at the token level; this will let us view the average score (rating) for each token (or keyword, if you prefer that term), based on our work in Step 4.
Let’s look at our new calculations one at a time. We’ll first create a field that tells us whether a review uses the term “hops”, since hops are an integral component of most IPA-style beers.

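Here’s a minimal sketch of what that calculation might look like in R/dplyr syntax (the data frame and column names, `reviews` and `token`, are assumptions for illustration, not necessarily the names used in the original project):

```r
# Flag tokens that refer to hops; str_detect() also catches variants
# such as "hops", "hoppy", and "hopped".
library(dplyr)
library(stringr)

reviews <- reviews %>%
  mutate(hops_flag = str_detect(token, "hop"))
```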
Next, we’ll perform a similar calculation for the sentiment analysis field. The “word_to_sentiment” function will categorize words as positive, negative, or neutral (no classification):

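A hedged sketch of that step, again assuming a `token` column; word_to_sentiment() comes from Exploratory’s own `exploratory` package, so outside of Exploratory you would substitute a sentiment lexicon such as tidytext’s:

```r
# Classify each token as positive, negative, or no classification (neutral).
reviews <- reviews %>%
  mutate(token_sentiment = exploratory::word_to_sentiment(token))
```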
We’ll next create two additional fields that can be analyzed to see the impact certain terms have on the average user rating. The first flags any inclusion of a fruit-based term in a review, as these flavors are often associated with IPA flavor perception. Note that this would be fruitless 🙂 if we were analyzing a darker style such as a stout, where we would look instead for flavors such as chocolate, coffee, molasses, and so on. To create this field, it was important to scan the dataset to see which fruit-based terms appear in multiple reviews. Here’s our calculation:

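The exact list isn’t reproduced here, but it might start with a vector of fruit terms along these lines (only grapefruit, orange, and pineapple are named in this post; the other entries are illustrative placeholders):

```r
# Fruit-related terms that appear in multiple reviews (illustrative list).
fruit_terms <- c("grapefruit", "orange", "pineapple", "citrus", "mango", "peach")
```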
You can see terms here that appear frequently in user reviews: grapefruit, orange, pineapple, and so on. Each token is then classified in boolean fashion according to whether or not it is one of the specified terms; in this case I have used “fruit” and “other” as the values, since they tell me more than a simple yes/no. Here’s the remainder of the calculation:

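The remainder is a straightforward if_else() test against that vector (a sketch, using the same assumed column names):

```r
# Label each token as "fruit" if it matches one of the fruit terms, otherwise "other".
reviews <- reviews %>%
  mutate(fruit_flag = if_else(token %in% fruit_terms, "fruit", "other"))
```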
This split between “fruit” and “other” will allow us to run analytics on our fruit_flag column, to see whether the mention of fruit flavors is positively correlated with higher average ratings. We have one more custom field to create. This one will look at negative terms, based on my personal understanding of beer reviews. We’ll follow the same process we just used for fruit flavors, using the if_else and %in% functions to define negative words.

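As with the fruit terms, the full list isn’t shown here; a sketch might begin with something like this (only “disappointed” and “uninteresting” come from this post; the others are hypothetical placeholders):

```r
# Negative terms drawn from my reading of the reviews (illustrative list).
negative_terms <- c("disappointed", "uninteresting", "bland", "watery")
```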
Now we have words such as “disappointed”, or “uninteresting”, that will typically convey a negative feeling about the beer. Here’s the rest of the calculation:

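And the matching if_else() step, sketched with the same assumptions:

```r
# Label each token "disappointed" if it matches a negative term, otherwise "not disappointed".
reviews <- reviews %>%
  mutate(negative_flag = if_else(token %in% negative_terms,
                                 "disappointed", "not disappointed"))
```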
Our two outcomes for this field are “disappointed” and “not disappointed”; we can now run the same type of analysis on this field as on the three previous fields we created.
Now that our data has been encoded with these new columns, let’s use Exploratory to analyze it by opening the Analytics tab. Since we are looking for differences between the scores for words in one of two groups, our choice is to run a T-test on each of the four columns. Exploratory makes it very simple to select and run the T-test, which can then be repeated for each column.

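For readers working outside Exploratory, the equivalent two-sample T-tests in base R would look roughly like this (column names are assumptions; the sentiment test is filtered to two groups because t.test() requires exactly two):

```r
# Welch two-sample T-tests of the review score against each flag.
t.test(score ~ hops_flag, data = reviews)
t.test(score ~ fruit_flag, data = reviews)
t.test(score ~ negative_flag, data = reviews)
t.test(score ~ token_sentiment,
       data = filter(reviews, token_sentiment %in% c("positive", "negative")))
```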
In the interest of brevity, I have merged the results into a single Excel file to compare the basic statistics from our T-test analyses. Here are the output values from Exploratory:

We can see the number of rows for each classification, the mean scores, and perhaps most importantly for our purposes, the lower and upper confidence limits (Conf Low & Conf High). If these values overlap across our two groups, then there is no statistically significant difference between the scores of the groups. Let’s have a quick look at each grouping.
The hops_flag mean score is very slightly higher for the FALSE population (~3.9485 vs. 3.9228), but there is significant overlap in the confidence intervals between the groups (i.e., the Conf High value for TRUE is higher than the Conf Low for FALSE), so there is no statistically significant difference between the two groups.
Next is token_sentiment, our classic negative versus positive word classification. Here we see a larger gap between the mean scores: 3.9972 for positive versus 3.8706 for negative. We can also see that the Conf High value for negative words is 3.9083, compared to a Conf Low of 3.9771 for positive words. Success! Our positive terms are associated with higher average scores compared to the negative terms. We’ll explore this in more detail in a moment.
No such luck in the case of the fruit_flag and negative_flag columns. The mention of fruit yields a mean score virtually identical to that of non-fruit terms, while the “not disappointed” group has a higher mean score but too few rows for the difference to be statistically significant.
Now that we have identified one successful column, let’s examine further details of the T-test analysis. While Exploratory provides scatter plot and histogram outputs that can offer additional insight into the data, the chart that truly confirms our initial finding is the Error Bar chart. This gives us an instant visual of the differences between our negative and positive words, as shown:

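If you want to reproduce a similar chart outside Exploratory, a rough ggplot2 sketch (using the assumed `score` column and a 95% confidence interval around each group mean) might look like this:

```r
library(ggplot2)

reviews %>%
  filter(token_sentiment %in% c("positive", "negative")) %>%
  group_by(token_sentiment) %>%
  summarize(mean_score = mean(score),
            ci = qt(0.975, n() - 1) * sd(score) / sqrt(n())) %>%
  ggplot(aes(x = token_sentiment, y = mean_score)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean_score - ci, ymax = mean_score + ci), width = 0.2)
```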
Here we can see the confidence intervals clearly do not overlap. Negative terms are associated with a mean of ~3.87, with lower and upper bounds of ~3.83 and ~3.91. Positive terms have a mean score of nearly 4.00, with lower and upper values of roughly 3.98 and 4.02. While this gap is not especially large, it is statistically significant. Therefore, we can safely assert that reviewers who use positive terms rate Bell’s Two Hearted Ale higher (on average) than reviewers whose reviews include negative terms.
I hope you found this interesting, and thanks for reading!