In Part 1 of this series, I’m going to share how to retrieve the data that will allow us to analyze the Top 500 beers as rated by RateBeer users. As noted in the prior post, this article will focus on the steps where we retrieve and format the data for upcoming analysis. Here are the three steps I’ll detail in this post:
- Develop a query to pull the data using the RateBeer API
- Copy results into a code editor (I use Brackets)
- Merge the results into a single .json file (the API limits each query to 100 results)
Let’s begin by creating and understanding the query we’ll use in the GraphiQL web interface. For those not familiar, GraphQL is a query language for APIs: you describe the fields you want as a nested hierarchy, and the response comes back as JSON in the same shape. It is well suited to highly connected data such as social graphs (e.g., at Facebook, where GraphQL originated), and can be used in our case for showing the relationships between breweries, beers, beer styles, reviews, and more.
Our goal in this case is to retrieve the top 500 beers, as rated by RateBeer users, along with some other relevant data (number of reviews, style, etc.). We’re less concerned about brewery level information, although it would be nice to have a geographic component so we can construct some maps. Let’s have a look at our complete query (after having iterated through some earlier versions):
    query {
      topBeers(first:100) {
        items {
          id
          name
          ratingCount
          realAverage
          style {
            id
            name
          }
          styleScore
          averageRating
          brewer {
            id
            name
            country {
              id
              name
            }
            areaCode
          }
        }
      }
    }
What we’re doing above is requesting the top 100 beers (the API limit for a single query), as sorted by the averageRating field. Inside items, one nested block grabs the style id and name, which will be useful in our upcoming analysis. Another nested block provides brewer info, including id, name, and area code, as well as country id and name. OK, curious about the results? Let’s have a look at the first two entries:
    "topBeers": {
      "items": [
        {
          "id": "166019",
          "name": "Toppling Goliath Kentucky Brunch",
          "ratingCount": 144,
          "realAverage": 4.574305534362793,
          "style": {
            "id": "24",
            "name": "Stout – Imperial"
          },
          "styleScore": 100.00000002692344,
          "averageRating": 4.532754898071289,
          "brewer": {
            "id": "11242",
            "name": "Toppling Goliath Brewing Company",
            "country": {
              "id": "213",
              "name": "United States"
            },
            "areaCode": "563"
          }
        },
        {
          "id": "58057",
          "name": "Närke Kaggen Stormaktsporter",
          "ratingCount": 554,
          "realAverage": 4.4922380447387695,
          "style": {
            "id": "24",
            "name": "Stout – Imperial"
          },
          "styleScore": 100.00000002692344,
          "averageRating": 4.481779098510742,
          "brewer": {
            "id": "3682",
            "name": "Närke Kulturbryggeri",
            "country": {
              "id": "190",
              "name": "Sweden"
            },
            "areaCode": "19"
          }
        }
Interestingly, the first two beers are both Imperial Stouts, typically dark, full-bodied, high ABV (alcohol by volume) brews. You can see the number of ratings, the realAverage (an adjusted average score weighted based on user patterns), the style name, styleScore (a score adjusted by a beer’s ratings versus other examples of the same style), the brewer, country, and area code.
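Because style, brewer, and country are nested inside each beer, it can help to flatten every entry into a single row before analysis. Here is a minimal Python sketch of that idea, using the first entry above as sample data. The function name flatten_beer and the output column names are my own choices for illustration, not part of the RateBeer API.

```python
def flatten_beer(item):
    """Turn one nested entry from "items" into a flat dict (one row per beer)."""
    # Fall back to empty dicts so a missing or null nested block does not raise.
    style = item.get("style") or {}
    brewer = item.get("brewer") or {}
    country = brewer.get("country") or {}
    return {
        "beer_id": item.get("id"),
        "beer_name": item.get("name"),
        "rating_count": item.get("ratingCount"),
        "real_average": item.get("realAverage"),
        "average_rating": item.get("averageRating"),
        "style_score": item.get("styleScore"),
        "style_id": style.get("id"),
        "style_name": style.get("name"),
        "brewer_id": brewer.get("id"),
        "brewer_name": brewer.get("name"),
        "brewer_area_code": brewer.get("areaCode"),
        "country_id": country.get("id"),
        "country_name": country.get("name"),
    }


# Sample data: the first entry from the results shown above.
sample = {
    "id": "166019",
    "name": "Toppling Goliath Kentucky Brunch",
    "ratingCount": 144,
    "realAverage": 4.574305534362793,
    "style": {"id": "24", "name": "Stout – Imperial"},
    "styleScore": 100.00000002692344,
    "averageRating": 4.532754898071289,
    "brewer": {
        "id": "11242",
        "name": "Toppling Goliath Brewing Company",
        "country": {"id": "213", "name": "United States"},
        "areaCode": "563",
    },
}

row = flatten_beer(sample)
print(row["beer_name"], "|", row["brewer_name"], "|", row["country_name"])
```

Each row then has one column per field, which is the shape most analysis tools expect.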
Our next step is to copy and paste these results into a code editing program (Brackets in my case), giving us the first 100 beers in a nice .json format. The slight challenge comes in getting the second, third, fourth, and fifth groups of 100 beers, totaling 500 in all. However, this is easily overcome by examining the final beer within each group, and simply passing that beer’s id value as the starting point for the next query, using the after argument:
    query {
      topBeers(first:100 after:491745) {
        items {
          id
          name
          ratingCount
          realAverage
          style {
            id
            name
          }
          styleScore
          averageRating
          brewer {
            id
            name
            country {
              id
              name
            }
            areaCode
          }
        }
      }
    }
Note the after value at the top of the query. This tells the API where to pick up the results, so that this query fetches beers 101-200. We then repeat this process, each time using the id of the last beer returned, until we have our full set of 500 beers. The final step is to then merge the five .json files into a single file, being careful to align the various brackets and braces (JSON is a stickler for perfect syntax).
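If you would rather not edit the query and align the brackets by hand, the paging and merging steps can be scripted. The Python sketch below is one possible approach, not the method I used for this post. It assumes each saved file holds one pasted response, either with GraphiQL's usual top-level "data" wrapper or starting at "topBeers". The function names and any file names are hypothetical.

```python
import json
from pathlib import Path

# Field selection copied from the query in this post.
FIELDS = """
    id name ratingCount realAverage
    style { id name }
    styleScore averageRating
    brewer { id name country { id name } areaCode }
"""


def build_query(after=None, first=100):
    """Return the topBeers query text; pass `after` for pages 2 and up."""
    args = f"first:{first}"
    if after is not None:
        args += f" after:{after}"
    return f"query {{ topBeers({args}) {{ items {{ {FIELDS} }} }} }}"


def load_items(path):
    """Read one saved response and return its list of beers."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the file may or may not include the top-level "data" key.
    payload = payload.get("data", payload)
    return payload["topBeers"]["items"]


def next_after(items):
    """The id of the last beer on a page is the cursor for the next page."""
    return items[-1]["id"]


def merge_pages(paths):
    """Concatenate pages in order, skipping any beer id already seen."""
    merged, seen = [], set()
    for path in paths:
        for beer in load_items(path):
            if beer["id"] not in seen:
                seen.add(beer["id"])
                merged.append(beer)
    return merged


def write_merged(paths, out_path):
    """Write all pages to one JSON file and return the number of beers."""
    merged = merge_pages(paths)
    text = json.dumps({"topBeers": {"items": merged}},
                      ensure_ascii=False, indent=2)
    Path(out_path).write_text(text, encoding="utf-8")
    return len(merged)
```

With five saved pages, calling write_merged with the list of their paths and an output path produces a single file and returns the beer count, which should be 500. Letting json.dumps write the output sidesteps the bracket-matching problem entirely, and the duplicate check guards against a page boundary that happens to repeat a beer.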
We now have our source file with the top 500 beers, which we can import into Exploratory, or any other analysis software that plays nice with .json. The next post in the series will highlight how we can use Exploratory to understand the composition of our top 500. Thanks for reading, and see you soon.