BrewGraphs

Top 500 Beers – Data Retrieval

Posted on April 7, 2019 by kc2519

In Part 1 of this series, I’m going to share how to retrieve the data that will allow us to analyze the Top 500 beers as rated by RateBeer users. As noted in the prior post, this article will focus on the steps where we retrieve and format the data for upcoming analysis. Here are the three steps I’ll detail in this post:

  1. Develop a query to pull the data using the RateBeer API
  2. Copy results into a code editor (I use Brackets)
  3. Merge the results into a single .json file (the API limits each query to 100 results)

Let’s begin by creating and understanding the query we’ll use in GraphiQL, a browser-based editor for writing and running GraphQL queries. For those not familiar, GraphQL is a query language for APIs: a query spells out the fields you want as a nested hierarchy, and the server returns JSON in that same shape. It was developed at Facebook, where it suits highly connected data such as social networks. In our case it lets us follow the relationships between breweries, beers, beer styles, reviews, and more.
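To make the "same shape" idea concrete, here is a minimal Python sketch. It is not part of the original workflow, since GraphiQL handles all of this for you. It assumes the common GraphQL-over-HTTP convention: a request is a JSON object with a "query" key (plus optional "variables"), and the response nests results under "data" following the query's field structure. The helper names build_payload and get_path are hypothetical. The RateBeer endpoint URL and API-key details are left out because this post doesn't cover them.

```python
import json
from typing import Any, Dict, Optional


def build_payload(query: str, variables: Optional[Dict[str, Any]] = None) -> str:
    """Wrap a GraphQL query in the JSON body most GraphQL HTTP servers accept."""
    body: Dict[str, Any] = {"query": query}
    if variables:
        body["variables"] = variables
    return json.dumps(body)


def get_path(data: Any, *keys: str) -> Any:
    """Walk a nested response using the same field names that appear in the query."""
    for key in keys:
        data = data[key]
    return data


# A trimmed query: topBeers -> items -> (id, name).
query = "query { topBeers(first: 2) { items { id name } } }"
payload = build_payload(query)

# A response shaped like the query (values copied from the sample results in this post).
response_text = json.dumps({
    "data": {"topBeers": {"items": [
        {"id": "166019", "name": "Toppling Goliath Kentucky Brunch"},
        {"id": "58057", "name": "Närke Kaggen Stormaktsporter"},
    ]}}
})
items = get_path(json.loads(response_text), "data", "topBeers", "items")
print([item["name"] for item in items])
```

The path data → topBeers → items follows the nesting of the query, which is what makes GraphQL responses predictable to parse.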

Our goal in this case is to retrieve the top 500 beers, as rated by RateBeer users, along with some other relevant data (number of ratings, style, etc.). We’re less concerned about brewery-level information, although it would be nice to have a geographic component so we can construct some maps. Let’s have a look at our complete query (after having iterated through some earlier versions):

query {
  topBeers(first: 100) {
    items {
      id
      name
      ratingCount
      realAverage
      style {
        id
        name
      }
      styleScore
      averageRating
      brewer {
        id
        name
        country {
          id
          name
        }
        areaCode
      }
    }
  }
}

What we’re doing above is requesting the top 100 beers (the API limit for a single query), as sorted by averageRating. Nested inside each item, the style block grabs the style id and name, which will be useful in our upcoming analysis. The brewer block provides the brewer’s id, name, and area code, along with a nested country block for the country id and name. OK, curious about the results? Let’s have a look at the first two entries:

"topBeers": {
  "items": [
    {
      "id": "166019",
      "name": "Toppling Goliath Kentucky Brunch",
      "ratingCount": 144,
      "realAverage": 4.574305534362793,
      "style": {
        "id": "24",
        "name": "Stout - Imperial"
      },
      "styleScore": 100.00000002692344,
      "averageRating": 4.532754898071289,
      "brewer": {
        "id": "11242",
        "name": "Toppling Goliath Brewing Company",
        "country": {
          "id": "213",
          "name": "United States"
        },
        "areaCode": "563"
      }
    },
    {
      "id": "58057",
      "name": "Närke Kaggen Stormaktsporter",
      "ratingCount": 554,
      "realAverage": 4.4922380447387695,
      "style": {
        "id": "24",
        "name": "Stout - Imperial"
      },
      "styleScore": 100.00000002692344,
      "averageRating": 4.481779098510742,
      "brewer": {
        "id": "3682",
        "name": "Närke Kulturbryggeri",
        "country": {
          "id": "190",
          "name": "Sweden"
        },
        "areaCode": "19"
      }
    }

Interestingly, the first two beers are both Imperial Stouts: typically dark, full-bodied, high-ABV (alcohol by volume) brews. For each beer you can see the number of ratings, the realAverage (an adjusted average score weighted based on user patterns), the style name, the styleScore (a score adjusted by a beer’s ratings versus other examples of the same style), the brewer, the country, and the area code.
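Analysis tools often prefer flat tables to nested JSON, so one option is to collapse each item into a single row. This is a minimal Python sketch and not a step from the original workflow. The function name flatten_beer and the output column names are illustrative choices. It assumes each item has the shape shown in the sample response above, and it tolerates a missing style, brewer, or country block by returning None for those columns.

```python
from typing import Any, Dict


def flatten_beer(item: Dict[str, Any]) -> Dict[str, Any]:
    """Collapse one nested topBeers item into a flat dict (one row per beer)."""
    style = item.get("style") or {}
    brewer = item.get("brewer") or {}
    country = brewer.get("country") or {}
    return {
        "beer_id": item["id"],
        "beer_name": item["name"],
        "rating_count": item["ratingCount"],
        "real_average": item["realAverage"],
        "average_rating": item["averageRating"],
        "style_id": style.get("id"),
        "style_name": style.get("name"),
        "style_score": item.get("styleScore"),
        "brewer_id": brewer.get("id"),
        "brewer_name": brewer.get("name"),
        "country": country.get("name"),
        "area_code": brewer.get("areaCode"),
    }


# Usage with the first entry from the sample response.
sample = {
    "id": "166019",
    "name": "Toppling Goliath Kentucky Brunch",
    "ratingCount": 144,
    "realAverage": 4.574305534362793,
    "style": {"id": "24", "name": "Stout - Imperial"},
    "styleScore": 100.00000002692344,
    "averageRating": 4.532754898071289,
    "brewer": {
        "id": "11242",
        "name": "Toppling Goliath Brewing Company",
        "country": {"id": "213", "name": "United States"},
        "areaCode": "563",
    },
}
row = flatten_beer(sample)
print(row["style_name"], "|", row["country"])  # Stout - Imperial | United States
```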

Our next step is to copy and paste these results into a code editor (Brackets in my case), giving us the first 100 beers in a nice .json format. The slight challenge is getting the second, third, fourth, and fifth groups of 100 beers, for 500 in all. This is easily overcome by looking at the final beer in each group and using that beer’s id as the starting point for the next query, via the after argument:

query {
  topBeers(first: 100, after: 491745) {
    items {
      id
      name
      ratingCount
      realAverage
      style {
        id
        name
      }
      styleScore
      averageRating
      brewer {
        id
        name
        country {
          id
          name
        }
        areaCode
      }
    }
  }
}

Note the after value at the top of the query. It tells the API where to pick up, so this query returns beers 101-200. We then repeat the process, each time passing the id of the previous group’s last beer, until we have our full set of 500 beers. The final step is to merge the five .json files into a single file, being careful to align the various brackets and braces (JSON is a stickler for perfect syntax).
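If hand-aligning brackets gets tedious, the paging and merging can be scripted instead. Python's json module parses each saved page and writes one valid file. This is a minimal sketch under stated assumptions, not the method used in this post. It assumes that, as described above, the last beer's id is the next after value. It also assumes each saved page holds the GraphiQL output, either the full {"data": ...} response or just the "topBeers" object. All function names and file names are hypothetical, and the merged file layout ({"topBeers": {"items": [...]}}) is an illustrative choice. Duplicate ids are skipped in case two pages ever overlap at a boundary.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

# Fields requested for each beer, matching the query above.
ITEM_FIELDS = (
    "id name ratingCount realAverage style { id name } styleScore averageRating "
    "brewer { id name country { id name } areaCode }"
)


def top_beers_query(first: int = 100, after: Union[int, str, None] = None) -> str:
    """Build the topBeers query; `after` is the id of the last beer already fetched."""
    args = f"first: {first}" if after is None else f"first: {first}, after: {int(after)}"
    return f"query {{ topBeers({args}) {{ items {{ {ITEM_FIELDS} }} }} }}"


def page_items(page: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the beer list from one saved page, with or without the outer "data" key."""
    return page.get("data", page)["topBeers"]["items"]


def next_after(page: Dict[str, Any]) -> int:
    """Id of the last beer on a page, used as the `after` value for the next query."""
    return int(page_items(page)[-1]["id"])


def merge_pages(pages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Concatenate the items from every page, skipping any beer id seen before."""
    seen = set()
    merged = []
    for page in pages:
        for item in page_items(page):
            if item["id"] not in seen:
                seen.add(item["id"])
                merged.append(item)
    return {"topBeers": {"items": merged}}


def merge_files(paths: List[str], out_path: str) -> int:
    """Read saved page files, write one merged JSON file, and return the beer count."""
    pages = [json.loads(Path(p).read_text(encoding="utf-8")) for p in paths]
    merged = merge_pages(pages)
    # ensure_ascii=False keeps names such as "Närke" readable in the output file.
    Path(out_path).write_text(
        json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(merged["topBeers"]["items"])


# Hypothetical usage with five saved pages:
# count = merge_files([f"top_beers_{i}.json" for i in range(1, 6)], "top_500_beers.json")
```

Because json.dumps always emits balanced brackets and braces, the merged file is valid JSON by construction.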

We now have our source file with the top 500 beers, which we can import into Exploratory or any other analysis software that plays nicely with JSON. The next post in the series will highlight how we can use Exploratory to understand the composition of our top 500. Thanks for reading, and see you soon.

  • GraphiQL
  • RateBeer
  • top beers
© 2023 BrewGraphs