Check out our entry for the Iron Viz competition and upvote it if you like it. Our goal was to design something that looked at the proximity of opponents in the NCAA Final Four as part of our #AnalyticsRoadToTheFinalFour. It combines a variety of Business Intelligence tools and solutions, including AWS, Python, R, Atom, GeoJSON, and Tableau, all brought together in one dashboard so you can see how your team has seeded and traveled throughout the years. Feel free to check out your team and enjoy.
Below is a quick piece on the steps used to complete the project.
Step 1: Data Gathering
We used a combination of sources for this, from http://www.sports-reference.com/cbb to API calls to Sportradar. For more information on this API, follow our upcoming blog post, but below is a basic script that grabs the teams and then loops through each team to gather its statistics. There is some pretty cool data in these feeds.
## Python Script (Python 2)
import csv
import numpy as np

# Open up the csv file into a Python object
csv_file_object = csv.reader(open('/Users/name/Desktop/NCAA_TEAMS_UPLOAD.csv', 'rb'))
header = csv_file_object.next()      # next() just skips the first line, which is a header

data = []                            # Create a variable called 'data'
for row in csv_file_object:          # Run through each row in the csv file,
    data.append(row)                 # adding each row to the data variable

ncaa_team_data = np.array(data)      # Then convert from a list to an array
                                     # Be aware that each item is currently a string

## Loop through teams and get stats using the API
import urllib2
import pandas as pd

## Use the results and put them in a dataset to be used for looping through play-by-play data
raw_data_list = []
for rows in ncaa_team_data:
    # rows[2] is the team id used in the API URL; the api_key is left as a placeholder
    url = "http://api.sportradar.us/ncaamb-t3/seasontd/2016/REG/teams/" + rows[2] + "/statistics.json?api_key=#"
    try:
        raw_data = urllib2.urlopen(url).read()
        if raw_data != "":
            raw_data_list.append(raw_data)
    except:
        pass

# Join the individual JSON responses into one array and load it into pandas
data = "[" + (",".join(raw_data_list)) + "]"
data = pd.read_json(data, orient='records')
data
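The script above is Python 2 (urllib2, .next()). If you are on Python 3, a rough equivalent of the same loop using the requests library might look like the sketch below; the CSV path, the team-id column, and the placeholder api_key are simply carried over from the script above.

## Python 3 sketch of the same loop (assumes the same CSV layout and a real api_key)
import csv
import requests
import pandas as pd

with open('/Users/name/Desktop/NCAA_TEAMS_UPLOAD.csv') as f:
    reader = csv.reader(f)
    header = next(reader)                      # skip the header row
    ncaa_team_data = [row for row in reader]

raw_data_list = []
for row in ncaa_team_data:
    url = ("http://api.sportradar.us/ncaamb-t3/seasontd/2016/REG/teams/"
           + row[2] + "/statistics.json?api_key=#")
    try:
        resp = requests.get(url)
        if resp.ok and resp.text:
            raw_data_list.append(resp.text)
    except requests.RequestException:
        pass

# Same trick as above: join the JSON documents into one array and load into pandas
data = pd.read_json("[" + ",".join(raw_data_list) + "]", orient='records')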
Step 2: Data Wrangling & Geocoding
For this step we switched to R to use its geocoding packages. Starting from each game and the following fields: Home_City_State, Away_City_State, and Stadium_City_State, we calculated the as-the-crow-flies distance for each game.
## R Script
## Select the file from the file chooser
fileToLoad <- file.choose(new = TRUE)

## Read in the CSV data and store it in a variable
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)

## View the table
View(origAddress)

## Load ggmap for geocoding
library(ggmap)

## Initialize the data frame
geocoded <- data.frame(stringsAsFactors = FALSE)

# Loop through the addresses to get the latitude and longitude of each address and add them to the
# origAddress data frame in new columns lon and lat
# Stadium
for (i in 1:nrow(origAddress)) {
  # print("Working...")
  result <- geocode(origAddress$Stadium_City_State[i], output = "latlona", source = "google")
  origAddress$lon[i] <- as.numeric(result[1])
  origAddress$lat[i] <- as.numeric(result[2])
  origAddress$geoAddress[i] <- as.character(result[3])
}

## View the table
View(origAddress)

# Home team
for (i in 1:nrow(origAddress)) {
  # print("Working...")
  result <- geocode(origAddress$Home_City_State[i], output = "latlona", source = "google")
  origAddress$lon_ht[i] <- as.numeric(result[1])
  origAddress$lat_ht[i] <- as.numeric(result[2])
  origAddress$geoAddress_ht[i] <- as.character(result[3])
}

# Away team
for (i in 1:nrow(origAddress)) {
  # print("Working...")
  result <- geocode(origAddress$Away_City_State[i], output = "latlona", source = "google")
  origAddress$lon_at[i] <- as.numeric(result[1])
  origAddress$lat_at[i] <- as.numeric(result[2])
  origAddress$geoAddress_at[i] <- as.character(result[3])
}

## View the table
View(origAddress)

## Get the distance between coordinates
## devtools::install_github("tidyverse/magrittr")
## install.packages("dplyr")
library(magrittr)
library(dplyr)
library(geosphere)  # provides distHaversine()

origAddress %>% head()

## Home-team distance from the stadium (in meters)
hdistanceFromStadium = by(origAddress, 1:nrow(origAddress), function(row) {
  distHaversine(c(row$lon_ht, row$lat_ht), c(row$lon, row$lat))
})

## Away-team distance from the stadium (in meters)
adistanceFromStadium = by(origAddress, 1:nrow(origAddress), function(row) {
  distHaversine(c(row$lon_at, row$lat_at), c(row$lon, row$lat))
})

## Add in hdistance and adistance
hout = mutate(origAddress, hdistanceFromStadium)
finalout = mutate(hout, adistanceFromStadium)

## Write a CSV file out
## write.csv(finalout, "geocoded_ncaa_distance.csv")
write.csv(finalout, "/Users/name/desktop/geocode_ncaa_distance_14_17.csv")
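As a quick cross-check of what distHaversine is computing, here is the same great-circle ("as the crow flies") formula in a small Python sketch. The function and the example coordinates for Chapel Hill and Glendale are only illustrative; the 6378137 m radius is the equatorial Earth radius, which geosphere also uses by default.

## Haversine sanity check in Python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lon1, lat1, lon2, lat2, r=6378137):
    # Convert degrees to radians, then apply the haversine formula
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * r * asin(sqrt(a))  # distance in meters

# Example (approximate coordinates): Chapel Hill, NC to Glendale, AZ, converted to miles
print(haversine_m(-79.06, 35.91, -112.19, 33.54) / 1609.34)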
Step 3: Create Visualization Using Data and Spatial File
We started by creating a GeoJSON spatial file, which Tableau can incorporate. This was my first time using GeoJSON, so I was glad I found http://geojson.io to help create my location map. I used my CSV to create the JSON, copied it into Atom, saved it with a .geojson extension, and within minutes I had my first .geojson file.
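If you would rather script the CSV-to-GeoJSON conversion instead of hand-editing it in Atom, a minimal Python sketch could look like the one below; the file name and the stadium/lon/lat column names are hypothetical and would need to match your own CSV.

## Minimal sketch: convert stadium rows from a CSV into a GeoJSON FeatureCollection
import csv
import json

features = []
with open('/Users/name/Desktop/stadium_points.csv') as f:   # hypothetical input file
    for row in csv.DictReader(f):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON coordinates are ordered [longitude, latitude]
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"stadium": row["stadium"]},
        })

with open('stadiums.geojson', 'w') as out:
    json.dump({"type": "FeatureCollection", "features": features}, out)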
With that complete, we moved on to uploading the spatial file and the NCAA team location data to begin developing our viz.
This part was pretty easy. With some suggestions and tips from a few other viz experts, we were able to use the cool new pathing feature in Tableau 10 and bring across the team logos. The spatial file ended up being a nice-to-have; with the logos and everything else, all we really needed were the city points, which become the stadiums on the map.
Throw in a few basic scatter charts and some dashboard actions, and the result is a nice interactive way to travel the Final Four, follow your team, and see the effect distance had on them.
Enjoy.