Fanalyticsports | Iron Viz 2017

Check out our entry for the Iron Viz competition and up vote if you like it. Our goal was to design something that looked at the proximity of opponents in the NCAA Final Four as part of our #AnalyticsRoadToTheFinalFour. It's a combination of a variety of Business Intelligence tools & solutions from AWS, Python, R, Atom, GeoJSon, and Tableau. All positioned in one dashboard to see how your team have seeded and traveled throughout the years. Feel free to check out your team and enjoy. 

Below is a quick piece on the steps used to complete the project. 

Step 1: Data Gathering

We used a combination of sources for this from to API calls from Sportsradar. For more information on this API follow our upcoming blog on this but below is a basic call to grab the teams and loop through each team to gather the information. There is some pretty cool data in these feeds

##Python Script
import csv as csv 
import numpy as np

# Open up the csv file in to a Python object
csv_file_object = csv.reader(open('/Users/name/Desktop/NCAA_TEAMS_UPLOAD.csv', 'rb')) 
header =  # The next() command just skips the 
                                 # first line which is a header
data=[]                          # Create a variable called 'data'.
for row in csv_file_object:      # Run through each row in the csvfile,
    data.append(row)             # adding each row to the data variable
ncaa_team_data = np.array(data) # Then convert from a list to an array
               # Be aware that each item is currently
               # a string in this format
##Loop through teams and gets stats using API               
import urllib2
import pandas as pd

##Use results and put them in a dataset to be used for looping through play by play data
raw_data_list = []

for rows in ncaa_team_data:
    url = ""+rows[2]+'/'"statistics.json?api_key=#"
        raw_data = urllib2.urlopen(url).read()
        if raw_data != "":

data = "[" + (",".join(raw_data_list)) + "]"
data = pd.read_json(data, orient='records')

Step 2: Data Wrangling & Geocoding

For this we switched to R to use it's geolocation packaging. With the game and the following data points: Home_City_State, Away_City_State, and Stadium_City_State we calculate the distance as a crow flies for each game. 

## R Script
## Select the file from the file chooser
fileToLoad <- file.choose(new = TRUE)
## Read in the CSV data and store it in a variable 
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)
## View Table 
##Library GGMAP 
## Initialize the data frame
geocoded <- data.frame(stringsAsFactors = FALSE)
# Loop through the addresses to get the latitude and longitude of each address and add it to the
# origAddress data frame in new columns lat and lon
# Stadium
for(i in 1:nrow(origAddress))
  # Print("Working...")
  result <- geocode(origAddress$Stadium_City_State[i], output = "latlona", source = "google")
  origAddress$lon[i] <- as.numeric(result[1])
  origAddress$lat[i] <- as.numeric(result[2])
  origAddress$geoAddress[i] <- as.character(result[3])
## View Table 
# Loop through the addresses to get the latitude and longitude of each address and add it to the
# origAddress data frame in new columns lat and lon
# Home Team 
for(i in 1:nrow(origAddress))
  # Print("Working...")
  result <- geocode(origAddress$Home_City_State[i], output = "latlona", source = "google")
  origAddress$lon_ht[i] <- as.numeric(result[1])
  origAddress$lat_ht[i] <- as.numeric(result[2])
  origAddress$geoAddress_ht[i] <- as.character(result[3])
##View Table 
# Loop through the addresses to get the latitude and longitude of each address and add it to the
# origAddress data frame in new columns lat and lon
# Away Team 
for(i in 1:nrow(origAddress))
  # Print("Working...")
  result <- geocode(origAddress$Away_City_State[i], output = "latlona", source = "google")
  origAddress$lon_at[i] <- as.numeric(result[1])
  origAddress$lat_at[i] <- as.numeric(result[2])
  origAddress$geoAddress_ht[i] <- as.character(result[3])
## View Table 
## Get Distance Between Coordiantes
origAddress %>% head()
## Get Home Distance from Center
hdistanceFromStadium = by(origAddress, 1:nrow(origAddress), function(row) { distHaversine(c(row$lon_ht, row$lat_ht), c(row$lon, row$lat))  })
## Get Away Distance from Center
adistanceFromStadium = by(origAddress, 1:nrow(origAddress), function(row) { distHaversine(c(row$lon_at, row$lat_at), c(row$lon, row$lat))  })
## Add in hdistance and adistance 
hout = mutate(origAddress, hdistanceFromStadium)
finalout = mutate(hout, adistanceFromStadium)
##Write to a CSV Out File 
# Write a CSV file to working directory
##write.csv(finalout, "geocoded_ncaa_distance.csv")
write.csv(finalout, "/Users/name/desktop/geocode_ncaa_distance_14_17.csv")

Step 3: Create Visualization Using Data and Spatial File


We start with creating a geojson spatial file which Tableau can incorporate. This is my first time using geojson so I was glad I found to help create my location map. I used my cvs to create the json copied this into Atom. Formatted as a .geojson and within minutes I had my first .geojson file. 

With that complete we moved on to uploading the spatial file and the NCAA location team data to begin developing our Viz.

This part was pretty easy with some suggestions and tips from a few other Viz experts we were able to use the cool new pathing feature in 10 and bring across the logos. The spatial file became a nice to have but with the logos and everything else all we really needed was the city points which become the stadiums on the map.

Through in a few basic scatter charts and some action items and the dashboard is nice interactive way to travel the Final Four and follow your team and the effect distance had on them.