Retail Sales data analysis with interactive plots using Plotly

code
analysis
visualization
plotly
Author
Published

December 13, 2022

In this post, I will analyse the #TidyTuesday dataset about Retail Sales in the US across all States between 2019 and 2022.

The data comes from the United States Census Bureau’s Monthly State Retail Sales. The Monthly State Retail Sales (MSRS) is the Census Bureau’s new experimental data product featuring modeled state-level retail sales. You can find the dataset on Tidy Tuesday here.

Tip

You can find my github code repository here.

1 Load libraries

# For loading Tidy Tuesday data
library(tidytuesdayR)

# EDA
library(tidyverse)
library(DT)

# Interactive visualization
library(plotly)
library(crosstalk)

2 Load data

# Get the Data

# Read in with tidytuesdayR package 
# Install from CRAN via: install.packages("tidytuesdayR")

state_retail <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-12-13/state_retail.csv',  col_types = "cciciiccc")
coverage_codes <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-12-13/coverage_codes.csv')
datatable(state_retail %>% filter(state_abbr == "WA") %>% 
  select(state_abbr, year, month, subsector, change_yoy))

3 Impute and clean the data for Washington

After filtering data for Washington, I use the tidyr fill function to impute missing data within each subsector with the next complete value. While there are many ways of imputing the data, I chose this method to indicate that change is more likely to be tending towards the next available value of change.

You may choose not to impute the data as well or use other methods like average.

state_data_imputed <-  state_retail %>% 
  filter(state_abbr == 'WA') %>%
  arrange(subsector) %>% 
  select(state_abbr, year, month, subsector, change_yoy) %>% 
  mutate(change_yoy = as.numeric(change_yoy %>% str_remove('S'))) %>% 
  group_by(subsector) %>% 
  fill(change_yoy, .direction = "up") %>% #Replace missing data with next good value within the group
  ungroup() %>% 
  mutate(date = ifelse(month < 10, paste0(year,'-0',month, '-01'), paste0(year,'-',month, '-01'))) %>% # Create a readable date column
  select(state_abbr, subsector, date, change_yoy)

datatable(state_data_imputed)

4 Plot using “Plotly” to have date range interactivity

I use Plotly here to add the interactivity of zooming to a certain time range.

state_data_imputed %>% 
  plot_ly( type = 'scatter', mode = 'lines') %>%
  add_trace(x = ~date, y = ~change_yoy, color = ~subsector,
    hoverinfo = "text", text = ~paste0(subsector,"\n",date,"\n",change_yoy)) %>%
  layout(showlegend = F, title='Washington Retail Sales delta YoY',
         xaxis = list(rangeslider = list(visible = T),
                      zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = 'ffff',
                      title = ''),
         yaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = 'ffff',
                      title = 'Change YoY',
                      range=list(-150, 500)),
         plot_bgcolor='#e5ecf6', width = 750, height = 450)

5 Use “Crosstalk” with Plotly to enable selecting, highlighting and focusing

As there are so many retail sectors, Crosstalk allows me to add a component where I can highlight one or more sectors making it easier to compare.

## Create a crosstalk component
sector_data <- SharedData$new(state_data_imputed, key = ~subsector, group = "Select a Retail Sector")

# Plot similar to the previous plotly function but add a highlight command
sector_data %>% 
  plot_ly( type = 'scatter', mode = 'lines') %>%
  add_trace(x = ~date, y = ~change_yoy, color = ~subsector,
    hoverinfo = "text", text = ~paste0(subsector,"\n",date,"\n",change_yoy)) %>%
  layout(showlegend = F, title='Washington Retail Sales delta YoY',
         xaxis = list(rangeslider = list(visible = T))) %>%
  layout(
         xaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = 'ffff',
                      title = ''),
         yaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = 'ffff',
                      title = 'Change YoY',
                      range=list(-150, 500)),
         plot_bgcolor='#e5ecf6', width = 750) %>%
  highlight(selectize = TRUE, persistent = TRUE) # This adds the selectize option to easily input sector
Tip

You can find my github code repository here. Follow me on medium and linkedIn to stay tuned for my next article.