In this post, I will analyse the #TidyTuesday dataset about Retail Sales in the US across all States between 2019 and 2022.
The data comes from the United States Census Bureau’s Monthly State Retail Sales. The Monthly State Retail Sales (MSRS) is the Census Bureau’s new experimental data product featuring modeled state-level retail sales. You can find the dataset on Tidy Tuesday here.
You can find my github code repository here.
1 Load libraries
2 Load data
# Get the Data
# Read in with tidytuesdayR package
# Install from CRAN via: install.packages("tidytuesdayR")
state_retail <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-12-13/state_retail.csv', col_types = "cciciiccc")
coverage_codes <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-12-13/coverage_codes.csv')
3 Impute and clean the data for Washington
After filtering data for Washington, I use the tidyr fill function to impute missing data within each subsector with the next complete value. While there are many ways of imputing the data, I chose this method to indicate that change is more likely to be tending towards the next available value of change.
You may choose not to impute the data as well or use other methods like average.
state_data_imputed <- state_retail %>%
filter(state_abbr == 'WA') %>%
arrange(subsector) %>%
select(state_abbr, year, month, subsector, change_yoy) %>%
mutate(change_yoy = as.numeric(change_yoy %>% str_remove('S'))) %>%
group_by(subsector) %>%
fill(change_yoy, .direction = "up") %>% #Replace missing data with next good value within the group
ungroup() %>%
mutate(date = ifelse(month < 10, paste0(year,'-0',month, '-01'), paste0(year,'-',month, '-01'))) %>% # Create a readable date column
select(state_abbr, subsector, date, change_yoy)
datatable(state_data_imputed)
4 Plot using “Plotly” to have date range interactivity
I use Plotly here to add the interactivity of zooming to a certain time range.
state_data_imputed %>%
plot_ly( type = 'scatter', mode = 'lines') %>%
add_trace(x = ~date, y = ~change_yoy, color = ~subsector,
hoverinfo = "text", text = ~paste0(subsector,"\n",date,"\n",change_yoy)) %>%
layout(showlegend = F, title='Washington Retail Sales delta YoY',
xaxis = list(rangeslider = list(visible = T),
zerolinecolor = '#ffff',
zerolinewidth = 2,
gridcolor = 'ffff',
title = ''),
yaxis = list(zerolinecolor = '#ffff',
zerolinewidth = 2,
gridcolor = 'ffff',
title = 'Change YoY',
range=list(-150, 500)),
plot_bgcolor='#e5ecf6', width = 750, height = 450)
5 Use “Crosstalk” with Plotly to enable selecting, highlighting and focusing
As there are so many retail sectors, Crosstalk allows me to add a component where I can highlight one or more sectors making it easier to compare.
## Create a crosstalk component
sector_data <- SharedData$new(state_data_imputed, key = ~subsector, group = "Select a Retail Sector")
# Plot similar to the previous plotly function but add a highlight command
sector_data %>%
plot_ly( type = 'scatter', mode = 'lines') %>%
add_trace(x = ~date, y = ~change_yoy, color = ~subsector,
hoverinfo = "text", text = ~paste0(subsector,"\n",date,"\n",change_yoy)) %>%
layout(showlegend = F, title='Washington Retail Sales delta YoY',
xaxis = list(rangeslider = list(visible = T))) %>%
layout(
xaxis = list(zerolinecolor = '#ffff',
zerolinewidth = 2,
gridcolor = 'ffff',
title = ''),
yaxis = list(zerolinecolor = '#ffff',
zerolinewidth = 2,
gridcolor = 'ffff',
title = 'Change YoY',
range=list(-150, 500)),
plot_bgcolor='#e5ecf6', width = 750) %>%
highlight(selectize = TRUE, persistent = TRUE) # This adds the selectize option to easily input sector