How to parse out data into multiple excel files using R openxlsx. Convenient way to save time and automate a painfully monotonous task.
R
workflow automation
Excel
Published
January 4, 2024
Intro
I remember a moment back in my fledgling analyst days where I was assigned a Herculean task to separate a giant Excel workbook into multiple files for different clients. It took me several days to filter and copy/paste each data subset into its own workbook. And it brought me absolutely no joy. In the back of my mind I thought there has to be a better way to do this.
Yes - there IS a better way. But it wasn’t until much later in my career that I learned tools that could help me with worklow automation. If you know a little bit of R, you can 100% automate this task and do it in less than 10 minutes! I’ll walk you through a scenario in this post.
Scenario
Let’s pretend we’re an analyst working for a sports analytics firm. Your manager, Chris, has a list of all the latest NFL team rosters. Chris manages a team of experts that focus on player performance for each team within the NFL. He’s asked for your help to create a report for each expert listing all players for their respective team.
Here’s a quick preview of the data:
library(nflreadr)library(kableExtra)library(dplyr)#load in data form nflreadr to see team rostersroster<-nflreadr::load_rosters()|>#subset data with dplyr to select specific columns dplyr::select(last_name, first_name, status, position, jersey_number, team)#preview first 5 rows of datakableExtra::kable(head(roster,5))
last_name
first_name
status
position
jersey_number
team
Williams
Bernard
CUT
OL
74
PHI
Peters
Jason
INA
OL
70
SEA
Rodgers
Aaron
INA
QB
8
NYJ
Prater
Matt
ACT
K
5
ARI
Lewis
Marcedes
ACT
TE
84
CHI
And here’s a look at how many team files we’d need…
Creating 32 files by hand would be tedious. But luckily, R is going to do all the heavy lifting for us.
Writing Data to Excel
There are a few different libraries that can help us write data to Excel. In this tutorial, we’ll use openxlsx to create Excel workbooks, write data as Excel tables, and save them.
There’s a lot more to this powerful library that goes beyond this tutorial, for a full run-down of openxlsx’s functions, be sure to check out the CRAN package doc.
Let’s try it out with the roster data to familiarize ourselves with a few basic functions.
library(openxlsx)#initialize a workbookwb <-createWorkbook()#add a worksheet in workbookaddWorksheet(wb, sheet ="Roster")#write data with filters to workbookwriteData(wb, sheet="Roster", roster, withFilter =TRUE)#save and write workbook saveWorkbook(wb, file ="roster.xlsx", overwrite =TRUE)
Alternatively, if we want to write and store data as an Excel table, openxlsx also has a writeDataTable function that works similarly to writeData.
Power of For Loops
Of course the main task at hand is not writing one Excel workbook, but creating a workbook per team. This is a good use case to use for loops to help us repeat the task for each data subset.
First, a quick example of how to filter data with dplyr to create our data subset by team. The example below
#filter roster to see players for Tampa Bay, TBtb_roster <- roster|>filter(team=='TB')#preview roster subsetkable(head(tb_roster,5))
last_name
first_name
status
position
jersey_number
team
David
Lavonte
ACT
LB
54
TB
Jensen
Ryan
RES
OL
66
TB
Gholston
William
ACT
DL
92
TB
Barrett
Shaquil
ACT
LB
7
TB
Feiler
Matt
ACT
OL
71
TB
Now, let’s combine what we’ve learned from creating workbooks and subsetting data into a for loop like this. Before running the code, we create a new directory (folder) to save all of our future rosters to. We aptly call this directly “team_rosters.” This is important to note for our workbook file path!
#for each team, aka t, in teamsfor(t in teams){#filter data for team, bonus arrange by last name team_roster = roster|>filter(team==t)|>arrange(last_name)#create new workbook wb <-createWorkbook()#create a worksheet, Roster, in workbook wbaddWorksheet(wb, "Roster")#write a data table to sheet Roster in wbwriteDataTable(wb, sheet="Roster", team_roster, withFilter =TRUE, tableName="roster")#create a new workbook title, include path if applicable wb_file_path =paste("team_rosters/", tolower(t),"_roster.xlsx", sep='')#save file saveWorkbook(wb, file = wb_file_path, overwrite =TRUE)}
Once the code runs, we excitedly rush to open our directory, to find 32 glorious files - one for each team. We open one to inspect, and see the list of players in a sheet called “Rosters”, formatted in a basic Excel table.
With the job done, we send the files off to Chris to distribute to his team. We look at the clock, it’s time for lunch - we log off for a well deserved lunch break.
But There’s More
Chris is impresesed we got the files back to him so quickly. You come back from lunch and see an email in your inbox with his latest request.
Only after the fact, Chris realizes it would be more useful if each expert could see the list split out for active players vs. inactive players (he doesn’t trust the experts to use the table filters…their Excel skills pale in comparison to your prowess).
In the example below, we’ll make the fix to create TWO sheets per team excel file. One with active players, one with inactive players. To help differentiate the tables, we can also use the tableStyle argument in writeDataTable to change the table style.
for(t in teams){ team_roster = roster|>filter(team==t)|>arrange(last_name) wb <-createWorkbook()#set up two worksheets in the workbookaddWorksheet(wb, "Active")addWorksheet(wb, "Not Active")#store active players "ACT" as a table in the Active tabwriteDataTable(wb, "Active", team_roster[team_roster$status=="ACT", ], withFilter =TRUE, tableName="active")#store iactive players (not equal to "ACT") as a table in the Active tab, change table stylewriteDataTable(wb, "Not Active", team_roster[team_roster$status!="ACT", ], withFilter =TRUE, tableName="inactive", tableStyle ="TableStyleLight10") wb_file_path =paste("team_rosters/", tolower(t),"_roster.xlsx", sep='')saveWorkbook(wb, file = wb_file_path, overwrite =TRUE)}
The code runs in less than 10 seconds, and the 32 excel files resave to our director. We rush to inspect the first one, and fist pump triumphantly to see two worksheets with the roster now split into active and inactive players.
We save the files to the company OneDrive and email Chris the update an hour later (you decide you can’t make it look too easy…64 tables in less than a minute is record speed). All done with plenty of time to spare for your other projects!
Thank you for checking out my blog, let me know if you have questions!