Introduction to R and A Basic Analysis of Athlete Load

R is a free, open-source programming language and software environment for statistical computing and graphics. Learning how to use R for data analysis and visualisation purposes can be a daunting task. However, there are a number of free online resources to guide basic analysis and troubleshoot where possible. These include:

  • Cookbook for R: An online guide to provide solutions for common tasks and problems when analysing data in R
  • R Tutorial: An introduction to statistics that explains basic concepts in R
  • Quick-R: A website that assists with data input, management and statistics in R
  • R-bloggers: A news and information site that pulls together blog posts on R
  • Stack Overflow: A question and answer forum on all things code, statistics and plotting

The above websites are a fantastic resource for getting started in R with basic analysis. To complement them, I have constructed a basic guide for sport science and physiology users, using athlete load as an example. Personally, I prefer to work in RStudio, which provides a free, friendly interface to run code and view plots.

To start, run the following code to load a .csv or .txt file into R and store it as an object named “RawLoadData”. You will need to substitute the file location below with your own.

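# Read a .csv file into R (the first row is treated as column names by default)
RawLoadData <- read.csv("/Volumes/Research/Thesis/Manuscripts/AthleteLoadData.csv")
# Read a .txt file into R
# header = TRUE assumes the first row holds column names; read.table does not assume this by default
RawLoadData <- read.table("/Volumes/Research/Thesis/Manuscripts/AthleteLoadData.txt", header = TRUE)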
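
If you do not have a file to hand, the import step can be tried on a throwaway file. The sketch below is only an illustration: the file is written to a temporary location, the two rows of values are made up, and the object names ExampleFile, ExampleData and CheckLoadData are my own.

```r
# Write a small, made-up table to a temporary .csv file
ExampleFile <- tempfile(fileext = ".csv")
ExampleData <- data.frame(Athletes = c("Charles", "Mia"),
                          Day = c(1L, 1L),
                          Load = c(55.2, 61.8))
write.csv(ExampleData, ExampleFile, row.names = FALSE)

# Read the file back in and check that it arrived as expected
CheckLoadData <- read.csv(ExampleFile)
head(CheckLoadData)    # first few rows
names(CheckLoadData)   # column names taken from the header row
dim(CheckLoadData)     # number of rows and columns
```

Checking the column names and dimensions straight after import is a quick way to catch a missing header row or the wrong separator.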

R commonly stores tabular data in data.frames and matrices. The difference between the two is that a matrix requires every element to be of the same type (all numeric or all character, for example), whereas a data.frame allows each column to have its own class. You can switch between the two using as.data.frame and as.matrix, although be aware that if you convert a data.frame that mixes numeric and text columns, every value will be stored as a character in the matrix. To convert to a data.frame or matrix, use the following code:

# Create a data.frame
RawLoadData <- as.data.frame(RawLoadData)
# Create a matrix
RawLoadData <- as.matrix(RawLoadData)
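
To see the coercion for yourself, here is a minimal sketch on a two-row data.frame. The values and the names Mixed, MixedMatrix and NumericMatrix are made up for illustration.

```r
# A small data.frame with one text column and one numeric column (made-up values)
Mixed <- data.frame(Athletes = c("Charles", "Mia"),
                    Load = c(55.2, 61.8))
sapply(Mixed, class)      # each column keeps its own class

# Converting to a matrix forces every value into a single type
MixedMatrix <- as.matrix(Mixed)
typeof(MixedMatrix)       # "character"

# A data.frame holding only numeric columns stays numeric as a matrix
NumericMatrix <- as.matrix(Mixed["Load"])
typeof(NumericMatrix)     # "double"
```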

To create a data.frame of dummy athlete load data collected over seven days, use the code below.

# Create a list of athlete names
Athletes <- c("Charles", "Mia", "Alfie", "Sophie")
# Call out the constants
NumberOfAthletes <- 4
DaysOfLoad <- 7
set.seed(28)
# Create the data.frame
RawLoadData <- data.frame(Athletes = rep(Athletes, DaysOfLoad),
                          Day = rep(1:DaysOfLoad, each = NumberOfAthletes),
                          Load = runif(NumberOfAthletes * DaysOfLoad, min = 0, max = 100))

The structure of a data.frame can be viewed by running the code below. This shows that the dataset consists of athlete names (a factor, or a character vector in R 4.0.0 and later, where strings are no longer converted to factors by default), days (integers) and load (a numeric variable).

str(RawLoadData)
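
Alongside str(), a few other base R functions give a quick look at a data.frame. The sketch below rebuilds the same dummy data so that it runs on its own; which of these checks are useful will depend on your own dataset.

```r
# Rebuild the dummy data so this block runs on its own
set.seed(28)
RawLoadData <- data.frame(Athletes = rep(c("Charles", "Mia", "Alfie", "Sophie"), 7),
                          Day = rep(1:7, each = 4),
                          Load = runif(4 * 7, min = 0, max = 100))

head(RawLoadData)             # first six rows
dim(RawLoadData)              # 28 rows and 3 columns
summary(RawLoadData$Load)     # minimum, quartiles, mean and maximum of load
table(RawLoadData$Athletes)   # number of rows per athlete (7 each)
```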

A column of a data.frame can be viewed by running RawLoadData$Athletes; however, this command will not work for a matrix. To add a new column to our existing `data.frame`, such as the playing position of each athlete, run the following code, which assigns the same position to every row:

# To create a new column
RawLoadData$PlayingPosition <- c("Midcourt")
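
In practice, athletes will rarely all share one position. One possible approach, sketched below, is to keep a named vector that maps each athlete to a position and index it by athlete name. The positions, the load values and the names Positions and ExampleLoad are invented purely for illustration.

```r
# Hypothetical positions, one per athlete (invented for illustration)
Positions <- c(Charles = "Midcourt", Mia = "Shooter",
               Alfie = "Defender", Sophie = "Midcourt")

# Small stand-in for the load data (two days of made-up loads)
ExampleLoad <- data.frame(Athletes = rep(names(Positions), 2),
                          Day = rep(1:2, each = 4),
                          Load = c(40, 55, 62, 38, 47, 51, 70, 33))

# Index the named vector by athlete name to fill the new column row by row
ExampleLoad$PlayingPosition <- unname(Positions[as.character(ExampleLoad$Athletes)])
ExampleLoad
```

The as.character() call means the lookup is done by name even if the Athletes column has been stored as a factor.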

Summary statistics of grouped data can easily be calculated with the plyr package, which needs to be installed (for example with install.packages("plyr")) before first use. To calculate the mean and SD of load for each day, purely as an example, use the following:

# Load the required package
library(plyr)
# To calculate the mean and SD for load, across each day
SummaryLoadData <- ddply(RawLoadData, c("Day"), summarise,
                         Mean = mean(Load),
                         SD = sd(Load))
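
If you would rather not install an extra package, a similar table can be built with base R's aggregate(). This is a substitute I am sketching here, not the plyr approach above, and the names MeanByDay, SDByDay and BaseSummary are my own. The block rebuilds the dummy data so that it runs on its own.

```r
# Rebuild the dummy data so this block runs on its own
set.seed(28)
RawLoadData <- data.frame(Athletes = rep(c("Charles", "Mia", "Alfie", "Sophie"), 7),
                          Day = rep(1:7, each = 4),
                          Load = runif(4 * 7, min = 0, max = 100))

# Mean and SD of load for each day, using base R only
MeanByDay <- aggregate(Load ~ Day, data = RawLoadData, FUN = mean)
SDByDay <- aggregate(Load ~ Day, data = RawLoadData, FUN = sd)

# Combine the two results into one summary table (both are sorted by Day)
BaseSummary <- data.frame(Day = MeanByDay$Day,
                          Mean = MeanByDay$Load,
                          SD = SDByDay$Load)
BaseSummary
```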

The above data can then be plotted with the ggplot2 package, which first needs to be installed into R. Use the code below to visualise the mean data:

# Load the required package
library(ggplot2)
# Basic plot of the mean load over a seven day period
ggplot(SummaryLoadData, aes(x = Day, y = Mean)) +
  geom_bar(stat = "identity")

[Figure: Load_BasicPlot, mean load for each day]
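
The summary table also holds the SD, which the basic plot does not display. One possible way to show it, sketched below, is to add error bars of the mean plus and minus one SD with geom_errorbar(). To keep the block self-contained I rebuild the dummy data and the summary with base R's aggregate() rather than plyr, and the name LoadPlot is my own.

```r
library(ggplot2)

# Rebuild the dummy data and the daily summary so this block runs on its own
set.seed(28)
RawLoadData <- data.frame(Athletes = rep(c("Charles", "Mia", "Alfie", "Sophie"), 7),
                          Day = rep(1:7, each = 4),
                          Load = runif(4 * 7, min = 0, max = 100))
SummaryLoadData <- aggregate(Load ~ Day, data = RawLoadData, FUN = mean)
names(SummaryLoadData)[2] <- "Mean"
SummaryLoadData$SD <- aggregate(Load ~ Day, data = RawLoadData, FUN = sd)$Load

# Bars show the mean; error bars span one SD either side of it
LoadPlot <- ggplot(SummaryLoadData, aes(x = Day, y = Mean)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD), width = 0.2)
LoadPlot
```

Whether SD bars are the right choice depends on the question being asked; with only four athletes per day they mainly give a feel for the spread.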

A few tweaks will deliver a much more visually pleasing plot. To have a white background, coloured bars, clearer axis labels, and bolder axis lines and tick labels, use the following code:

# A few tweaks to create a neater plot
# Note: from ggplot2 3.4.0, line widths are set with linewidth; size still works for lines but gives a deprecation warning
ggplot(SummaryLoadData, aes(x = Day, y = Mean, fill = factor(Day))) +
  geom_bar(stat = "identity") +
  ylab("Average Load (AU)\n") +
  xlab("\nDay") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 80)) +
  scale_x_continuous(breaks = 1:7) +
  theme_classic() +
  theme(legend.position = "none",
        axis.line = element_line(colour = "black", size = .75, linetype = "solid"),
        axis.title.x = element_text(face = "bold", size = 15),
        axis.title.y = element_text(face = "bold", size = 15),
        axis.text.y = element_text(face = "bold"),
        axis.text.x = element_text(face = "bold"),
        axis.ticks = element_line(size = .5))

[Figure: Load_Neater, the same plot of mean load with the styling tweaks applied]

To plot individual load data, use the following code:

# Individual responses
ggplot(RawLoadData, aes(x = Day, y = Load, fill = factor(Athletes))) +
  geom_bar(stat = "identity") +
  ylab("Load (AU)\n") +
  xlab("\nDay") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 100)) +
  scale_x_continuous(breaks = 1:7) +
  theme_classic() +
  theme(strip.text.x = element_text(size = 12, face = "bold"),
        strip.background = element_rect(colour = "black", size = 1.5),
        legend.position = "none",
        axis.line = element_line(colour = "black", size = .75, linetype = "solid"),
        axis.title.x = element_text(face = "bold", size = 12),
        axis.title.y = element_text(face = "bold", size = 12),
        axis.text.y = element_text(face = "bold"),
        axis.text.x = element_text(face = "bold"),
        axis.ticks = element_line(size = .5)) +
  facet_wrap(~ Athletes)

[Figure: Load_IndividResps, daily load for each athlete in separate panels]

To overlay the average load over each individual’s data, use the following code:

# Repeat each row of the summary once per athlete so it lines up with RawLoadData
# (this relies on RawLoadData being ordered by day)
SummaryLoadData <- SummaryLoadData[rep(seq_len(nrow(SummaryLoadData)), each = NumberOfAthletes), ]
# Add the mean column to the RawLoadData frame
RawLoadData$Mean <- SummaryLoadData$Mean
# To plot individual athlete load data with the daily mean overlaid
ggplot(RawLoadData, aes(x = Day, y = Load, fill = factor(Athletes))) +
  geom_bar(stat = "identity") +
  geom_line(aes(y = Mean, x = Day), color = "black", size = 1) +
  geom_point(aes(y = Mean, x = Day), color = "black", size = 1.5) +
  ylab("Load (AU)\n") +
  xlab("\nDay") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 100)) +
  scale_x_continuous(breaks = 1:7) +
  theme_classic() +
  theme(strip.text.x = element_text(size = 12, face = "bold"),
        strip.background = element_rect(colour = "black", size = 1.5),
        legend.position = "none",
        axis.line = element_line(colour = "black", size = .75, linetype = "solid"),
        axis.title.x = element_text(face = "bold", size = 12),
        axis.title.y = element_text(face = "bold", size = 12),
        axis.text.y = element_text(face = "bold"),
        axis.text.x = element_text(face = "bold"),
        axis.ticks = element_line(size = .5)) +
  facet_wrap(~ Athletes)

[Figure: Load_IndividMeans, individual load with the daily mean overlaid as a line]
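
Repeating the summary rows works here because the raw data are ordered by day. If your own data are not sorted, one alternative that does not depend on row order is base R's ave(), which returns the group mean alongside every row. This is a substitute I am sketching, not the approach used above, and the block rebuilds the dummy data so that it runs on its own.

```r
# Rebuild the dummy data so this block runs on its own
set.seed(28)
RawLoadData <- data.frame(Athletes = rep(c("Charles", "Mia", "Alfie", "Sophie"), 7),
                          Day = rep(1:7, each = 4),
                          Load = runif(4 * 7, min = 0, max = 100))

# ave() returns a vector as long as Load, holding the mean of each row's Day group
RawLoadData$Mean <- ave(RawLoadData$Load, RawLoadData$Day, FUN = mean)

# Every row within the same day now carries the same Mean value
head(RawLoadData, 8)
```

The resulting Mean column can be passed to geom_line() and geom_point() in the same way as before.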

The above is only a small introduction to R’s analysis and visualisation capabilities. How do you analyse and present athlete load data?
