R is a free, open-source programming language and software environment for statistical computing and graphics. Learning how to use R for data analysis and visualisation purposes can be a daunting task. However, there are a number of free online resources to guide basic analysis and troubleshoot where possible. These include:

- Cookbook for R: An online guide to provide solutions for common tasks and problems when analysing data in R
- R Tutorial: An introduction to statistics that explains basic concepts in R
- Quick-R: A website that assists with data input, management and statistics in R
- R-bloggers: A news and information site that pulls together blog posts on R
- Stack Overflow: A question and answer forum on all things code, statistics and plotting

The above websites are a fantastic resource for getting started with basic analysis in R. To complement them, I have put together a basic guide for sport science and physiology users, using athlete load as an example. Personally, I prefer to work in RStudio, which provides a free, friendly interface for running code and viewing plots.

To start, run the following code to load a .csv or .txt file into R and store it as an object named `RawLoadData`. You will need to substitute the file location below with your own.

    # Read a .csv file into R
    RawLoadData <- read.csv("/Volumes/Research/Thesis/Manuscripts/AthleteLoadData.csv")

    # Read a .txt file into R
    RawLoadData <- read.table("/Volumes/Research/Thesis/Manuscripts/AthleteLoadData.txt")
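
As a quick way to check that an import behaves as expected, the sketch below writes a small, made-up file to a temporary location and reads it back with both functions. The file contents and the object names (`ExampleFile`, `FromCsv`, `FromTable`) are invented for illustration. Note that `read.csv` treats the first row as column names by default, whereas `read.table` only does so when `header = TRUE` is supplied.

```r
# Write a small, made-up data set to a temporary .csv file (illustration only)
ExampleFile <- tempfile(fileext = ".csv")
write.csv(data.frame(Athletes = c("Charles", "Mia"), Load = c(55.2, 61.8)),
          ExampleFile, row.names = FALSE)

# read.csv assumes a header row and comma-separated values
FromCsv <- read.csv(ExampleFile)

# read.table needs to be told about the header row and the separator
FromTable <- read.table(ExampleFile, header = TRUE, sep = ",")

# Both approaches should give the same column names and dimensions
names(FromCsv)
dim(FromTable)
```

If your own .txt file is tab-separated, `sep = "\t"` would be the equivalent setting.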

R uses `data.frames` and matrices to store data. The difference between the two is that a matrix requires every value to be of the same type (all numeric or all character, for example), whereas a `data.frame` allows each column to have its own class. You can switch between the two using `as.data.frame` and `as.matrix`, although be aware that if you convert a `data.frame` with columns of different classes, every value will become a character string in the matrix. To convert an object to a `data.frame` or a matrix, use the following code:

    # Create a data.frame
    RawLoadData <- as.data.frame(RawLoadData)

    # Create a matrix
    RawLoadData <- as.matrix(RawLoadData)
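
To make the coercion behaviour concrete, here is a minimal sketch using a tiny, made-up `data.frame` (the object names `Mixed` and `MixedMatrix` and the values are invented for illustration). It shows that the numeric column turns into text once the data is converted to a matrix.

```r
# A small data.frame with one text column and one numeric column
Mixed <- data.frame(Athletes = c("Charles", "Mia"), Load = c(55.2, 61.8))

# Each column keeps its own class in a data.frame
sapply(Mixed, class)

# Converting to a matrix forces every value to a single type (character here)
MixedMatrix <- as.matrix(Mixed)
class(MixedMatrix[, "Load"])

# The numbers are now text, so convert them back before doing arithmetic
mean(as.numeric(MixedMatrix[, "Load"]))
```

For this reason it is usually safer to keep mixed data in a `data.frame` and only convert purely numeric columns to a matrix.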

To create a `data.frame` of dummy athlete load data collected over seven days, use the code below.

    # Create a vector of athlete names
    Athletes <- c("Charles", "Mia", "Alfie", "Sophie")

    # Define the constants
    NumberOfAthletes <- 4
    DaysOfLoad <- 7
    set.seed(28)

    # Create the data.frame
    RawLoadData <- data.frame(Athletes = rep(Athletes, DaysOfLoad),
                              Day = rep(1:DaysOfLoad, each = NumberOfAthletes),
                              Load = runif(NumberOfAthletes * DaysOfLoad, min = 0, max = 100))
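
The two calls to `rep` in the code above behave differently, which is what lines each athlete up with each day. The sketch below shows the pattern on a deliberately tiny example; the labels "A" and "B" and the object names are made up for illustration.

```r
# Two made-up athlete labels and two days, kept small so the pattern is easy to see
ExampleAthletes <- c("A", "B")
ExampleDays <- 1:2

# Without `each`, the whole vector is repeated end to end
RepeatedAthletes <- rep(ExampleAthletes, 2)
RepeatedAthletes
# "A" "B" "A" "B"

# With `each`, every element is repeated before moving on to the next
RepeatedDays <- rep(ExampleDays, each = 2)
RepeatedDays
# 1 1 2 2

# Side by side, each day is paired once with each athlete
data.frame(Athletes = RepeatedAthletes, Day = RepeatedDays)
```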

The structure of a `data.frame` can be inspected by running the code below. The output shows that the dataset consists of athlete names (stored as a factor, or as character strings in R 4.0.0 and later), days (integers) and load (a numeric variable).

    str(RawLoadData)

Columns of a `data.frame` can be viewed by running `RawLoadData$Athletes`; however, this command will not work for a matrix. To add a new column to our existing `data.frame`, such as the playing position of each athlete, run the following code:

    # To create a new column
    RawLoadData$PlayingPosition <- c("Midcourt")
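
The code above gives every athlete the same position. If athletes play in different positions, one possible approach is a named lookup vector, sketched below. The positions assigned to each athlete are hypothetical and invented purely for illustration, as is the object name `Positions`.

```r
# Recreate the dummy athlete load data from earlier
Athletes <- c("Charles", "Mia", "Alfie", "Sophie")
DaysOfLoad <- 7
set.seed(28)
RawLoadData <- data.frame(Athletes = rep(Athletes, DaysOfLoad),
                          Day = rep(1:DaysOfLoad, each = length(Athletes)),
                          Load = runif(length(Athletes) * DaysOfLoad, min = 0, max = 100))

# Hypothetical positions, invented for illustration only
Positions <- c(Charles = "Midcourt", Mia = "Shooter",
               Alfie = "Defender", Sophie = "Midcourt")

# Look up the position for each row by athlete name
RawLoadData$PlayingPosition <- unname(Positions[as.character(RawLoadData$Athletes)])

# Count the rows recorded for each position
table(RawLoadData$PlayingPosition)
```

The `as.character` call is there so the lookup is by name even if the athlete column is stored as a factor.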

Summary statistics of grouped data can easily be calculated with the `plyr` package, which needs to be installed prior to first use (for example, with `install.packages("plyr")`). To calculate the mean and SD of load for each day, purely as an example, use the following:

    # Load the required package
    library(plyr)

    # To calculate the mean and SD for load, across each day
    SummaryLoadData <- ddply(RawLoadData, c("Day"), summarise,
                             Mean = mean(Load),
                             SD = sd(Load))
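
If you would rather not install an extra package, a similar summary can be sketched in base R with `aggregate`. This is an alternative to the `plyr` approach rather than a replacement for it, and the object names `MeanLoad`, `SDLoad` and `SummaryLoadBase` are made up for illustration.

```r
# Recreate the dummy athlete load data from earlier
Athletes <- c("Charles", "Mia", "Alfie", "Sophie")
DaysOfLoad <- 7
set.seed(28)
RawLoadData <- data.frame(Athletes = rep(Athletes, DaysOfLoad),
                          Day = rep(1:DaysOfLoad, each = length(Athletes)),
                          Load = runif(length(Athletes) * DaysOfLoad, min = 0, max = 100))

# Mean and SD of load for each day using base R
MeanLoad <- aggregate(Load ~ Day, data = RawLoadData, FUN = mean)
SDLoad <- aggregate(Load ~ Day, data = RawLoadData, FUN = sd)

# Combine into one summary data.frame with Day, Mean and SD columns
SummaryLoadBase <- data.frame(Day = MeanLoad$Day,
                              Mean = MeanLoad$Load,
                              SD = SDLoad$Load)
SummaryLoadBase
```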

The above data can then be plotted with the `ggplot2` package, which also needs to be installed before first use. Use the code below to visualise the mean data:

    # Load the required package
    library(ggplot2)

    # Basic bar plot of the mean load over a seven day period
    ggplot(SummaryLoadData, aes(x = Day, y = Mean)) +
      geom_bar(stat = "identity")

A few tweaks will give us a much more visually pleasing plot. To have a white background, coloured bars, clearer axis labels and bold tick labels, use the following code:

    # A few tweaks to create a neater plot
    ggplot(SummaryLoadData, aes(x = Day, y = Mean, fill = factor(Day))) +
      geom_bar(stat = "identity") +
      ylab("Average Load (AU)\n") +
      xlab("\nDay") +
      scale_y_continuous(expand = c(0, 0), limits = c(0, 80)) +
      scale_x_continuous(breaks = c(1:7)) +
      theme_classic() +
      theme(legend.position = "none",
            axis.line = element_line(colour = "black", size = .75, linetype = "solid"),
            axis.title.x = element_text(face = "bold", size = 15),
            axis.title.y = element_text(face = "bold", size = 15),
            axis.text.y = element_text(face = "bold"),
            axis.text.x = element_text(face = "bold"),
            axis.ticks = element_line(size = .5))

To plot individual load data, use the following code:

    # Individual responses
    ggplot(RawLoadData, aes(x = Day, y = Load, fill = factor(Athletes))) +
      geom_bar(stat = "identity") +
      ylab("Load (AU)\n") +
      xlab("\nDay") +
      scale_y_continuous(expand = c(0, 0), limits = c(0, 100)) +
      scale_x_continuous(breaks = c(1:7)) +
      theme_classic() +
      theme(strip.text.x = element_text(size = 12, face = "bold"),
            strip.background = element_rect(colour = "black", size = 1.5),
            legend.position = "none",
            axis.line = element_line(colour = "black", size = .75, linetype = "solid"),
            axis.title.x = element_text(face = "bold", size = 12),
            axis.title.y = element_text(face = "bold", size = 12),
            axis.text.y = element_text(face = "bold"),
            axis.text.x = element_text(face = "bold"),
            axis.ticks = element_line(size = .5)) +
      facet_wrap(~ Athletes)

To overlay the average load over each individual’s data, use the following code:

    # Extend the Summary Load data.frame
    SummaryLoadData <- SummaryLoadData[rep(seq_len(nrow(SummaryLoadData)), each = 4), ]

    # Add the mean column to the RawLoadData frame
    RawLoadData$Mean <- SummaryLoadData$Mean

    # To plot individual athlete load data
    ggplot(RawLoadData, aes(x = Day, y = Load, fill = factor(Athletes))) +
      geom_bar(stat = "identity") +
      geom_line(aes(y = Mean, x = Day), color = "Black", size = 1) +
      geom_point(aes(y = Mean, x = Day), color = "Black", size = 1.5) +
      ylab("Load (AU)\n") +
      xlab("\nDay") +
      scale_y_continuous(expand = c(0, 0), limits = c(0, 100)) +
      scale_x_continuous(breaks = c(1:7)) +
      theme_classic() +
      theme(strip.text.x = element_text(size = 12, face = "bold"),
            strip.background = element_rect(colour = "black", size = 1.5),
            legend.position = "none",
            axis.line = element_line(colour = "black", size = .75, linetype = "solid"),
            axis.title.x = element_text(face = "bold", size = 12),
            axis.title.y = element_text(face = "bold", size = 12),
            axis.text.y = element_text(face = "bold"),
            axis.text.x = element_text(face = "bold"),
            axis.ticks = element_line(size = .5)) +
      facet_wrap(~ Athletes)
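
Repeating the rows of the summary works here because the raw data are sorted by day with exactly four athletes per day. As a sketch of an alternative that does not rely on row order, the daily mean can be joined back onto the raw data by matching on `Day` with `merge`. The object names `DailyMean` and `LoadWithMean` are made up for illustration, and base R's `aggregate` stands in for the `plyr` summary so the example runs on its own.

```r
# Recreate the dummy athlete load data from earlier
Athletes <- c("Charles", "Mia", "Alfie", "Sophie")
DaysOfLoad <- 7
set.seed(28)
RawLoadData <- data.frame(Athletes = rep(Athletes, DaysOfLoad),
                          Day = rep(1:DaysOfLoad, each = length(Athletes)),
                          Load = runif(length(Athletes) * DaysOfLoad, min = 0, max = 100))

# Mean load for each day, with the column renamed to Mean
DailyMean <- aggregate(Load ~ Day, data = RawLoadData, FUN = mean)
names(DailyMean)[names(DailyMean) == "Load"] <- "Mean"

# Join the daily mean onto every matching row of the raw data
LoadWithMean <- merge(RawLoadData, DailyMean, by = "Day")
head(LoadWithMean)
```

The resulting `LoadWithMean` could then be passed to `ggplot` in place of `RawLoadData` in the plotting code above.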

The above is only a small introduction to R’s analysis and visualisation capabilities. How do you analyse and present athlete load data?