Validity by Linear Regression in R

I am currently analysing the data for one of my PhD studies, specifically, assessing the validity of an athlete tracking system against the current gold standard. Naturally, I wanted to run the analysis in R so set about compiling code to do so.

Below is an example of a simple linear relationship between a practical measure, such as a global positioning system (GPS) and a criterion measure or the VICON motion analysis system. The dependent variable of interest is peak velocity attained during an agility circuit. I am interested in the typical error of the estimate (TEE) and have included code for log transforming data, to obtain an estimation of the errors as percents of the mean. The code is designed to complement and should replicate the spreadsheet for calculating validity by Professor Will Hopkins. For further information on the TEE obtained by these spreadsheets, click here to view Will’s informative website.

To start with, construct a list of 45 random athlete names:

Athletes = c("Gus", "Hudson", "Bobby", "Tom", "Jessie", "Brent", "Sue", "Cam", "Steve", "Andrew", "Tony", "Jessica", 
"Heather", "Ritchie", "Joe", "Blair", "Elliott", "Sean", "Jarrad", "Dave", "Michele", "Kristine", "Lee", 
"Sharyn", "Ollie", "Marni", "Aaron", "Thomas", "Alan", "Wayan", "Valeryia", "Daria", "Fay", "Patricia", 
"Nancy", "Arthur", "John", "Isobel", "Molly", "Amanda", "Wendy", "Helen", "Bart", "Peter", "Stu")

Now, create a data.frame of dummy peak velocity data, from our practical and criterion measures, for each athlete. It is important you run the set.seed function to ensure replication of data.

# Set the seed
# Create a data.frame of dummy peak velocity data from two different tracking systems
ValidityDataset <- data.frame(Name = rep((Athletes), each = 1),
Criterion = runif(45, 4.5, 6.7),
Practical = runif(45, 4.1, 8))

Run the code below to obtain basic summary statistics for each measure of peak velocity.

# Return basic summary statistics 
Criterion_Mean <- mean(ValidityDataset$Criterion)
Criterion_SD <- sd(ValidityDataset$Criterion)
Practical_Mean <- mean(ValidityDataset$Practical)
Practical_SD <- sd(ValidityDataset$Practical)
SampleSize <- length(ValidityDataset$Name)

You will see that the mean of peak velocity derived from GPS is 6.1 ± 1.1 m·s¯1 whilst the Vicon is 5.7 ± 0.6 m·s¯1 whilst our n = 45. To view this data as a figure in our favourite plotting package, ggplot2 we first have to “wrangle” the data into long format or repeated measures format. This is quite straightforward and can be done by simply running the code below. The package reshape2 gives the flexibility to quickly turn data from wide to long format. You may need to install it into R before running the first line.

# Load the required package
ValidityDatasetLong <- melt(ValidityDataset, id.vars="Name","PeakVelocity","Measure")

Now our data is in long format, we can easily calculate means and SD’s. There is the option to calculate other statistics, including the SEM however I advise against using that one!

# Load the required package
ValiditySummaryData <- ddply(ValidityDatasetLong, c("Measure"), summarise,
Mean = mean(PeakVelocity),
SD = sd(PeakVelocity))

The line of code below will create a lovely barplot of the peak velocity attained from our two tracking systems.

ggplot(ValiditySummaryData, aes(x=Measure, y=Mean, fill=Measure)) +
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD), width=.2,
position=position_dodge(.9)) +
coord_cartesian(ylim=c(0,8)) +
xlab("\n Tracking System") +
ylab(expression(Peak ~ Velocity ~ (m.s^-1))) +
theme_classic() +
axis.title.y = element_text(vjust=1.5))


If you prefer to present only the upper limits of the SD bars, use the code below:

ggplot(ValiditySummaryData, aes(x=Measure, y=Mean, fill=Measure)) +
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=Mean, ymax=Mean+SD), width=.2,
position=position_dodge(.9)) +
coord_cartesian(ylim=c(0,8)) +
xlab("\n Tracking System") +
ylab(expression(Peak ~ Velocity ~ (m.s^-1))) +
theme_classic() +
axis.title.y = element_text(vjust=1.5))


Now, we create a basic linear regression model of the criterion and practical values.

# Linear model of Criterion as a function of the Practical measure
PeakVelocityLM <- lm(Criterion ~ Practical, data = ValidityDataset)
# Summary of the linear regression model
# Return upper and lower confidence limits at the 90% level
confint(PeakVelocityLM, level = 0.90)
# Extract the Intercept
Intercept_PeakVelocityLM <- coef(PeakVelocityLM)["(Intercept)"]
# Extract the Slope
Slope_PeakVelocityLM <- coef(PeakVelocityLM)["Practical"]

By running summary(PeakVelocityLM) we can see that the intercept is 4.75 whilst the slope is 0.15. If you wish to present the 95% CI, simply change the level to = 0.95.

To obtain a plot of the regression model, run the following code:

ggplot(ValidityDataset, aes(x = Criterion, y = Practical)) +
geom_point() +
stat_smooth(method = "lm", col = "red", se = FALSE) +
xlab("\n Criterion") +
ylab("Practical \n") +


I found a cool piece of code from another R blog that plotted the above figure with the model summary data as a title. I made a few tweaks, to enable a clear background and no SE! You can obtain this figure by running the following code:

# Function to extract values out of linear regression model
ggplotRegression <- function (fit) {
ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
geom_point() +
stat_smooth(method = "lm", col = "red", se = FALSE) +
xlab("\n Criterion") +
ylab("Practical \n") +
theme_classic() +
labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
"Intercept =",signif(fit$coef[[1]],5 ),
" Slope =",signif(fit$coef[[2]], 5),
" P =",signif(summary(fit)$coef[2,4], 5)))}
# Run the following line to obtain the figure


Now we move on to calculating the predicted and residual values, plus the difference in raw units between the practical and criterion measures.

# Calculate the predicted value
ValidityDataset$Predicted <- (Intercept_PeakVelocityLM + Slope_PeakVelocityLM *
# Calculate the residual value
ValidityDataset$Residual <- ValidityDataset$Criterion - ValidityDataset$Predicted
# Calculate the difference b/w Practical and Criterion
ValidityDataset$PracMinusCrit <- ValidityDataset$Practical - ValidityDataset$Criterion

To obtain a neat plot of the residuals versus the predicted values, run the following line of code:

# Plot the residuals versus predicted
ggplot(ValidityDataset, aes(x=Predicted, y=Residual)) +
geom_hline(yintercept=0, col="red", linetype="solid") +
coord_cartesian(ylim=c(-1.5,1.5), xlim=c(5.25,6)) +
xlab("\n Predicted") +
ylab("Residual \n") +


Now, for the fun stuff! The code below calculates the mean bias in raw and standardised units, plus the SD of these.

# Calculate the mean bias in raw units
MeanBias_RawUnits <- mean(ValidityDataset$PracMinusCrit)
# Calculate the mean bias, standardised
MeanBias_Standardised <- MeanBias_RawUnits/Criterion_SD
# Calculate the standard deviation of the Practical minus Criterion
SD_PracMinusCrit <- sd(ValidityDataset$PracMinusCrit)
# Calculate the SD of bias, standardised
SDBias_Standardised <- SD_PracMinusCrit/Criterion_SD

The smallest important difference in raw units is also calculated.

# Call out the smallest important difference
SmallestImportantDiff <- 0.2
# In standardised units
RawUnits_SmallestImportantDiff <- SmallestImportantDiff*Criterion_SD

The smallest important difference in raw units is 0.12. This can be interpreted using Hopkins’s modified Cohen scale of:

  • < 0.2 = Trivial
  • 0.2 to 0.6 = Small
  • 0.6 to 1.2 = Moderate
  • 1.2 to 2.0 = Large
  • > 2.0 = Very large

Below is the code for calculating the typical error of the estimate (TEE) in raw units and standardised. The standardisation is the typical error divided by the SD of the criterion.

# Extract the typical error of the estimate - raw units
TEE_Raw <- summary(PeakVelocityLM)$sigma
# Standardised
TEE_Standardised <- TEE_Raw/Criterion_SD
# Calculate the pearson correlation
PearsonCorrelation <- cor(ValidityDataset$Criterion, ValidityDataset$Practical, method="pearson")
# Calculate Bland-Altman 95% LOA
BlandAltman <- SD_PracMinusCrit*1.96

The TEE in raw units is 0.60 whilst the standardised value is 0.97. The Pearson correlation is calculated at 0.28.

The Bland-Altman ± 95% limits of agreement (LOA) are also given, at 2.20. A proposed practical application of this is to subtract the bias from the practical. If the resulting value is outside the ± limits then you consider that there is a significant difference. However, this interpretation does not take into account the smallest important difference.

Now to log transform values, so we can obtain means as a percentage. The following code allows for the back-transformed mean and the SD as a coefficient of variation (CV %).

# Log transform the values
ValidityDataset$Criterion_LogTransform <- log(ValidityDataset$Criterion)*100
ValidityDataset$Practical_LogTransform <- log(ValidityDataset$Practical)*100
# Back transform the mean - Criterion
Criterion_Mean_BackTransform <- exp(mean(ValidityDataset$Criterion_LogTransform)/100)
# SD as a factor - Criterion
Criterion_SD_Factor <- exp(sd(ValidityDataset$Criterion_LogTransform)/100)
# SD as a CV (%) - Criterion
Criterion_SD_CV <- (Criterion_SD_Factor - 1)*100
# Back transform the mean - Practical
Practical_Mean_BackTransform <- exp(mean(ValidityDataset$Practical_LogTransform)/100)
# SD as a factor - Practical
Practical_SD_Factor <- exp(sd(ValidityDataset$Practical_LogTransform)/100)
# SD as a CV (%) - Practical
Practical_SD_CV <- (Practical_SD_Factor - 1)*100

The SD as a CV is 11.5%.

We will need to run another linear regression, using the log-transformed values.

# Run a linear model of the log transformed data
LogModel_Validity <- lm(Criterion_LogTransform ~ Practical_LogTransform, data = ValidityDataset)
# Extract the coefficients
Intercept_CriterionLogTransform <- LogModel_Validity$coef[1]
Intercept_CriterionLogTransform <- exp(Intercept_CriterionLogTransform/100)
Slope_PracticalLogTransform <- LogModel_Validity$coef[2]
# Extract the upper and lower CL's @ 90%
CI_CriterionLogTransform <- confint(LogModel_Validity, "(Intercept)", level = .90)
CI_CriterionLogTransform <- exp(CI_CriterionLogTransform/100)
CI_PracticalLogTransform <- confint(LogModel_Validity, "Practical_LogTransform", level = .90)
# Calculate predicted values
ValidityDataset$Predicted_LogTransform <- 100*log(Intercept_CriterionLogTransform)+Slope_PracticalLogTransform*ValidityDataset$Practical_LogTransform
# Calculate the residuals
ValidityDataset$Residual_LogTransform <- ValidityDataset$Criterion_LogTransform-ValidityDataset$Predicted_LogTransform
ValidityDataset$PracMinusCrit_LogTransform <- ValidityDataset$Practical_LogTransform-ValidityDataset$Criterion_LogTransform
# Calculate the mean bias in raw units
MeanBias_RawUnits_LogTransform <- 100*(exp(mean(ValidityDataset$PracMinusCrit_LogTransform)/100)-1)
# Mean bias as a %
MeanBias_Percentage_LogTransform <- 100*exp(MeanBias_RawUnits_LogTransform/100)-100
# Mean bias as a factor
MeanBias_Factor_LogTransform <- MeanBias_Percentage_LogTransform/100 + 1
# Mean bias as standardized
MeanBias_Standardized_LogTransform <- log(1+MeanBias_RawUnits_LogTransform/100)/log(1+Criterion_SD_CV/100)

The output of the overall mean bias is 6.5%. Now, extract the TEE as a CV (%) and the resulting Pearson correlation.

# Extract the typical error of the estimate - raw units
TEE_CV <- summary(LogModel_Validity)$sigma
TEE_CVPercentage <- 100*(exp(TEE_CV/100)-1)
TEE_Factor <- exp(TEE_CV/100)
TEE_Standardised <- log(TEE_CVPercentage/100+1)/log(Criterion_SD_CV/100+1)
# Calculate the pearson correlation
PearsonCorrelation_LogTransform <- cor(ValidityDataset$Criterion_LogTransform,

The TEE as a CV is 11.1% and when standardised is 0.97, which can be interpreted as large on the Hopkins modified Cohen scale. A Pearson correlation of 0.30 is also achieved.

After all our coding, plotting and analysis, would you use the GPS system to measure peak velocity?