This notebook shows how to compute summary statistics in R, using the Height variable in the NHANES dataset as the example:

df <- read.csv('Data/NHANES.csv')

1. Mean

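The mean is computed with mean(). Here and in the sections below, the first line shows the general form, where data stands for any numeric vector, and the second line applies it to df$Height: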

mean(data, na.rm=TRUE)
mean(df$Height, na.rm = TRUE)
## [1] 161.8778

Height contains missing values, so we pass na.rm = TRUE to drop them before the mean is computed.

Without na.rm = TRUE, the result is NA:

mean(df$Height)
## [1] NA
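
The effect of na.rm can be checked on a small vector. The sketch below uses a made-up vector named heights (a hypothetical stand-in for df$Height, not values from NHANES) to count the missing entries and compare the results with and without na.rm = TRUE:

```r
# Hypothetical toy data standing in for df$Height (not real NHANES values)
heights <- c(150.2, NA, 171.5, 165.0, NA)

# Count how many values are missing
n_missing <- sum(is.na(heights))

# Without na.rm, a single NA makes the whole result NA
mean_with_na <- mean(heights)

# With na.rm = TRUE, the mean is taken over the observed values only
mean_without_na <- mean(heights, na.rm = TRUE)

# Equivalent: drop the missing values by hand before calling mean()
mean_manual <- mean(heights[!is.na(heights)])

print(n_missing)
print(mean_with_na)
print(mean_without_na)
print(mean_manual)
```

The same na.rm argument is accepted by median(), var(), sd(), max(), min(), and IQR().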

2. Median

The median is computed with median():

median(data, na.rm=TRUE)
median(df$Height, na.rm=TRUE)
## [1] 166

3. Variance

var(data, na.rm=TRUE)
var(df$Height, na.rm=TRUE)
## [1] 407.4975

4. Standard deviation

sd(data, na.rm=TRUE)
sd(df$Height, na.rm=TRUE)
## [1] 20.18657

This is the same as the square root of the variance:

sqrt(var(df$Height, na.rm=TRUE))
## [1] 20.18657
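
To see what var() and sd() compute, the sample variance can also be written out by hand. This is a minimal sketch on a made-up vector x (hypothetical values, not from NHANES). It relies on the fact that var() uses the n - 1 denominator:

```r
# Hypothetical toy data (not real NHANES values)
x <- c(150.2, 171.5, 165.0, 180.3)

n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

# Both comparisons should report TRUE
print(all.equal(manual_var, var(x)))
print(all.equal(manual_sd, sd(x)))
```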

5. Maximum

max(data, na.rm=TRUE)
max(df$Height, na.rm=TRUE)
## [1] 200.4

6. Minimum

min(data, na.rm=TRUE)
min(df$Height, na.rm=TRUE)
## [1] 83.6
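
If both extremes are needed at once, range() returns the minimum and the maximum in a single call. A short sketch on a made-up vector x (hypothetical values, not from NHANES):

```r
# Hypothetical toy data with one missing value (not real NHANES values)
x <- c(150.2, NA, 171.5, 165.0, 180.3)

# range() returns c(minimum, maximum)
r <- range(x, na.rm = TRUE)
print(r)

# The two elements match min() and max()
print(r[1] == min(x, na.rm = TRUE))
print(r[2] == max(x, na.rm = TRUE))
```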

7. Interquartile range

IQR(data, na.rm=TRUE)
IQR(df$Height, na.rm=TRUE)
## [1] 17.7
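
The interquartile range is the distance between the third and first quartiles, so it can also be obtained from quantile(). The sketch below uses a made-up vector x (hypothetical values, not from NHANES) and assumes the default quantile type, which is the one IQR() uses as well:

```r
# Hypothetical toy data with one missing value (not real NHANES values)
x <- c(150.2, 171.5, NA, 165.0, 180.3, 158.7)

# First (25%) and third (75%) quartiles, ignoring missing values
q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)

# IQR is the third quartile minus the first; unname() drops the "75%" label
manual_iqr <- unname(q[2] - q[1])

print(manual_iqr)
print(IQR(x, na.rm = TRUE))
```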

©2021 by Daiki Tagami. All rights reserved.