This notebook shows how to compute summary statistics in R. We use the Height variable from the NHANES dataset as the running example:
df <- read.csv('Data/NHANES.csv')
The mean can be computed with:
mean(data, na.rm=TRUE)
mean(df$Height, na.rm = TRUE)
## [1] 161.8778
mean() returns NA when the data contain missing values, so we pass na.rm = TRUE to drop them before the mean is computed.
If we don't remove the missing values:
mean(df$Height)
## [1] NA
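The same behavior can be reproduced on a small vector, which also shows one way to count the missing values before dropping them. This is a minimal sketch: the vector `heights` is a made-up example, not taken from NHANES.

```r
# Hypothetical toy vector with one missing value (not NHANES data)
heights <- c(150, 160, NA, 170)

# Count how many values are missing
n_missing <- sum(is.na(heights))

# Arithmetic involving NA yields NA, so the mean is NA
mean_with_na <- mean(heights)

# na.rm = TRUE drops the NA values before the mean is computed
mean_without_na <- mean(heights, na.rm = TRUE)

print(n_missing)
print(mean_with_na)
print(mean_without_na)
```

Counting the missing values first is a useful habit, since na.rm = TRUE silently reduces the number of observations the statistic is based on.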
The median can be computed with:
median(data, na.rm=TRUE)
median(df$Height, na.rm=TRUE)
## [1] 166
The variance can be computed with:
var(data, na.rm=TRUE)
var(df$Height, na.rm=TRUE)
## [1] 407.4975
The standard deviation can be computed with:
sd(data, na.rm=TRUE)
sd(df$Height, na.rm=TRUE)
## [1] 20.18657
This is the same as the square root of the variance:
sqrt(var(df$Height, na.rm=TRUE))
## [1] 20.18657
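To see what var() and sd() compute, the sample variance can be worked out by hand. R's var() divides the sum of squared deviations by n - 1 rather than n. The sketch below checks this on a made-up vector `x` (not NHANES data).

```r
# Hypothetical toy vector (not NHANES data)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

# Compare against the built-in functions; both should print TRUE
print(all.equal(manual_var, var(x)))
print(all.equal(manual_sd, sd(x)))
```

all.equal() is used instead of == because floating-point results can differ in the last few digits.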
The maximum can be found with:
max(data, na.rm=TRUE)
max(df$Height, na.rm=TRUE)
## [1] 200.4
The minimum can be found with:
min(data, na.rm=TRUE)
min(df$Height, na.rm=TRUE)
## [1] 83.6
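If both extremes are needed at once, base R's range() returns the minimum and maximum together. The sketch below uses a made-up vector `heights` (not NHANES data) to show that it agrees with min() and max().

```r
# Hypothetical toy vector with one missing value (not NHANES data)
heights <- c(150, 160, NA, 170)

# range() returns c(minimum, maximum); na.rm = TRUE drops the NA first
r <- range(heights, na.rm = TRUE)

# diff() of the two values gives the width of the range
width <- diff(r)

print(r)
print(width)
```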
The interquartile range (IQR) can be computed with:
IQR(data, na.rm=TRUE)
IQR(df$Height, na.rm=TRUE)
## [1] 17.7
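The IQR is the difference between the 75th and 25th percentiles, so it can also be obtained from quantile(). A minimal sketch on a made-up vector `x` (not NHANES data), using the default quantile method for both calls:

```r
# Hypothetical toy vector (not NHANES data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# 25th and 75th percentiles
q <- quantile(x, probs = c(0.25, 0.75))

# IQR is the 75th percentile minus the 25th; unname() drops the "75%" label
manual_iqr <- unname(q[2] - q[1])

print(manual_iqr)
print(IQR(x))
```

With real data that contains missing values, quantile() needs na.rm = TRUE as well, otherwise it stops with an error.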
©2021 by Daiki Tagami. All rights reserved.