In this notebook, we load a dataset into an R dataframe. Datasets are often stored in CSV files, which can be loaded into R with the following code:
read.csv("Path where your CSV file is located on your computer\\File Name.csv")
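As a self-contained illustration of the call above, the sketch below writes a small made-up table to a temporary CSV file and reads it back. The names `toy`, `csv_path`, and `toy_df` are hypothetical and not part of the NHANES data. Since R 4.0.0, `read.csv()` reads text columns as character vectors by default, so `stringsAsFactors = TRUE` is passed here to get factor columns.

```r
# Hypothetical toy data, written to a temporary file so the example runs anywhere
toy <- data.frame(
  ID     = c(1L, 2L, 3L),
  Gender = c("male", "female", "male"),
  Age    = c(34L, 49L, 9L)
)

# file.path() joins folder and file name with a separator that works on every OS
csv_path <- file.path(tempdir(), "toy.csv")
write.csv(toy, csv_path, row.names = FALSE)

# stringsAsFactors = TRUE reads text columns as factors;
# the default since R 4.0.0 is FALSE, which gives character columns instead
toy_df <- read.csv(csv_path, stringsAsFactors = TRUE)

# Show the column types of the loaded dataframe
str(toy_df)
```

Building the path with `file.path()` avoids hand-writing `\\` or `/` separators.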
We will load the NHANES.csv file, which is located in the Data folder:
df <- read.csv('Data/NHANES.csv')
df
| ID <int> | SurveyYr <fct> | Gender <fct> | Age <int> | AgeDecade <fct> | AgeMonths <int> | Race1 <fct> | Race3 <fct> | Education <fct> |
|---|---|---|---|---|---|---|---|---|
| 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 51625 | 2009_10 | male | 4 | 0-9 | 49 | Other | | |
| 51630 | 2009_10 | female | 49 | 40-49 | 596 | White | | Some College |
| 51638 | 2009_10 | male | 9 | 0-9 | 115 | White | | |
| 51646 | 2009_10 | male | 8 | 0-9 | 101 | White | | |
| 51647 | 2009_10 | female | 45 | 40-49 | 541 | White | | College Grad |
| 51647 | 2009_10 | female | 45 | 40-49 | 541 | White | | College Grad |
| 51647 | 2009_10 | female | 45 | 40-49 | 541 | White | | College Grad |
The NHANES dataframe comes from the R package NHANES: https://cran.r-project.org/web/packages/NHANES/NHANES.pdf
Since the dataframe is long, we often use head() to display only the first 6 rows:
head(df)
| | ID <int> | SurveyYr <fct> | Gender <fct> | Age <int> | AgeDecade <fct> | AgeMonths <int> | Race1 <fct> | Race3 <fct> | Education <fct> |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 2 | 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 3 | 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 4 | 51625 | 2009_10 | male | 4 | 0-9 | 49 | Other | | |
| 5 | 51630 | 2009_10 | female | 49 | 40-49 | 596 | White | | Some College |
| 6 | 51638 | 2009_10 | male | 9 | 0-9 | 115 | White | | |
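The number of rows shown by head() can be changed with its `n` argument, and tail() and dim() are useful companions. The sketch below uses a small made-up dataframe (`toy_df` is hypothetical, not the NHANES data) so that it runs on its own.

```r
# Hypothetical toy dataframe with 10 rows
toy_df <- data.frame(ID = 1:10, Age = seq(10, 100, by = 10))

# First 3 rows instead of the default 6
first_three <- head(toy_df, n = 3)

# Last 2 rows
last_two <- tail(toy_df, n = 2)

# Number of rows and columns
dim(toy_df)

print(first_three)
print(last_two)
```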
We can use:
dataframe_name$column_name
to access a column of the dataframe. For example:
head(df$Age)
## [1] 34 34 34 4 49 9
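The `$` operator returns the column as an ordinary vector, so it can be passed to any function that takes a vector. When the column name is stored in a variable, `[[ ]]` does the same job. A minimal sketch on made-up data (`toy_df` and `col_name` are hypothetical names):

```r
# Hypothetical toy dataframe
toy_df <- data.frame(ID = c(1L, 2L, 3L), Age = c(34L, 4L, 49L))

# Access by name with $
ages_dollar <- toy_df$Age

# Access with [[ ]] when the column name is held in a variable
col_name <- "Age"
ages_bracket <- toy_df[[col_name]]

# Both return the same integer vector
identical(ages_dollar, ages_bracket)

# The vector can be used directly in a computation
mean(toy_df$Age)
```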
We can also copy a column under a new name (the original column is kept) by running:
dataframe_name$new_column_name <- dataframe_name$column_name
df$number <- df$ID
head(df$number)
## [1] 51624 51624 51624 51625 51630 51638
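Because the assignment above leaves the original column in place, a true rename needs a different step. One common base R option is to assign to names(). The sketch below contrasts the two on made-up data (`toy_df`, `copied`, and `renamed` are hypothetical names):

```r
# Hypothetical toy dataframe
toy_df <- data.frame(ID = c(51624L, 51625L), Age = c(34L, 4L))

# Copying: adds a third column, ID is still present
copied <- toy_df
copied$number <- copied$ID
names(copied)

# Renaming: replace the name of the ID column in place
renamed <- toy_df
names(renamed)[names(renamed) == "ID"] <- "number"
names(renamed)
```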
A frequency table for a column can be obtained with the following code:
table(dataframe_name$column_name)
For example:
table(df$Gender)
##
## female male
## 5020 4980
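By default table() silently drops missing values. As a sketch on made-up data (`toy_df` and its `Adult` column are hypothetical), the `useNA` argument includes them in the counts, prop.table() turns counts into proportions, and passing two columns gives a cross-tabulation.

```r
# Hypothetical toy dataframe with one missing Gender value
toy_df <- data.frame(
  Gender = c("male", "female", "female", NA, "male", "female"),
  Adult  = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Counts including missing values
counts <- table(toy_df$Gender, useNA = "ifany")
print(counts)

# Proportions among the non-missing values
props <- prop.table(table(toy_df$Gender))
print(props)

# Two-way table: Gender in rows, Adult in columns
cross <- table(toy_df$Gender, toy_df$Adult)
print(cross)
```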
Sometimes we want a subset of the dataframe based on the values in a column. For example, instead of analyzing everyone in the NHANES dataset, we might want to analyze only people who are 18 years of age or older. In these cases, we can select a subset of the dataframe by using the following code:
df <- df[conditional_statement,]
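where df is the name of the dataframe and conditional_statement is a logical condition on its rows. Note that this overwrites df with the subset. For example: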
# Get people who are 18 years of age or older
df <- df[df$Age >= 18,]
head(df)
| | ID <int> | SurveyYr <fct> | Gender <fct> | Age <int> | AgeDecade <fct> | AgeMonths <int> | Race1 <fct> | Race3 <fct> | Education <fct> |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 2 | 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 3 | 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 5 | 51630 | 2009_10 | female | 49 | 40-49 | 596 | White | | Some College |
| 8 | 51647 | 2009_10 | female | 45 | 40-49 | 541 | White | | College Grad |
| 9 | 51647 | 2009_10 | female | 45 | 40-49 | 541 | White | | College Grad |
# Keep only males (df was already restricted by age above)
df <- df[df$Gender == 'male',]
head(df)
| | ID <int> | SurveyYr <fct> | Gender <fct> | Age <int> | AgeDecade <fct> | AgeMonths <int> | Race1 <fct> | Race3 <fct> | Education <fct> |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 2 | 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 3 | 51624 | 2009_10 | male | 34 | 30-39 | 409 | White | | High School |
| 11 | 51654 | 2009_10 | male | 66 | 60-69 | 795 | White | | Some College |
| 12 | 51656 | 2009_10 | male | 58 | 50-59 | 707 | White | | College Grad |
| 13 | 51657 | 2009_10 | male | 54 | 50-59 | 654 | White | | 9 - 11th Grade |
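The two filters above can also be combined into one condition with `&`, and the result stored under a new name so that the original dataframe is not overwritten. The sketch below uses made-up data (`toy_df`, `adult_men`, `adult_men_subset`, and `with_na` are hypothetical names) and also shows a pitfall: if the condition evaluates to NA for some row, plain logical indexing returns an all-NA row, which which() or subset() avoids.

```r
# Hypothetical toy dataframe with one missing Age
toy_df <- data.frame(
  ID     = 1:6,
  Gender = c("male", "female", "male", "male", "female", "male"),
  Age    = c(34L, 49L, 9L, NA, 17L, 66L)
)

# Combine both conditions; which() keeps only rows where the condition is TRUE
adult_men <- toy_df[which(toy_df$Age >= 18 & toy_df$Gender == "male"), ]

# subset() is an equivalent shorthand that also drops rows where the condition is NA
adult_men_subset <- subset(toy_df, Age >= 18 & Gender == "male")

# Plain logical indexing keeps an all-NA row for the participant with missing Age
with_na <- toy_df[toy_df$Age >= 18 & toy_df$Gender == "male", ]

print(adult_men)
print(with_na)
```

Storing the result under a new name keeps `toy_df` intact, so other subsets can still be taken from the full data later.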
©2021 by Daiki Tagami. All rights reserved.