In this notebook, we will load a dataset into R as a dataframe. Datasets are often stored in CSV files, which can be loaded into R with the following code:

read.csv("Path where your CSV file is located on your computer\\File Name.csv")
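As a minimal, self-contained sketch of this pattern, the code below writes a tiny made-up CSV file to a temporary folder and reads it back. The file name example.csv and its contents are illustrative assumptions, not part of the NHANES data. The path is built with file.path(), which avoids typing separators such as \\ by hand.

```r
# Build a small, hypothetical data frame (illustrative values only)
toy <- data.frame(ID = c(1L, 2L, 3L),
                  Gender = c("male", "female", "male"))

# file.path() joins path components with "/", which R accepts on all platforms
csv_path <- file.path(tempdir(), "example.csv")

# Write the toy data to disk, then read it back like any other CSV file
write.csv(toy, csv_path, row.names = FALSE)
toy_loaded <- read.csv(csv_path)

# Inspect the column names and types of the loaded data
str(toy_loaded)
```

In R 4.0.0 and later, read.csv() reads text columns as character vectors by default. Passing stringsAsFactors = TRUE reads them as factors, which is what the <fct> labels in this notebook's output indicate.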

We will be loading the NHANES.csv file, which is located in the Data folder:

df <- read.csv('Data/NHANES.csv')
df
ID     SurveyYr  Gender  Age    AgeDecade  AgeMonths  Race1  Race3  Education
<int>  <fct>     <fct>   <int>  <fct>      <int>      <fct>  <fct>  <fct>
51624  2009_10   male    34     30-39      409        White         High School
51624  2009_10   male    34     30-39      409        White         High School
51624  2009_10   male    34     30-39      409        White         High School
51625  2009_10   male    4      0-9        49         Other
51630  2009_10   female  49     40-49      596        White         Some College
51638  2009_10   male    9      0-9        115        White
51646  2009_10   male    8      0-9        101        White
51647  2009_10   female  45     40-49      541        White         College Grad
51647  2009_10   female  45     40-49      541        White         College Grad
51647  2009_10   female  45     40-49      541        White         College Grad

The NHANES dataframe comes from the R package NHANES: https://cran.r-project.org/web/packages/NHANES/NHANES.pdf

Since the dataframe is long, we often use head() to get the first 6 rows:

head(df)
    ID     SurveyYr  Gender  Age    AgeDecade  AgeMonths  Race1  Race3  Education
    <int>  <fct>     <fct>   <int>  <fct>      <int>      <fct>  <fct>  <fct>
1   51624  2009_10   male    34     30-39      409        White         High School
2   51624  2009_10   male    34     30-39      409        White         High School
3   51624  2009_10   male    34     30-39      409        White         High School
4   51625  2009_10   male    4      0-9        49         Other
5   51630  2009_10   female  49     40-49      596        White         Some College
6   51638  2009_10   male    9      0-9        115        White
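head() also accepts an optional n argument, and base R offers a few related helpers for checking the size of a dataframe. The sketch below uses a toy dataframe as a stand-in for NHANES (an assumption made so the example runs without the CSV file); its values are copied from the rows shown above.

```r
# Toy stand-in for the NHANES data frame (values copied from the rows shown above)
toy <- data.frame(ID  = c(51624L, 51624L, 51624L, 51625L, 51630L, 51638L, 51646L, 51647L),
                  Age = c(34L, 34L, 34L, 4L, 49L, 9L, 8L, 45L))

head(toy, n = 3)   # first 3 rows instead of the default 6
tail(toy, n = 2)   # last 2 rows
nrow(toy)          # number of rows
ncol(toy)          # number of columns
```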

We can use:

dataframe_name$column_name

to access a column in the dataframe. For example:

head(df$Age)
## [1] 34 34 34  4 49  9
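Base R offers other ways to pull out a single column besides $. The sketch below compares them on a toy dataframe (a hypothetical stand-in with values taken from the rows shown earlier). The [[ ]] form is handy when the column name is stored in a variable.

```r
# Toy stand-in for the NHANES data frame (illustrative subset of columns)
toy <- data.frame(ID = c(51624L, 51625L, 51630L), Age = c(34L, 4L, 49L))

ages_dollar  <- toy$Age        # access by name with $
ages_bracket <- toy[["Age"]]   # access by name with [[ ]]
ages_matrix  <- toy[, "Age"]   # matrix-style indexing; a single column comes back as a vector

# [[ ]] also works when the column name is held in a variable
col_name <- "Age"
ages_var <- toy[[col_name]]

# All four approaches return the same integer vector
identical(ages_dollar, ages_bracket)
```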

We can also copy a column under a new name. This adds a new column and leaves the original one in place:

dataframe_name$newColumn_name <- dataframe_name$column_name
df$number <- df$ID
head(df$number)
## [1] 51624 51624 51624 51625 51630 51638
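The assignment above creates number as a copy of ID, and the ID column itself remains in the dataframe. If the goal is to rename a column instead, one common base R approach is to assign to names(). The sketch below contrasts the two on a toy dataframe (a hypothetical stand-in for df).

```r
# Toy stand-in for the NHANES data frame (illustrative subset of columns)
toy <- data.frame(ID = c(51624L, 51625L, 51630L), Age = c(34L, 4L, 49L))

# Copying: adds a column, so the data frame now has ID, Age, and number
toy_copy <- toy
toy_copy$number <- toy_copy$ID
names(toy_copy)

# Renaming: replace the name "ID" in place, so the column count is unchanged
toy_renamed <- toy
names(toy_renamed)[names(toy_renamed) == "ID"] <- "number"
names(toy_renamed)
```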

Frequency table

A frequency table counts how many times each value occurs in a column. In R, it can be obtained with the table() function:

table(dataframe$column)

For example:

table(df$Gender)
## 
## female   male 
##   5020   4980
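table() can also report missing values, and its result can be turned into proportions or extended to two columns. The sketch below shows these variations on a toy dataframe. The data are made up for illustration, including the NA entry.

```r
# Hypothetical toy data; the NA is included to show how missing values are handled
toy <- data.frame(Gender    = c("male", "male", "female", "male", "female", NA),
                  AgeDecade = c("30-39", "0-9", "40-49", "0-9", "40-49", "0-9"))

counts         <- table(toy$Gender)                    # NA values are dropped by default
counts_with_na <- table(toy$Gender, useNA = "ifany")   # adds an NA category when NAs exist
proportions    <- prop.table(counts)                   # counts divided by their total
two_way        <- table(toy$Gender, toy$AgeDecade)     # cross-tabulation of two columns

counts
counts_with_na
proportions
two_way
```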

Get data based on condition

Sometimes we are interested in a subset of the dataframe based on the values in a column. For example, instead of analyzing everyone in the NHANES dataset, we might want to analyze only people who are 18 years of age or older. In these cases, we can select a subset of the dataframe by using the following code:

df <- df[conditional_statement,]

where df is the name of the dataframe. For example:

# Keep people who are 18 years of age or older
df <- df[df$Age >= 18,]
head(df)
    ID     SurveyYr  Gender  Age    AgeDecade  AgeMonths  Race1  Race3  Education
    <int>  <fct>     <fct>   <int>  <fct>      <int>      <fct>  <fct>  <fct>
1   51624  2009_10   male    34     30-39      409        White         High School
2   51624  2009_10   male    34     30-39      409        White         High School
3   51624  2009_10   male    34     30-39      409        White         High School
5   51630  2009_10   female  49     40-49      596        White         Some College
8   51647  2009_10   female  45     40-49      541        White         College Grad
9   51647  2009_10   female  45     40-49      541        White         College Grad
# Keep only male participants (df already contains only adults after the previous step)
df <- df[df$Gender == 'male',]
head(df)
    ID     SurveyYr  Gender  Age    AgeDecade  AgeMonths  Race1  Race3  Education
    <int>  <fct>     <fct>   <int>  <fct>      <int>      <fct>  <fct>  <fct>
1   51624  2009_10   male    34     30-39      409        White         High School
2   51624  2009_10   male    34     30-39      409        White         High School
3   51624  2009_10   male    34     30-39      409        White         High School
11  51654  2009_10   male    66     60-69      795        White         Some College
12  51656  2009_10   male    58     50-59      707        White         College Grad
13  51657  2009_10   male    54     50-59      654        White         9 - 11th Grade
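Because the two filters above were applied one after the other, df now holds only males aged 18 or older. The same result can be obtained in one step by combining the conditions with &. The sketch below does this on a toy dataframe (a hypothetical stand-in, with IDs, genders, and ages taken from rows shown in this notebook) and also shows subset(), a base R alternative.

```r
# Toy stand-in for the NHANES data frame (illustrative subset of columns)
toy <- data.frame(ID     = c(51624L, 51625L, 51630L, 51638L, 51647L, 51654L),
                  Gender = c("male", "male", "female", "male", "female", "male"),
                  Age    = c(34L, 4L, 49L, 9L, 45L, 66L))

# Combine both conditions with & (element-wise AND); toy itself is left untouched
adult_males <- toy[toy$Age >= 18 & toy$Gender == "male", ]

# subset() expresses the same filter and drops rows where the condition is NA
adult_males_2 <- subset(toy, Age >= 18 & Gender == "male")

adult_males
```

Storing the result under a new name, rather than overwriting the original dataframe, keeps the full data available for later analyses.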

©2021 by Daiki Tagami. All rights reserved.