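We use the chi-squared test of independence to compare proportions across the categories of two categorical variables. Here, the null hypothesis is that the two variables are independent, and the alternative hypothesis is that they are associated.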
The code to conduct a chi-squared test looks like:
chisq.test(data1, data2)
where data1 and data2 are columns from the same data frame, each containing one categorical variable.
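To see what chisq.test() computes, the sketch below runs the test on a small made-up data frame and then reproduces the statistic by hand as the sum of (observed - expected)^2 / expected. The data frame `toy` and its columns `group` and `answer` are hypothetical and exist only for illustration.

```r
# Hypothetical toy data: two categorical columns in one data frame
toy <- data.frame(
  group  = rep(c("A", "B"), times = c(50, 50)),
  answer = c(rep(c("yes", "no"), times = c(30, 20)),
             rep(c("yes", "no"), times = c(15, 35)))
)

# correct = FALSE turns off the Yates continuity correction that
# chisq.test() applies by default to 2 x 2 tables
res <- chisq.test(toy$group, toy$answer, correct = FALSE)

# Same statistic by hand: sum((observed - expected)^2 / expected),
# where expected = row total * column total / grand total
observed <- table(toy$group, toy$answer)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
stat_by_hand <- sum((observed - expected)^2 / expected)

print(res)
print(stat_by_hand)
```

The two values should agree, which shows that the test only depends on the cross-tabulated counts of the two columns.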
Here, we use the NHANES dataset to test for an association between household income and education level:
df <- read.csv('Data/NHANES.csv')
chisq.test(df$HHIncome, df$Education)
##
## Pearson's Chi-squared test
##
## data: df$HHIncome and df$Education
## X-squared = 1576.8, df = 60, p-value < 2.2e-16
Since the p-value is less than 0.05, we reject the null hypothesis of independence and conclude that there is an association between household income and education level.
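The decision rule above can also be checked in code, because chisq.test() returns an object whose components can be inspected. The following is a minimal sketch on made-up counts: the matrix `counts` and its labels are hypothetical stand-ins, not NHANES values.

```r
# Hypothetical counts standing in for an income-by-education table
counts <- matrix(c(40, 25, 10,
                   30, 30, 30,
                   10, 25, 40),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(income    = c("low", "mid", "high"),
                                 education = c("school", "some college", "college grad")))

res <- chisq.test(counts)

alpha <- 0.05
print(res$p.value < alpha)   # TRUE means we reject independence at the 5% level
print(res$expected)          # counts expected if the variables were independent
print(round(res$stdres, 2))  # standardized residuals: large absolute values
                             # mark the cells that contribute most to the result
```

Looking at `res$expected` is a useful habit, since the chi-squared approximation is usually considered reliable only when the expected counts are not too small (a common rule of thumb is at least 5 per cell).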
We can create a contingency table by using the following code:
table(data1, data2)
table(df$HHIncome, df$Education)
##
##                  8th Grade 9 - 11th Grade College Grad High School Some College
##             212        69            104          117         125          184
## 0-4999       70        20             30           17          26           29
## 5000-9999    86        27             43           10          47           41
## 10000-14999 144        41             86           21         130          121
## 15000-19999 152        56             87           43          95           94
## 20000-24999 221        59             77           35         100          125
## 25000-34999 264        69            117          107         167          234
## 35000-44999 230        40             79          110         195          209
## 45000-54999 218        18             58          135         140          215
## 55000-64999 157        18             49          126         122          149
## 65000-74999 120        11             42          138          71          144
## 75000-99999 301        13             58          332         108          272
## more 99999  604        10             58          907         191          450
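This table summarizes the relationship between the two variables by showing the number of observations in each combination of income and education level. The unlabeled first row and column count the records whose income or education value is blank in the data.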
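Because raw counts are hard to compare when the groups differ in size, it can help to convert a contingency table into row proportions. The sketch below uses a small made-up data frame; `survey`, `income`, and `education` are hypothetical names chosen for illustration.

```r
# Hypothetical toy data frame with two categorical columns
survey <- data.frame(
  income    = c("low", "low", "low", "high", "high", "high", "high", "low"),
  education = c("school", "school", "college", "college",
                "college", "school", "college", "school")
)

tab <- table(survey$income, survey$education)

# Counts with row and column totals added
print(addmargins(tab))

# Row proportions: within each income group, the share at each education level
row_props <- prop.table(tab, margin = 1)
print(round(row_props, 2))
```

With `margin = 1` each row sums to 1, so the rows can be compared directly; `margin = 2` would give column proportions instead.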
©2021 by Daiki Tagami. All rights reserved.