You’ve already learnt how to run a t-test and a Mann-Whitney test, but what happens when you want to use an IV with more than two groups?
We need some more groups in here…
The t-test and Mann-Whitney test are only useful when comparing two groups, but many research questions require us to examine more than two groups at a time. When we have more than two groups, we employ different statistical tests: the parametric test is the ANOVA and the non-parametric test is the Kruskal-Wallis test.
As a reminder, the three parametric assumptions are:
1. The dependent variable must be measured at the Interval level (Scale/Ordinal) and the independent variable must be measured at the Categorical level.
2. The data must be normally distributed.
3. The groups must be of equal variance (homogeneity).
This has all already been outlined in a previous intensive, so we won’t repeat these steps here. If you need to remind yourself of how to conduct parametric assumption testing, go back to the previous intensive in this course.
Testing for Difference: Comparing 2+ Groups
Suppose we wanted to explore the factors that are linked to British people’s life satisfaction. In our dataset we have numerous variables which we could test (for difference) against the scale DV of Life Sat. We want to focus solely on the variables which have more than 2 groups, so Education, Employment and Age are our only possible IVs.
Research Question: Does British people’s level of life satisfaction differ by age?
Null Hypothesis: British people’s level of life satisfaction does not differ by age
Research Hypothesis: British people’s level of life satisfaction does differ by age
Variables and level of measurement
DV (Interval Scale): Life Satisfaction measured on a scale.
IV (Categorical Ordinal): Age measured across 4 groups.
The data was examined first to see if it met parametric assumptions; again, these steps won’t be conducted here as they’ve already been outlined. The data was found to meet parametric assumptions, so an ANOVA was selected for further analysis.
Use ANOVA ONLY if the data fulfils the conditions for both normality and homogeneity of variance. If one or more parametric assumptions are violated, use Kruskal-Wallis.
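The decision rule above can be sketched in R as follows. This is only an illustration under stated assumptions: the data frame `demo` is simulated because the real `opintfd` data is not reproduced here, and Shapiro-Wilk and Bartlett’s test are stand-ins that may differ from the checks used in the previous intensive.

```r
# ASSUMPTION: simulated stand-in data; the values are made up and are not
# taken from the opintfd dataset used in this course.
set.seed(1)
demo <- data.frame(
  life_sat = c(rnorm(30, mean = 7.5), rnorm(30, mean = 7.8),
               rnorm(30, mean = 7.2), rnorm(30, mean = 7.6)),
  age = factor(rep(c("16-24", "25-44", "45-64", "65+"), each = 30))
)

# Normality: Shapiro-Wilk test on the residuals of the one-way model.
# (ASSUMPTION: the course may have checked normality another way.)
normal_p <- shapiro.test(residuals(aov(life_sat ~ age, data = demo)))$p.value

# Homogeneity of variance: Bartlett's test across the age groups.
# (ASSUMPTION: the course may have used a different test, e.g. Levene's.)
homog_p <- bartlett.test(life_sat ~ age, data = demo)$p.value

# Decision rule from the text: ANOVA only if BOTH assumptions hold.
chosen_test <- if (normal_p > 0.05 && homog_p > 0.05) "ANOVA" else "Kruskal-Wallis"
print(chosen_test)
```

A p-value above 0.05 on each check is treated here as "assumption met"; if either check fails, the sketch falls back to Kruskal-Wallis.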
Running an ANOVA in RStudio
Input the following command into your script to conduct an ANOVA:
AnovaOpin<-aov(opintfd$`Life Sat`~ opintfd$Age)
This command will conduct the ANOVA, but you need to input another command for the results to be summarised. Input the following command to see the results of our ANOVA:
summary(AnovaOpin)
Interpreting ANOVA results
If all of these commands have been inputted correctly, a summary of the ANOVA should appear in the console (shown in the screenshot below).
First, we need to look at the Pr(>F) figure. This is our significance (sig.) value, which we compare with the threshold value of 0.05. If the significance value is less than 0.05, we reject our null hypothesis, which would mean that the findings of our study provide evidence to suggest that British people’s level of life satisfaction differs by age. Here the sig. value is less than 0.05, so we reject the null hypothesis: life satisfaction does differ by age.
The statistical significance of the ANOVA test tells us that the level of life satisfaction differs significantly by age group. However, the test does not tell us which groups differ from which. After conducting an ANOVA and finding significance, it is therefore necessary to carry out further analysis to find out which groups differ from which.
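If you would rather pull the figures out of the ANOVA summary than read them off the console, something like the sketch below may help. It runs on simulated data (an assumption, since `opintfd` is not reproduced here), so the numbers it prints will not match the screenshot.

```r
# ASSUMPTION: simulated stand-in for opintfd; values are made up.
set.seed(2)
demo <- data.frame(
  life_sat = c(rnorm(40, mean = 7.4), rnorm(40, mean = 7.9),
               rnorm(40, mean = 7.1), rnorm(40, mean = 7.5)),
  age = factor(rep(c("16-24", "25-44", "45-64", "65+"), each = 40))
)

# Fit the one-way ANOVA and take the summary table (a data frame).
fit <- aov(life_sat ~ age, data = demo)
anova_table <- summary(fit)[[1]]
print(anova_table)

# The first row of the table is the IV (age); the second is the residuals.
f_value    <- anova_table[["F value"]][1]
df_between <- anova_table[["Df"]][1]
p_value    <- anova_table[["Pr(>F)"]][1]

# Compare the sig. value with the 0.05 threshold.
reject_null <- p_value < 0.05
print(reject_null)
```

The same three values (F, df and p) are the ones reported in the write-up later in this section.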
Conducting post hoc tests
ANOVA tests the null hypothesis, so the resulting p-value only tells us whether or not there is a significant difference somewhere amongst the groups of the IV. Post hoc tests allow us to identify where those differences lie. We will use the Tukey command:
TukeyHSD(AnovaOpin)
All of the significance values (p adj) for each pair of groups need to be examined (25-44-16-24, 45-64-16-24, 65+-16-24, 45-64-25-44, 65+-25-44, 65+-45-64) to find out whether any values are less than 0.05. Reviewing the screenshot above, we can see that amongst all the age group comparisons only the comparison between the 45-64 and 25-44 age groups is significant at the 0.05 level.
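With six comparisons to check, it can be easier to let R pick out the significant pairs. The sketch below is one possible way to do this with `TukeyHSD()`; the data is simulated (an assumption, as `opintfd` is not available here), so which pairs come out as significant will differ from the screenshot.

```r
# ASSUMPTION: simulated stand-in for opintfd; values are made up.
set.seed(3)
demo <- data.frame(
  life_sat = c(rnorm(40, mean = 7.4), rnorm(40, mean = 7.9),
               rnorm(40, mean = 7.1), rnorm(40, mean = 7.5)),
  age = factor(rep(c("16-24", "25-44", "45-64", "65+"), each = 40))
)

fit <- aov(life_sat ~ age, data = demo)

# TukeyHSD() returns one matrix per factor; ours is named "age".
tukey <- TukeyHSD(fit)
pairs <- as.data.frame(tukey$age)
print(pairs)

# Keep only the comparisons whose adjusted p-value is below 0.05.
significant_pairs <- rownames(pairs)[pairs[["p adj"]] < 0.05]
print(significant_pairs)
```

Each row name joins the two groups being compared with a hyphen, which is why the labels look like 25-44-16-24.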
As we used a two-tailed (non-directional) hypothesis and the ANOVA was significant, the null hypothesis is rejected. However, as can be seen from the post hoc tests, only one comparison between the groups is significant:
25-44 and 45-64 (p=.008).
We would report all of these findings as the following:
An independent one-way ANOVA found a significant difference in life satisfaction between age groups (F=4.171, df=3, p=.006). We therefore reject the null hypothesis. This study provides evidence to suggest that British people’s life satisfaction differs by age group.
Further analysis by post hoc tests found a statistically significant difference (p=.008) in levels of life satisfaction between the 25-44 and 45-64 age groups.
There were no significant differences found in levels of life satisfaction between any of the other age groups.
Running a Non-Parametric Kruskal-Wallis test
Research Question: Does British people’s level of life satisfaction differ by employment status?
Null Hypothesis: British people’s level of life satisfaction does not differ by employment status.
Research hypothesis: British people’s level of life satisfaction does differ by employment status.
Variables and level of measurement:
DV (Interval Scale): Life Satisfaction measured on a scale.
IV (Categorical Nominal): Employment status with 3 categories.
The data was examined first to see if it met parametric assumptions; again, these steps won’t be conducted here as they’ve already been outlined. The data was found to violate the parametric assumption of homogeneity of variance (the variances were heterogeneous), so the non-parametric Kruskal-Wallis test was selected for further analysis.
Running a Kruskal-Wallis Test in RStudio
Input the following command into your script to conduct a Kruskal-Wallis test:
kruskal.test(opintfd$`Life Sat` ~ opintfd$Employment)
The screenshot above shows the results of our Kruskal-Wallis test, including the significance value (p-value). As this value is less than 0.05 (p=.004), we reject the null hypothesis. This study provides evidence to suggest that British people’s life satisfaction differs by employment status.
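As with the ANOVA, the figures needed for the write-up can be taken straight from the test object. The sketch below is a minimal illustration on simulated data (an assumption, since `opintfd` is not reproduced here), so its output will not match the screenshot.

```r
# ASSUMPTION: simulated stand-in for opintfd with unequal group variances;
# the values are made up.
set.seed(4)
demo <- data.frame(
  life_sat = c(rnorm(50, mean = 7.8, sd = 1.0),
               rnorm(50, mean = 7.6, sd = 1.5),
               rnorm(50, mean = 7.1, sd = 2.5)),
  employment = factor(rep(c("In Employment", "Economically inactive",
                            "ILO Unemployed"), each = 50))
)

# Run the Kruskal-Wallis test using the formula interface.
kw <- kruskal.test(life_sat ~ employment, data = demo)
print(kw)

# Extract the values that go into the write-up: H, df and p.
h_value <- unname(kw$statistic)  # the Kruskal-Wallis chi-squared (H)
df_kw   <- unname(kw$parameter)  # degrees of freedom = number of groups - 1
p_kw    <- kw$p.value

reject_null <- p_kw < 0.05
print(reject_null)
```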
We can do some further analysis by observing the means for all the cases/groups within the employment variable, in an attempt to discern which category is causing this significant difference.
Input the commands:
mean(na.omit(opintfd$`Life Sat`[opintfd$Employment=="Economically inactive"]))
mean(na.omit(opintfd$`Life Sat`[opintfd$Employment=="ILO Unemployed"]))
mean(na.omit(opintfd$`Life Sat`[opintfd$Employment=="In Employment"]))
Below is a screenshot documenting all the different means for each group.
Just observing these values, we can see that the ILO Unemployed group has a noticeably lower mean than the other groups. This suggests that the ILO unemployed group, with its lower level of life satisfaction, is the one driving the significant Kruskal-Wallis result. Note that comparing means in this way is descriptive only: it does not itself test which pairs of groups differ significantly.
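The three separate mean commands can also be written as a single call, and a formal pairwise comparison can back up what the means suggest. The sketch below is one possible approach rather than the method used in this course: the data is simulated, and the use of `pairwise.wilcox.test()` with a Bonferroni adjustment is an assumption added here for illustration.

```r
# ASSUMPTION: simulated stand-in for opintfd; values are made up.
set.seed(5)
demo <- data.frame(
  life_sat = c(rnorm(50, mean = 7.8, sd = 1.0),
               rnorm(50, mean = 7.6, sd = 1.5),
               rnorm(50, mean = 7.1, sd = 2.5)),
  employment = factor(rep(c("In Employment", "Economically inactive",
                            "ILO Unemployed"), each = 50))
)

# Mean life satisfaction for every employment group in one call.
# na.rm = TRUE plays the same role as na.omit() in the commands above.
group_means <- tapply(demo$life_sat, demo$employment, mean, na.rm = TRUE)
print(round(group_means, 2))

# ASSUMPTION: pairwise Mann-Whitney (Wilcoxon rank-sum) tests as a post hoc
# step, with a Bonferroni correction for the three comparisons.
posthoc <- pairwise.wilcox.test(demo$life_sat, demo$employment,
                                p.adjust.method = "bonferroni")
print(posthoc$p.value)
```

The matrix of adjusted p-values has one cell per pair of groups; values below 0.05 would indicate which specific pairs differ.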
We can also use several commands to further visualise these findings:
mytable2<-matrix(c(7.8,7.6,7.1),ncol=3,byrow=TRUE)
colnames(mytable2)<-c("In Employment","Economically inactive","ILO Unemployed")
These commands create a table like the one shown below.
Then, using the barplot command, we can create a bar chart:
barplot(mytable2)
This command produces a bar chart for the table that was created. It acts as another way to show visually the difference in means across the 3 groups.
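A labelled version of the same chart can be sketched as below. It uses the three means from the table above; the axis labels and the 0 to 10 axis range are assumptions added for illustration, so adjust them to match how Life Sat is actually measured in the dataset.

```r
# The three group means, taken from the table created above.
group_means <- c("In Employment" = 7.8,
                 "Economically inactive" = 7.6,
                 "ILO Unemployed" = 7.1)

# barplot() draws one bar per named value and returns the bar midpoints.
# ASSUMPTION: the labels and the 0-10 axis range are illustrative choices.
bar_positions <- barplot(group_means,
                         xlab = "Employment status",
                         ylab = "Mean life satisfaction",
                         ylim = c(0, 10))
```

Starting the axis at zero keeps the differences between the bars in proportion, which avoids exaggerating the gap between groups.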
The results of the Kruskal-Wallis test (H=10.995, df=2, p=.004) show a statistically significant difference in British people’s level of life satisfaction across employment statuses. We therefore reject the null hypothesis.
Further analysis comparing the means for each category of the variable suggests that this difference is driven by one employment status: the ILO unemployed group had a lower mean level of life satisfaction than the other two employment groups.
The mean levels of life satisfaction for those in employment and the economically inactive were similar to one another.