**Accessing the data**

For this sprint we’ll continue to use a cut-down dataset from the Opinions and Lifestyle Survey: Well-Being Module, 2015. To access the data, open RStudio. As the ‘opintfd’ dataset is likely to have been the last one you ran, it may be loaded automatically when RStudio starts. If it is not, you need to re-import the dataset by finding it on your computer and loading it again.
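If you do need to re-import, the steps might look something like the sketch below. It assumes the dataset has been saved as a CSV file, which may not match your copy. The file name `opintfd_demo.csv` and the handful of values in it are made up purely so the example runs on any machine; in practice you would point `path` at your own file.

```r
# Hypothetical stand-in data: a tiny CSV written to a temporary folder
# so that this sketch runs anywhere. These values are NOT survey data.
demo <- data.frame(
  LifeSat = c(7, 8, 5, 9, 6, 7),
  Gender  = c("Male", "Female", "Female", "Male", "Female", "Male")
)
path <- file.path(tempdir(), "opintfd_demo.csv")
write.csv(demo, path, row.names = FALSE)

# Re-import the file and store it under the name used in this guide.
# stringsAsFactors = TRUE reads text columns in as categorical factors.
opintfd <- read.csv(path, stringsAsFactors = TRUE)

# Confirm the import worked: variable types and the first few rows
str(opintfd)
head(opintfd)
```

Running `str()` straight after an import is a quick way to confirm that the variables have arrived with the types you expect.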

**Step 2 – Data Preparation**

Data preparation for Tests for Difference was explored in a previous intensive, so we won’t repeat these steps here. However, it’s worth remembering that you always need to:

- Explore data through descriptive statistics
- Create hypotheses
- Identify levels of measurement for variables
- Clean your data
- Recode variables where necessary
- Identify the most suitable test for the measurement type and hypotheses
- Test for sampling error and confidence

Again, this has all been outlined in previous guides.

**Step 3 – Testing for Parametric Assumptions in Tests for Difference**

Before running any parametric test for difference, your data must meet three conditions. If the data violate one or more of these parametric assumptions, a non-parametric test must be selected instead. Ideally a parametric test would be used, as it has more statistical power when its assumptions are met, so its results carry more weight than those of a non-parametric test.

When comparing two groups in tests for difference, the parametric test is a t-test and the non-parametric test is a Mann-Whitney test.
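To make the two options concrete, here is a minimal sketch of how each test can be called in base R. It uses simulated values rather than the survey data, so the group labels and scores are assumptions for illustration only. Note that R runs the Mann-Whitney test through `wilcox.test()`, which reports it as a Wilcoxon rank sum test.

```r
# Simulated stand-in data (NOT the survey data): 30 scores per group
set.seed(42)
sim <- data.frame(
  LifeSat = c(round(rnorm(30, mean = 7.5, sd = 1.5)),
              round(rnorm(30, mean = 7.0, sd = 1.5))),
  Gender  = factor(rep(c("Female", "Male"), each = 30))
)

# Parametric option: independent-samples t-test
# var.equal = TRUE requests the classic test that assumes equal variances
t_res <- t.test(LifeSat ~ Gender, data = sim, var.equal = TRUE)

# Non-parametric option: Mann-Whitney test
# exact = FALSE uses the normal approximation, which copes with tied scores
mw_res <- wilcox.test(LifeSat ~ Gender, data = sim, exact = FALSE)

print(t_res)
print(mw_res)
```

Both calls use the same `DV ~ IV` formula, so switching from one test to the other only means changing the function name.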

Your data must meet three conditions:

- The dependent variable must be measured at the interval level (scale) and the independent variable must be measured at the categorical level.

In the following example we will be testing the RQ: *What are the key factors that affect life satisfaction within Britain?*

Our DV is the LifeSat variable, which is measured on a scale and is therefore at the interval level.

We can use either the Gender or the Ethnicity variable as the IV to test for difference **when comparing two groups**. (These are the only two dichotomous variables in our dataset.)

We have therefore already met the first condition, and with it one parametric assumption. This first assumption cannot be tested statistically; it depends on you knowing which variable is which (IV/DV) and understanding the different levels of measurement.
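Although there is no statistical test for this condition, you can still check how R has stored each variable as a sanity check. The sketch below does this on a small made-up data frame; the values and group labels are assumptions for illustration, not the survey data.

```r
# Made-up stand-in data (NOT the survey data)
sim <- data.frame(
  LifeSat = c(7, 8, 5, 9, 6, 7, 10, 4),
  Gender  = factor(c("Male", "Female", "Female", "Male",
                     "Female", "Male", "Female", "Male"))
)

# The DV should be stored as a number (interval/scale level)
dv_is_numeric <- is.numeric(sim$LifeSat)

# The IV should be stored as a factor (categorical level)...
iv_is_factor <- is.factor(sim$Gender)

# ...with exactly two categories when comparing two groups
iv_is_dichotomous <- nlevels(sim$Gender) == 2

print(c(dv_is_numeric, iv_is_factor, iv_is_dichotomous))
```

A variable being stored as a number does not by itself make it interval level (coded categories are numbers too), so this check supports your own judgement rather than replacing it.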

The second and third conditions are tested using RStudio commands and will be outlined below.

- The data must be normally distributed.
- The groups must be of equal variance (homogeneity of variance).
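As a rough illustration of what checking these two conditions can look like, the sketch below uses two base R functions on simulated data: `shapiro.test()` for normality within each group and `var.test()` (an F test) for equal variance. These are one possible choice and an assumption on our part; other commands, such as Levene's test, are also commonly used for the variance check.

```r
# Simulated stand-in data (NOT the survey data): 40 scores per group
set.seed(1)
sim <- data.frame(
  LifeSat = c(rnorm(40, mean = 7.5, sd = 1.5),
              rnorm(40, mean = 7.0, sd = 1.5)),
  Gender  = factor(rep(c("Female", "Male"), each = 40))
)

# Condition 2: normality, checked separately within each group.
# split() makes one vector of scores per group; lapply() tests each one.
normality <- lapply(split(sim$LifeSat, sim$Gender), shapiro.test)

# Condition 3: equal variance between the two groups (F test)
variance <- var.test(LifeSat ~ Gender, data = sim)

# A p-value above 0.05 means no significant departure was detected
print(sapply(normality, function(res) res$p.value))
print(variance$p.value)
```

In both tests the null hypothesis is that the assumption holds, so here a non-significant result is the one you are hoping for.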