Depending on the hypothesis we generate and the variables we wish to use for a specific test, we may need to conduct some further cleaning. When conducting a test for difference, the dependent variable must be measured at the interval level (scale/ordinal) and the independent variable must be measured at the categorical level.
But not all tests are created equal. When running an ANOVA, your IV can have multiple categories and encompass a variable like religion, but if you intend to run a t-test your IV has to be dichotomous, that is, it must have only two categories.
Before deciding to recode a variable, you must consider:
- Which test you plan on conducting and the number of categories it requires, i.e. 2 groups = t-test/Mann-Whitney; more than 2 groups = ANOVA/Kruskal-Wallis.
- More categories mean a greater potential loss of statistical power, especially if some categories contain small numbers.
- If small categories remain, you could fail to achieve sufficient statistical power.
- You need to weigh statistical power against theoretical need, e.g. it may be worthwhile and make theoretical sense to keep ethnic categories distinct, but you need to consider whether all categories have enough numbers.
- You may want to merge categories. BAME/BME groups are often grouped together in surveys, but you must make sure such a decision makes theoretical sense for your analysis.
- As stated previously, don’t automatically remove unhelpful answers like ‘don’t know’. Check the numbers within this category first; if they’re high, this is potentially a meaningful response.
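The checks in the list above can also be sketched outside SPSS. The short Python example below counts the cases in each category, flags any category that looks too small, and suggests which family of test the number of categories points towards. The function name `review_categories`, the sample data and the threshold of 30 cases are all illustrative assumptions, not values from this guide; pick a threshold that suits your own analysis.

```python
from collections import Counter


def review_categories(values, min_count=30):
    """Count cases per category and flag categories below min_count.

    min_count=30 is an arbitrary illustrative threshold (an assumption).
    """
    counts = Counter(values)
    # Categories with fewer cases than the threshold, in alphabetical order
    small = sorted(cat for cat, n in counts.items() if n < min_count)
    # 2 groups point to a t-test/Mann-Whitney, more than 2 to ANOVA/Kruskal-Wallis
    if len(counts) == 2:
        family = "t-test / Mann-Whitney"
    elif len(counts) > 2:
        family = "ANOVA / Kruskal-Wallis"
    else:
        family = "not enough categories"
    return counts, small, family


# Made-up data purely for illustration
sample = ["group A"] * 50 + ["group B"] * 40 + ["group C"] * 5
counts, small, family = review_categories(sample)
print(small)   # ['group C']
print(family)  # ANOVA / Kruskal-Wallis
```

A flagged category is a prompt to think about merging, not an instruction to merge; the theoretical considerations above still apply.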
Having considered all of these issues closely, you can proceed with preparing your data using the process of Recoding.
To demonstrate how to recode a variable, we will use the variable ‘Employment’, recoding it to become dichotomous and suitable for a t-test.
Recoding a Variable
Below is a frequency table for the variable ‘Employment’. We intend to conduct a t-test, so we must recode the variable to become dichotomous, converting it from three categories to two.
A further rationale for recoding this variable is that the category ILO unemployed contains only 76 cases, a number that is unlikely to provide sufficient statistical power. It therefore makes theoretical sense to combine the economically inactive and ILO unemployed categories into ‘not working’, leaving two categories: ‘in work’ and ‘not working’.
We need to consult the codebook to make a note of our old and new values.
|Old Values||New Value|
Go to Transform > Recode into Different Variables.
Insert the variable you wish to recode. You need to create a new name for your variable; typically this would just be the original name with 2 at the end.
Insert the variable label. You can do this simply by copying and pasting the label from the variable view. Then click Change.
If done correctly your new variable name will appear.
Now click Old and New Values. Use the table above to insert the old values and the new values as shown in the screenshot below.
Repeat the process for the other two values. Remember the old values (1 and 3) are merging to create a new value (1) which represents ‘not working’.
When done, your screen should look something like the one above. Click Continue.
And then OK.
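For readers who like to see the logic written out, the old-to-new mapping entered in the Old and New Values dialog can be sketched in a few lines of Python. This is only an illustration of the idea, not what SPSS runs internally. The merge of old values 1 and 3 into new value 1 (‘not working’) follows the text above; mapping the remaining old value 2 to new value 2 (‘in work’) is an assumption, as are the names `RECODE_MAP` and `recode` and the sample data.

```python
# Old value -> new value.
# 1 and 3 -> 1 ('not working') follows the text.
# 2 -> 2 ('in work') is an ASSUMPTION; check your own codebook.
RECODE_MAP = {1: 1, 3: 1, 2: 2}


def recode(values, mapping):
    """Return a NEW list of recoded values, leaving the original untouched.

    This mirrors recoding into a different variable rather than overwriting.
    Values with no entry in the mapping become None instead of being guessed.
    """
    return [mapping.get(v) for v in values]


# Made-up data purely for illustration
employment = [1, 2, 3, 2, 2, 1]
employment2 = recode(employment, RECODE_MAP)

print(employment)   # [1, 2, 3, 2, 2, 1]  (original is unchanged)
print(employment2)  # [1, 2, 1, 2, 2, 1]
```

Keeping the original variable alongside the new one means you can always go back and recode differently if the first grouping turns out not to make theoretical sense.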
If everything has been done correctly your new variable will appear at the bottom of the variable view. You’re nearly done! The only thing left to do is tell SPSS what the new values represent.
To insert the new variable’s value labels, open the Values cell for the new variable in the variable view. A Value Labels box will appear. Insert the new value and new label as shown in the screenshot below, then press Add and OK.
To check our work, we can run a frequency table for the new employment2 variable.
We’ve successfully recoded the variable.
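The same check can be sketched in Python: attach labels to the new values and build a small frequency table to confirm that only the two expected categories remain. The label for value 1 follows the text; the label for value 2, the function name `frequency_table` and the sample data are assumptions made for illustration.

```python
from collections import Counter

# New value -> label. 1 = 'not working' follows the text;
# 2 = 'in work' is an ASSUMPTION.
VALUE_LABELS = {1: "not working", 2: "in work"}

# Made-up recoded data purely for illustration
employment2 = [1, 2, 1, 2, 2, 1]


def frequency_table(values, labels):
    """Return (label, count, percent) rows, ordered by value code."""
    counts = Counter(values)
    total = len(values)
    rows = []
    for code in sorted(counts):
        # Fall back to the raw code if no label has been defined
        label = labels.get(code, str(code))
        percent = round(100 * counts[code] / total, 1)
        rows.append((label, counts[code], percent))
    return rows


for label, n, percent in frequency_table(employment2, VALUE_LABELS):
    print(f"{label:<12}{n:>5}{percent:>8.1f}%")
```

If a code shows up in the table without a label, or a third category appears, that is a sign that one of the old values was missed during recoding.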
Remember you should only recode variables if it makes theoretical and statistical sense to do so.
Apply Your Thinking:
Have a go at recoding some variables with your dataset.
Try recoding the variable education.
Here’s a how-to video on the steps we have just covered. Check it out!