**Data Recoding**

Depending on the hypothesis we generate and the variables we wish to use for a specific test, we may need to conduct some further cleaning. When conducting a test for difference, the dependent variable must be measured at the interval level (scale/ordinal) and the independent variable must be categorical.

But not all tests are created equal. When running an ANOVA, your IV can have multiple categories, as with a variable like religion, but if you intend to run a t-test your IV has to be dichotomous, that is, it must have only two categories.

Before deciding to recode a variable, you must consider:

- Which test you plan on conducting and the number of categories it requires, i.e. 2 groups = t-test/Mann-Whitney; 3 or more groups = ANOVA/Kruskal-Wallis.
- More categories = greater potential loss of statistical power, especially if some categories contain few cases.
- If small categories remain, you could fail to achieve sufficient statistical power.
- You need to weigh statistical power against theoretical need, e.g. it may be worthwhile and make theoretical sense to keep ethnic categories distinct, but you need to consider whether every category contains enough cases.
- You may want to merge categories. BAME/BME groups are often grouped together in surveys, but you must make sure such a decision makes theoretical sense for your own analysis.
- As stated previously, don’t automatically remove unhelpful answers like ‘don’t know’. Check the number of cases in this category first; if it is high, this is potentially a meaningful response.
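
The checks in the list above can be sketched in R as follows. This is only a minimal illustration on a made-up data frame: `toy` and its values are hypothetical and are not the opintfd data. `table()` and `prop.table()` are base R functions.

```r
# Hypothetical toy data, invented purely for illustration
toy <- data.frame(
  Religion = c("Christian", "Christian", "None", "None", "None",
               "Muslim", "Don't know", "Don't know", NA),
  stringsAsFactors = FALSE
)

# Count the cases in each category, showing missing values as well
counts <- table(toy$Religion, useNA = "ifany")
print(counts)

# Share of each category, to judge whether a group such as
# "Don't know" is large enough to be a meaningful response
shares <- round(prop.table(counts), 2)
print(shares)

# Number of non-missing categories: 2 points towards a t-test/Mann-Whitney,
# 3 or more towards ANOVA/Kruskal-Wallis
n_groups <- length(unique(na.omit(toy$Religion)))
print(n_groups)
```

Running the same two `table()` and `prop.table()` lines on your own variable shows at a glance which categories are small enough to consider merging.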

Having considered all of these issues closely, you can proceed with preparing your data by recoding.

To demonstrate, we will recode the variable ‘Employment’ so that it becomes dichotomous and suitable for a t-test.

**Recoding a Variable**

Below is a frequency table for the variable ‘Employment’. We intend to conduct a t-test, so we must recode the variable to become dichotomous, reducing it from three categories to two.

A further reason to recode this variable is that the category ‘ILO unemployed’ contains only 76 cases, a number that is unlikely to provide sufficient statistical power. It could therefore make theoretical sense to combine the ‘economically inactive’ and ‘ILO unemployed’ categories into ‘not working’, leaving two categories: ‘in work’ and ‘not working’.

We want to convert the variable Employment as follows and create a new variable Employment2 in the process.

| Old Value | New Value |
| --- | --- |
| Economically inactive | 1 |
| In employment | 2 |
| ILO unemployed | 1 |

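We must first tell RStudio that we wish to create a new variable. We would use the commands below. The category labels inside the quotation marks must match the labels in your data exactly, including capitalisation, and the quotation marks must be straight (") rather than curly, otherwise R will not run the command.

opintfd$Employment2[opintfd$Employment=="Economically inactive"] <- "1"

opintfd$Employment2[opintfd$Employment=="ILO Unemployed"] <- "1"

opintfd$Employment2[opintfd$Employment=="In Employment"] <- "2"

View(opintfd)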

The screenshot above shows the commands after they have been entered successfully in RStudio. Once this is done, the command View(opintfd) will bring up the dataset in the top-left pane, showing all the cases in the opintfd dataset along with the newly created Employment2 variable. Note that R is case-sensitive, so the command must be typed with a capital V. Your screen should look like the screenshot below.

We can also run a quick frequency table to confirm our variable has been recoded successfully. Looking at the screenshot below, we can see that the ‘economically inactive’ and ‘ILO unemployed’ categories have been combined into the value “1”, and the value “2” now represents all those in work.

The only thing left to do is tell RStudio what the new values for Employment2 represent. To reiterate, the value 1 represents the newly recoded ‘not working’ category and the value 2 represents the ‘in work’ group of respondents.
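
The frequency check described here could look something like the sketch below. It uses a small made-up data frame (`toy` is hypothetical, and its rows are invented) and repeats the same recoding pattern as the commands above; with the real data you would apply `table()` to `opintfd$Employment2` instead.

```r
# Hypothetical toy data; the labels mirror the commands in the text,
# but the rows themselves are invented
toy <- data.frame(
  Employment = c("In Employment", "In Employment", "ILO Unemployed",
                 "Economically inactive", "Economically inactive"),
  stringsAsFactors = FALSE
)

# Same recoding pattern as in the text, starting from an empty column
toy$Employment2 <- NA_character_
toy$Employment2[toy$Employment == "Economically inactive"] <- "1"
toy$Employment2[toy$Employment == "ILO Unemployed"] <- "1"
toy$Employment2[toy$Employment == "In Employment"] <- "2"

# Frequency table of the new variable (missing values shown if any exist)
freq <- table(toy$Employment2, useNA = "ifany")
print(freq)

# Cross-tabulate old against new to confirm each old category
# landed in the intended new value
check <- table(toy$Employment, toy$Employment2, useNA = "ifany")
print(check)
```

The cross-tabulation is an optional extra: any old category that appears under the wrong new value, or under `NA`, usually points to a label that was mistyped in the recoding commands.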

We would use the following commands:

**opintfd$Employment2[opintfd$Employment2=="1"] <- "not working"**

**opintfd$Employment2[opintfd$Employment2=="2"] <- "in work"**

Running another frequency table for the variable Employment2 will help you confirm whether the new values have been assigned successfully. As seen in the screenshot below, we have successfully recoded the variable Employment to become dichotomous and suitable for running a t-test.
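
As an alternative sketch, and not the method used above, the recoding and labelling can be combined so that the new variable is stored as a two-level factor. The data frame `toy` and the helper vector `not_working` are hypothetical names introduced only for this illustration.

```r
# Hypothetical toy data, including one missing value
toy <- data.frame(
  Employment = c("In Employment", "ILO Unemployed",
                 "Economically inactive", "In Employment", NA),
  stringsAsFactors = FALSE
)

# Old categories that should be merged into "not working"
not_working <- c("Economically inactive", "ILO Unemployed")

# Start from missing, then fill in only the categories we name explicitly,
# so missing or unexpected values stay NA instead of being misclassified
toy$Employment2 <- NA_character_
toy$Employment2[toy$Employment %in% not_working] <- "not working"
toy$Employment2[toy$Employment %in% "In Employment"] <- "in work"

# Store the result as a factor with exactly two levels
toy$Employment2 <- factor(toy$Employment2,
                          levels = c("not working", "in work"))

print(table(toy$Employment2, useNA = "ifany"))
print(nlevels(toy$Employment2))
```

Checking that `nlevels()` returns 2 is a quick way to confirm the variable is dichotomous before using it as the grouping variable in a t-test.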

Remember you should only recode variables if it makes theoretical and statistical sense to do so.

##### Apply Your Thinking:

Recode your own variable

Using the example above, have a go at following the steps to recode the variable ‘Education’.

Activity: If you have not been doing so already, have a go at following along with the steps above to recode the variable Employment.

Also, check out the how-to video below for further help on recoding variables…