Sign In for Full Access

Quick access through the institutional single sign-on Manchester Met Sign In
Skip this for now
|
Public Access Here

Sign In for Free Access

Login with email for free guest access to a range of Rise content
Go!
Logging You In!
Incorrect Password (Click Here to Reset)! Passwords Must Match Password must be more than 8 characters
Skip this for now
|
Man Met Access Here
menu

Data Cleaning

The first important stage when analysing data is to perform data cleaning on the variables. This is essentially removing errors and outliers and tidying up the data. There are a number of reasons for doing this, such as:

  • In analysis you want to run the strongest tests possible, so cleaning the data to remove outliers might make there a better chance of achieving normality, and thus allowing the potentially stronger test to be run 
  • To ensure the data is appropriate for conducting specific tests 
  • To ensure no loss of statistical power through outliers 
  • To remove any errors on the dataset (all datasets will have some degree of errors from the inputting stage making it easy to make mistakes)
  • To promote the most accurate and reliable results possible from the data testing 

Firstly, run frequency tables on the two scale variables to identify any outliers or errors. 

The life satisfaction variable is fine as it only contains the values 0-10, which reflects the likert scale of that variable.

However, the relationship satisfaction variable has values 98 and 99 which are missing answers or potential errors. These need to be put as missing values. 

Checking for the outliers in data view:

Click on the data view and find the relationship satisfaction variable. Right click on the name and click sort-descending. 

The outliers 99 and 98 values appear at the top. 

Inserting missing values in variable view:

We want to go back to variable view and go to the ‘Missing’ column. Click on the variable you want to enter missing values for and select discrete missing values.

Insert 98 and 99 and select OK.

Check with frequency table 

TASK: Have a go at cleaning the data on the opinions and lifestyle survey, write down the variable name that needs cleaning, what needs cleaning from the variable and present before and after screenshots of the cleaned variable.