Monday, March 28, 2011

SG: Topic 5 : Coding and Cleaning Survey Data


 

Coding Open Ended Questions

-          classifying answers / converting to numbers

-          open ended

o   attribute information – range of answers is too large

o   attitudinal – response options are unknown / feedback is required.

o   general feelings

o   reasons for opinions

-          other category

-          trade-off: more codes give more detail / fewer codes make the analysis easier

-          codes

o   pre-existing

§  systematic / developed by experts

§  publicly available / coding transparent

§  same coding for repeated surveys

§  facilitate comparisons

o   developed based on responses

§  read selection of responses

§  summarise responses into themes

§  if required – group themes into broad topics

§  generate frequency distribution for each theme
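The response-based steps above (read responses, summarise into themes, count each theme) can be sketched in Python. This is only a minimal sketch: the codebook `CODEBOOK`, its keywords and the sample responses are hypothetical, and a simple keyword match stands in for a human coder's judgement. Answers that match no theme fall into the "other" category.

```python
from collections import Counter

# Hypothetical codebook: theme -> keywords that signal it. In practice the
# themes come from reading a selection of real responses.
CODEBOOK = {
    "pay": ["salary", "wage", "pay"],
    "hours": ["hours", "overtime", "shift"],
    "management": ["boss", "manager", "supervisor"],
}

def code_response(text):
    """Return the themes whose keywords appear in one open-ended response."""
    lowered = text.lower()
    themes = [theme for theme, words in CODEBOOK.items()
              if any(word in lowered for word in words)]
    # Anything that matches no theme goes to the 'other' category.
    return themes or ["other"]

def theme_frequencies(responses):
    """Frequency distribution: number of responses mentioning each theme."""
    counts = Counter()
    for text in responses:
        counts.update(code_response(text))
    return counts

# Made-up responses for illustration only.
responses = [
    "The pay is too low",
    "Too much overtime and my manager ignores it",
    "I like the location",
]
freq = theme_frequencies(responses)
for theme, n in freq.most_common():
    print(theme, n)
```

One response can carry several themes, so the theme counts can add up to more than the number of respondents.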

 

Thematic Coding

 

 

 

Coding Missing Data

 

-          different from valid code

-          reasons

o   not required to answer

o   not ascertained

o   refused to answer

o   did not know answer / no opinion
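A small Python sketch of keeping missing-data codes distinct from valid codes. The negative code values and the five-point item are assumptions for illustration; the only requirement taken from the notes is that a missing code can never be mistaken for a valid answer.

```python
# Hypothetical missing-data codes, chosen to lie outside the valid range
# (1 to 5) of an assumed five-point attitude item.
MISSING_CODES = {
    -1: "not required to answer",
    -2: "not ascertained",
    -3: "refused to answer",
    -4: "did not know / no opinion",
}

def split_valid_and_missing(values):
    """Separate valid answers from missing ones and tally the reasons."""
    valid = [v for v in values if v not in MISSING_CODES]
    reasons = {}
    for v in values:
        if v in MISSING_CODES:
            reason = MISSING_CODES[v]
            reasons[reason] = reasons.get(reason, 0) + 1
    return valid, reasons

# Made-up answers for illustration only.
answers = [4, 5, -3, 2, -4, -4, 1]
valid, reasons = split_valid_and_missing(answers)

# Statistics are computed on valid answers only, so the missing codes
# never leak into the mean.
mean_valid = sum(valid) / len(valid)
print(valid, reasons, mean_valid)
```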

 

Checking for Coding Error

 

-          sources of error

o   data entered in wrong column

o   miscoding

§  data collection

§  manual coding

§  data entry

-          methods for checking coding errors

o   valid range checks

o   filter checks

o   logical checks
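The three kinds of check can be illustrated with a short Python sketch. The field names (`satisfaction`, `employed`, `hours_worked`, `years_in_job`, `age`) and the rules themselves are hypothetical, chosen only to give one example of each type.

```python
def check_record(rec):
    """Return a list of coding problems found in one survey record (a dict)."""
    problems = []
    # Valid range check: satisfaction must be a code from 1 to 5.
    if rec["satisfaction"] not in range(1, 6):
        problems.append("satisfaction out of range")
    # Filter check: only employed respondents should have answered
    # the hours_worked question.
    if rec["employed"] == 0 and rec["hours_worked"] is not None:
        problems.append("hours_worked answered by non-employed respondent")
    # Logical check: years in the current job cannot exceed age.
    if rec["years_in_job"] is not None and rec["years_in_job"] > rec["age"]:
        problems.append("years_in_job exceeds age")
    return problems

# Made-up records for illustration only.
clean = {"satisfaction": 4, "employed": 1, "hours_worked": 38,
         "years_in_job": 5, "age": 40}
dirty = {"satisfaction": 7, "employed": 0, "hours_worked": 20,
         "years_in_job": 30, "age": 25}

print(check_record(clean))
print(check_record(dirty))
```

A record that fails a check is not necessarily wrong in the way the message suggests; it only flags a case to look up against the original questionnaire.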

 

 

Preparing Variables for Analysis

 

-          Changing categories

o   initial coding results in more categories than we require

§  recode occupational categories into white / blue collar

o   too few subjects in some categories

o   collapsing categories can highlight patterns in data (but can also mask a relationship)

o   approaches

§  substantive

·         combining categories that have something in common

o   industry based categories

o   amount of training

·         divide categories of variables into equal lots  [gw ?]

§  distributional

·         restricted to ordinal and interval variables

·         divide sample into roughly equal sized groups of cases

-          rearranging categories

o   arrange categories in more logical order

§  more appropriate to focus of analysis

§  tables easier to read

§  changing level of measurement of variable and thus affecting the methods of analysis that can be applied to variable

o   example

§  organize industry categories according to level of unionization

-          reverse coding

o   when constructing scales

o   change direction of scale to be consistent
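The recoding ideas in this section can be sketched in Python. The occupation labels, the `COLLAR` mapping, and the 1 to 5 scale are assumptions for illustration. The sketch shows a substantive recode, a distributional recode into roughly equal sized groups, and reverse coding.

```python
# Substantive recode: hypothetical occupation labels mapped to collar groups.
COLLAR = {"manager": "white", "clerk": "white",
          "machinist": "blue", "labourer": "blue"}

def collapse_occupation(occupation):
    """Collapse a detailed occupation into white / blue collar."""
    return COLLAR.get(occupation, "other")

def equal_sized_groups(values, n_groups):
    """Distributional recode: assign each case a group 0..n_groups-1 by rank,
    so that groups hold roughly equal numbers of cases. Tied values may be
    split across neighbouring groups in this simple version."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(values)
    return groups

def reverse_code(score, low=1, high=5):
    """Flip the direction of a scale item, e.g. 1..5 becomes 5..1."""
    return low + high - score

# Made-up data for illustration only.
incomes = [30, 10, 50, 20, 40, 60]
income_group = equal_sized_groups(incomes, 3)
print(collapse_occupation("clerk"), income_group, reverse_code(2))
```

Reverse coding is applied only to the items worded in the opposite direction, so that a high score means the same thing on every item before they are summed into a scale.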

 

 

Creating New Variables

 

-          create new variables

o   developing scales

o   conditional transformations

§  eg, marital history of both husband and wife

o   arithmetic transformations

§  age difference between husband and wife
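A brief Python sketch of the two transformation types, using the husband and wife examples from the notes. The function names and the three category labels are hypothetical.

```python
def couple_marital_history(husband_prev_married, wife_prev_married):
    """Conditional transformation: build one couple-level variable from the
    marital history of both partners."""
    if husband_prev_married and wife_prev_married:
        return "both previously married"
    if husband_prev_married or wife_prev_married:
        return "one previously married"
    return "neither previously married"

def age_difference(husband_age, wife_age):
    """Arithmetic transformation: husband's age minus wife's age."""
    return husband_age - wife_age

# Made-up couple for illustration only.
history = couple_marital_history(True, False)
difference = age_difference(42, 39)
print(history, difference)
```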

 

Standardising Variables

 

-          interested in scores relative to other people in sample

-          comparable studies where units of measure are not comparable (eg, income)  ??

-          remove inflation

-          interval level → z scores

-          ordinal level → percentiles
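Both kinds of standardised score can be sketched in Python. A z score is (score - mean) / standard deviation. The sketch assumes the population standard deviation (`statistics.pstdev`), whereas some packages use the sample version. It defines the percentile rank as the percentage of cases at or below a score, which is one of several conventions. The scores are made up.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Interval level: express each score as standard deviations from the
    sample mean (population SD assumed here)."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

def percentile_rank(values, score):
    """Ordinal level: percentage of cases scoring at or below `score`."""
    at_or_below = sum(1 for v in values if v <= score)
    return 100.0 * at_or_below / len(values)

# Made-up scores for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(scores)
rank_of_4 = percentile_rank(scores, 4)
print(z, rank_of_4)
```

Because z scores are unit free, scores from variables measured in different units (or in different years, once divided by their own spread) can be compared on the same footing.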

 

 

Dealing with missing data

 

-          checking for missing data bias

o   divide sample into 2 groups based on whether particular variable is missing data or not

o   cross tab

-          methods for dealing with missing data

o   deleting either cases or variables

§  list wise deletion

·         any case with missing data deleted

·         issues

o   loss of data / reduction in sample size

§  pair wise deletion

·         use only cases with complete data for each calculation

§  deletion of variable

o   statistical imputation

§  sample means

·         value of mean of that variable

§  group means

·         divide sample into groups on background variable

·         issue

o   exaggerates extent to which people in a group are similar

o   inflates correlation between variables

§  random assignment within groups

·         divide sample into groups on background variable

·         replace missing value with value of same variable of nearest preceding case

·         maintains variability

§  regression analysis
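Some of the approaches above can be sketched in Python: a cross tab of missing versus present by group, listwise and pairwise deletion, and sample mean and group mean imputation. The data, variable names and helper functions are hypothetical, and missing values are represented by `None`. Random assignment within groups and regression imputation are not shown.

```python
from statistics import mean

# Made-up cases for illustration only; None marks a missing value.
cases = [
    {"group": "a", "income": 30, "hours": 40},
    {"group": "a", "income": None, "hours": 35},
    {"group": "b", "income": 50, "hours": None},
    {"group": "b", "income": 70, "hours": 45},
]

def missing_by_group(cases, variable, group_var):
    """Cross tab for a bias check: group -> [n missing, n present]."""
    table = {}
    for c in cases:
        cell = table.setdefault(c[group_var], [0, 0])
        cell[0 if c[variable] is None else 1] += 1
    return table

def complete_cases(cases, variables):
    """Keep cases with no missing value on the listed variables.
    Listwise deletion: pass every variable in the analysis.
    Pairwise deletion: pass only the variables used in one calculation."""
    return [c for c in cases if all(c[v] is not None for v in variables)]

def impute_sample_mean(cases, variable):
    """Replace missing values with the mean of the observed values."""
    fill = mean(c[variable] for c in complete_cases(cases, [variable]))
    return [dict(c, **{variable: fill}) if c[variable] is None else dict(c)
            for c in cases]

def impute_group_mean(cases, variable, group_var):
    """Replace missing values with the mean of the case's own group.
    Raises statistics.StatisticsError if a group has no observed value."""
    out = []
    for c in cases:
        if c[variable] is None:
            peers = [p[variable] for p in cases
                     if p[group_var] == c[group_var] and p[variable] is not None]
            c = dict(c, **{variable: mean(peers)})
        out.append(dict(c))
    return out

listwise = complete_cases(cases, ["income", "hours"])
pairwise_income = complete_cases(cases, ["income"])
by_sample = impute_sample_mean(cases, "income")
by_group = impute_group_mean(cases, "income", "group")
print(len(listwise), len(pairwise_income), by_sample[1], by_group[1])
```

In this toy data listwise deletion keeps two of the four cases, while a calculation on income alone keeps three, which shows the loss of sample size noted above.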

 

 

 

 

 

 

 

Gw comment

 

-          is a relationship being masked as a result of collapsing the same as / similar to Simpson's paradox?

 
