Scale Development Seminar – Lucy Busija
- Why measurement is important
o Examples
§ Exams
§ Consumer research
§ Performance appraisal
§ Myers-Briggs score
§ [GW – get Anna to do thi]
- Reflection of underlying condition
o Eg, tone of voice as one indicator of suicidal tendency
- Questions can bias response
o Qualifiers → can "force" an answer
o Double-barrelled questions
o Interpretation
o Graduated scale – scale needs sufficient granularity to capture detail of respondent's position
o Reactivity → e.g. questions on meaning of life
§ Can prompt reflection and therefore change perception
- Holmes-Rahe Social Readjustment Rating Scale
o Does giving a scale of impact influence the response?
o Good vs bad stress → the scale lumps them together
- Reversing direction of questions
o Traditional wisdom
§ Make people stop and think
o Now
§ People do genuinely differ in how they respond to positively worded questions vs negatively worded questions.
- Issue of choosing sides vs fence sitting
- Theories of measurement
o Classical
o Generalisability
o Item Response
- Classical
o All error is random
o Cronbach's alpha (CA) → does CA relate to Generalisability Theory?
o Systematic error → criticism of Classical Theory
o Systematic error doesn't cancel out
[Diagram: TRUE SCORE → observed score, with random error and systematic error both feeding into the observed score]
Response to shortcomings of Classical Theory is Generalisability Theory
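A small simulation may make the point about random vs systematic error concrete. This is a sketch only: the additive model (observed = true score + bias + random noise), the function name simulate_observed and all the numbers are assumptions for illustration, not taken from the seminar.

```python
import random
import statistics

def simulate_observed(true_score, n_repeats, random_sd, bias, seed=0):
    """Hypothetical model: observed = true score + systematic bias + random error."""
    rng = random.Random(seed)
    return [true_score + bias + rng.gauss(0, random_sd) for _ in range(n_repeats)]

true_score = 50.0  # invented value for illustration

# Same random error in both runs (same seed); only the systematic bias differs.
random_only = simulate_observed(true_score, 10_000, random_sd=5.0, bias=0.0)
with_bias = simulate_observed(true_score, 10_000, random_sd=5.0, bias=3.0)

mean_random_only = statistics.fmean(random_only)
mean_with_bias = statistics.fmean(with_bias)

# Random error averages out towards the true score; the bias does not.
print(f"mean with random error only: {mean_random_only:.2f}")
print(f"mean with systematic bias:   {mean_with_bias:.2f}")
```

Averaging many measurements pulls the first mean towards the true score, while the second stays offset by the bias however many repeats are taken.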
[GW: look at missing data classification]
- Intervention
o Placebo effect
o Reactivity
ANOVA model → sources of variation
IRT
[GW: need to get some understanding on this]
- Range of difficulty
o From easy
o To hard
Cross cultural differences
- What makes a good scale
- Setting / context / use
- Purpose / context
- Population
- No absolute measure of reliability / validity
can only say a scale is good / effective in context of purpose of scale
ruler analogy
o measure
o weigh
- Sensitivity / responsiveness → can a test distinguish between 2 levels of a characteristic
o i.e. high vs low self-esteem
- Effect size of intervention
- Reliability
- Consistency / robust
- Amount of error in test score
- Variability
- Types
o Test / retest
o Equivalent forms
o Split half
o Inter rater agreement
o Internal consistency → Cronbach's alpha – what is the math?
- Volatile characteristics → mood / anxiety
- Don't want real change to occur between test / retest
- Memory
- Difficulty
- Intangible measure
- Pearson's correlation
- what happens → all score 10% higher in retest?
- systematic change → Pearson's would not have detected this → what is the test that would have detected this change?
- why does this impact reliability
- would not normally administer test twice
- like treatment + placebo effect vs just placebo
- ICC
- agreement coefficient
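The retest question above (everyone scores 10% higher) can be tried out numerically. The sketch below is an assumption-laden illustration: the scores are invented, and the ICC form used is the two-way, absolute-agreement, single-measurement version (often written ICC(A,1) or ICC(2,1)), computed from ANOVA mean squares. Other ICC forms exist and the seminar did not specify one.

```python
import statistics

def pearson(x, y):
    """Pearson correlation, written out so the block needs only the stdlib."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_agreement(first, second):
    """Two-way, absolute-agreement, single-measurement ICC for two occasions."""
    n, k = len(first), 2
    rows = list(zip(first, second))
    grand = statistics.fmean(first + second)
    row_means = [statistics.fmean(r) for r in rows]
    col_means = [statistics.fmean(first), statistics.fmean(second)]
    # Two-way ANOVA sums of squares: persons (rows), occasions (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Invented scores; the retest is exactly 10% higher for every person.
test_scores = [10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 30.0]
retest_scores = [1.1 * s for s in test_scores]

r = pearson(test_scores, retest_scores)
icc = icc_agreement(test_scores, retest_scores)
mean_shift = statistics.fmean(retest_scores) - statistics.fmean(test_scores)

print(f"Pearson r = {r:.3f}, ICC (agreement) = {icc:.3f}, mean shift = {mean_shift:.2f}")
```

Pearson's r ignores the shift because it only looks at relative ordering and spacing. The agreement ICC is pulled down because the occasion (column) variance enters its denominator, and the mean difference between occasions shows the shift directly. This is also one way to see why an ANOVA decomposition is used rather than a plain correlation.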
- split half reliability
- Spearman-Brown correction
- why are long tests more reliable
- [GW → why not do multiple splits with resampling?]
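A rough sketch of the split-half idea, including the "multiple random splits" suggestion. The Spearman-Brown formula itself is standard: predicted reliability = n·r / (1 + (n − 1)·r), where n is the factor by which the test is lengthened. Everything else here is an assumption for illustration: the simulated data (each item = person ability + independent noise), the helper names and the number of splits.

```python
import random
import statistics

def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_brown(r, length_factor=2.0):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * r / (1 + (length_factor - 1) * r)

def split_half_reliability(responses, rng):
    """One random split into two halves, corrected up to full test length."""
    n_items = len(responses[0])
    order = list(range(n_items))
    rng.shuffle(order)
    half_a, half_b = order[: n_items // 2], order[n_items // 2:]
    score_a = [sum(row[i] for i in half_a) for row in responses]
    score_b = [sum(row[i] for i in half_b) for row in responses]
    return spearman_brown(pearson(score_a, score_b))

# Simulated data (assumption): 200 people, 10 items, item = ability + noise.
rng = random.Random(1)
responses = []
for _ in range(200):
    ability = rng.gauss(0, 1)
    responses.append([ability + rng.gauss(0, 1) for _ in range(10)])

# Repeat the split many times instead of relying on one arbitrary split.
estimates = [split_half_reliability(responses, rng) for _ in range(500)]
mean_estimate = statistics.fmean(estimates)
print(f"mean split-half reliability over 500 splits: {mean_estimate:.3f}")

# Why longer tests are more reliable: the same formula with larger factors.
for factor in (1, 2, 4):
    print(factor, round(spearman_brown(0.5, factor), 3))
```

Under these assumptions, lengthening a test with comparable items raises the predicted reliability, because the true-score part accumulates faster than the random error. Averaging over many random splits removes the dependence on one particular split, and is closely related to what Cronbach's alpha summarises.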
- Internal Consistency
- Cronbach's alpha (CA)
- how does this differ from previous
- longer tests → duplicate or redundant questions → artificially increase CA
- complexity of concept
o fatigue – not very complex
o depression – very complex
- not suitable for questionnaires with internal order like
o from easy to hard questions
- CA → proportion of true-score variance in total (observed) score variance
- FDA guidelines
- nature of concept → diffuse / intangible
- reliability → absence of error variance
- item – total correlation
o partial auto correlation
o item – remainder correlation
- squared multiple correlation
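On "what is the math" for Cronbach's alpha: the usual formula is alpha = k/(k − 1) · (1 − Σ item variances / variance of total score), with k the number of items. The sketch below applies it to a tiny invented response matrix (6 people, 4 items), computes item-remainder correlations, and checks the note about redundant questions by duplicating every item. The data and helper names are assumptions for illustration only.

```python
import statistics

def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(responses[0])
    item_vars = [statistics.variance([row[i] for row in responses]) for i in range(k)]
    total_var = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def item_remainder_correlations(responses):
    """Correlate each item with the sum of the other items (item excluded)."""
    k = len(responses[0])
    result = []
    for i in range(k):
        item = [row[i] for row in responses]
        remainder = [sum(row) - row[i] for row in responses]
        result.append(pearson(item, remainder))
    return result

# Invented responses: rows are people, columns are items on a 1-5 scale.
responses = [
    [4, 5, 3, 4],
    [2, 3, 2, 1],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 3],
]

alpha = cronbach_alpha(responses)
item_rest = item_remainder_correlations(responses)

# Redundancy check: ask every question twice and recompute alpha.
duplicated = [row + row for row in responses]
alpha_duplicated = cronbach_alpha(duplicated)

print(f"alpha = {alpha:.3f}, alpha with duplicated items = {alpha_duplicated:.3f}")
print("item-remainder correlations:", [round(r, 2) for r in item_rest])
```

Duplicating the items adds no new information about the respondents, yet alpha goes up, which is the sense in which redundant questions inflate it. The item-remainder version of the item-total correlation leaves the item out of the total so that it is not correlated with itself.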
- validity
- conceptual properties of scale
- valid scale must be reliable
- does a scale always need face validity?
- content validity
o e.g. → depression
o scale should cover all aspects of depression
§ loss of sleep
§ loss of appetite
§ etc
- example → walking up stairs → look at this example again
o how often
o comfort / discomfort
- inkblot test → how do you test that for reliability and validity?
- convergent / discriminant comparisons
- where is the boundary of a measure.
- Area under ROC → data prediction
- prepare responses , then write question
- scale development
- DeVellis; Nunnally JC (1994), Psychometric Theory
- writing skill / editing
- IRT
- some items measure low ability / some items measure high ability
o time mgt example → they all measure average time mgt ability
o if all scores have mean around 5, best able to score middle of the range time mgt skills
- a clinical measure that targets high levels of depression – may not work well with low level depression.
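The point that items target different ability levels can be sketched with the simplest IRT model. The seminar notes do not name a model, so the one-parameter (Rasch) form used here is an assumption: P(endorse) = 1 / (1 + exp(−(θ − b))), where θ is the person's level and b the item's difficulty. Item information under this model is P·(1 − P), which peaks where θ = b.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of endorsing / passing an item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def item_information(theta, difficulty):
    """Fisher information of a Rasch item at a given person level."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)

# Hypothetical "severe depression" item with high difficulty (b = 2).
severe_item = 2.0

for theta in (-2.0, 0.0, 2.0):
    p = rasch_probability(theta, severe_item)
    info = item_information(theta, severe_item)
    print(f"theta={theta:+.1f}  P(endorse)={p:.3f}  information={info:.3f}")
```

Under these assumptions, an item pitched at a high level says almost nothing about people at the low end: nearly all of them give the same answer, so it cannot separate them. A scale whose items all sit around the middle has the same problem at both extremes.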
- how does variance relate to min / max / mean → investigate
o same units ?
- what is the problem if scores bunch up at one end of scale → investigate
- what type of scale works with Cronbach's alpha
o 10 item scale
o 3 item scale
o 2 item scale
- does amalgamating various skewed distributions result in a different distribution?
- what is the distribution of the combination → investigate
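One way to start the "investigate" item is a quick simulation. The sketch below sums 10 skewed 1-5 items per person and compares the skewness of a single item with that of the total. The response weights are invented, and the items are drawn independently, which real scale items are not (they are meant to correlate), so this shows only the best case for the total becoming more symmetric.

```python
import random
import statistics

def skewness(values):
    """Population skewness: third central moment over variance to the 1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

rng = random.Random(7)
options = [1, 2, 3, 4, 5]
weights = [50, 25, 15, 7, 3]  # invented: most answers bunch at the low end
n_people, n_items = 20_000, 10

# Each person answers n_items independent, identically skewed items.
responses = [rng.choices(options, weights=weights, k=n_items) for _ in range(n_people)]

single_item = [row[0] for row in responses]
totals = [sum(row) for row in responses]

skew_single = skewness(single_item)
skew_total = skewness(totals)
print(f"skewness of one item: {skew_single:.2f}")
print(f"skewness of 10-item total: {skew_total:.2f}")
```

With independent items the total is noticeably less skewed than any single item. The more strongly the items correlate, the more of the original skew the total is likely to keep.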
- ability
- perception of ability
- perception can change without ability changing
- reactivity effect
- need to redo intraclass correlation
- why ANOVA
- different from correlation
- when to use
- known-groups validity hypothesis