Monday, May 23, 2011

Two-Way Contingency Tables – Chapter 2: An Introduction to Categorical Data Analysis – Alan Agresti


 

-          Association between two categorical variables

-          In many contingency tables, one variable is a response variable and the other an explanatory variable.

o   It is then informative to construct a separate probability distribution for Y at each level of X

§  Conditional probabilities for Y, given the level of X

·         Called the conditional distribution of Y given X.
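Computing conditional distributions just means dividing each row of counts by its own row total. The sketch below assumes rows are levels of X and columns are levels of Y; the function name `conditional_distributions` and the 2×3 counts are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def conditional_distributions(table):
    """Return the sample conditional distribution of Y (columns) at each level of X (rows).

    `table` is a list of rows of cell counts; each row is divided by its own total,
    so every returned row sums to exactly 1.
    """
    result = []
    for row in table:
        total = sum(row)
        result.append([Fraction(count, total) for count in row])
    return result

# Hypothetical 2x3 table: rows = levels of X, columns = levels of Y.
counts = [[10, 20, 10],
          [30, 15, 15]]
for i, dist in enumerate(conditional_distributions(counts), start=1):
    print(f"X = {i}:", [float(p) for p in dist])
# prints:
# X = 1: [0.25, 0.5, 0.25]
# X = 2: [0.5, 0.25, 0.25]
```

Exact fractions are used so the rows sum to exactly 1; plain floats would work just as well in practice.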

-          Independence

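o   X and Y are statistically independent if the conditional distributions of Y are identical at each level of X

§  Under independence, the probability of any particular column is the same for each row.

§  Statistical independence: the property that all joint probabilities equal the product of their marginal probabilities, π_ij = π_i+ × π_+j for all i and j

·         Joint probability: π_ij = P(X = i, Y = j)

·         Marginal probabilities: row totals π_i+ = Σ_j π_ij and column totals π_+j = Σ_i π_ij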
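The product rule can be checked directly on a table of probabilities. This is a minimal sketch under two assumptions: cell proportions stand in for the joint probabilities, and exact equality is tested (sample tables from real data will almost never satisfy it exactly, so this illustrates the definition, not a significance test). The helper names and counts are hypothetical.

```python
from fractions import Fraction

def joint_and_marginals(table):
    """Convert a table of counts into joint proportions plus row and column marginals."""
    n = sum(sum(row) for row in table)
    joint = [[Fraction(c, n) for c in row] for row in table]
    row_marg = [sum(row) for row in joint]        # pi_{i+}
    col_marg = [sum(col) for col in zip(*joint)]  # pi_{+j}
    return joint, row_marg, col_marg

def is_independent(table):
    """True when every joint proportion equals the product of its two marginals."""
    joint, row_marg, col_marg = joint_and_marginals(table)
    return all(joint[i][j] == row_marg[i] * col_marg[j]
               for i in range(len(joint))
               for j in range(len(joint[0])))

print(is_independent([[10, 30], [20, 60]]))  # rows proportional -> True
print(is_independent([[10, 20], [30, 15]]))  # rows differ -> False
```

The first table has the same column split (1/4 vs 3/4) in both rows, which is exactly the "identical conditional distributions" condition stated above.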

-          When the rows of a contingency table refer to different groups

o   The sample sizes for the groups are often fixed by the sampling design

o   When the marginal totals are fixed rather than random

§  The joint distribution of X and Y is no longer meaningful

·         But the conditional distributions of Y at each level of X are

-          Difference of Proportions

o   Compares the success probabilities in the two rows

o   The difference π1 − π2 falls between −1 and +1; it equals zero when the two probabilities are equal

o   Use the standard error (SE) of the sample difference to calculate a confidence interval
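For reference, the usual large-sample (Wald) standard error and confidence interval for the difference, written with p_1 = y_1/n_1 and p_2 = y_2/n_2 as the sample proportions of successes in the two rows, n_1 and n_2 as the row sample sizes, and z_{α/2} as the standard normal quantile (1.96 for 95% confidence):

```latex
SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}},
\qquad
(p_1 - p_2) \pm z_{\alpha/2}\, SE
```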

-          Aspirin and heart attack example

o   The two rows are treated as independent binomial samples
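Treating the rows as two independent binomial samples, the difference of proportions and its Wald interval can be computed as follows. The counts are the Physicians' Health Study figures as they are usually quoted (placebo: 189 heart attacks out of 11,034; aspirin: 104 out of 11,037); treat them as an assumption and check them against the book's table. The function name is hypothetical.

```python
from statistics import NormalDist

def difference_of_proportions_ci(y1, n1, y2, n2, confidence=0.95):
    """Wald confidence interval for pi1 - pi2 from two independent binomial samples."""
    p1, p2 = y1 / n1, y2 / n2
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)

# Row 1 = placebo, row 2 = aspirin; "success" = heart attack (assumed counts).
diff, (lo, hi) = difference_of_proportions_ci(189, 11034, 104, 11037)
print(f"p1 - p2 = {diff:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

The interval lies entirely above zero, which is consistent with a lower heart-attack rate in the aspirin group.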

-          Relative Risk

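o   A difference between two proportions of a certain fixed size may matter more when both proportions are near 0 or 1 than when they are near the middle of the range (e.g., 0.010 vs. 0.001 is a tenfold ratio, while 0.410 vs. 0.401 is nearly no difference)

o   RR = ratio of the success probabilities, π1/π2

o   Can be any non-negative number; RR = 1 when the two probabilities are equal

o   The confidence interval formula is complicated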
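One common way to get an interval for the relative risk is to build a Wald interval for log(RR) and exponentiate its endpoints; this is a sketch of that approach, not necessarily the exact formula the book uses. The counts are again the assumed aspirin-study figures (189/11,034 placebo, 104/11,037 aspirin), and both success counts must be positive for the logs to exist.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk_ci(y1, n1, y2, n2, confidence=0.95):
    """Relative risk p1/p2 with a Wald interval built on the log scale, then exponentiated."""
    p1, p2 = y1 / n1, y2 / n2
    rr = p1 / p2
    # Large-sample SE of log(rr): sqrt((1 - p1)/(n1*p1) + (1 - p2)/(n2*p2))
    se_log = sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return rr, (exp(log(rr) - z * se_log), exp(log(rr) + z * se_log))

rr, (lo, hi) = relative_risk_ci(189, 11034, 104, 11037)
print(f"RR = {rr:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Working on the log scale keeps the interval positive and compensates for the skewed sampling distribution of the ratio, which is why the resulting interval is not symmetric around RR.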

-          Odds Ratio

o   Odds = probability of success / probability of failure = π/(1 − π)

o   Odds are non-negative; odds > 1 when a success is more likely than a failure
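Written out, using Ω for the odds and θ for the odds ratio comparing row 1 with row 2 (for example, π = 0.75 gives odds Ω = 0.75/0.25 = 3). The sample version uses the four cell counts n_ij of a 2×2 table, and θ = 1 exactly when the two rows have equal success probabilities:

```latex
\Omega = \frac{\pi}{1-\pi},
\qquad
\theta = \frac{\Omega_1}{\Omega_2} = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)},
\qquad
\hat\theta = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}
```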

-          Odds Ratio for Aspirin Study
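A sketch of the sample odds ratio with the standard log-scale Wald interval, whose SE is sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22). The 2×2 counts are an assumption based on the commonly quoted aspirin-study figures (placebo: 189 heart attacks, 10,845 none; aspirin: 104 heart attacks, 10,933 none); the function name is hypothetical, and all four cells must be positive.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_ci(n11, n12, n21, n22, confidence=0.95):
    """Sample odds ratio n11*n22 / (n12*n21) with a Wald interval on the log scale."""
    theta = (n11 * n22) / (n12 * n21)
    se_log = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return theta, (exp(log(theta) - z * se_log), exp(log(theta) + z * se_log))

# Rows: placebo, aspirin; columns: heart attack yes, no (assumed counts).
theta, (lo, hi) = odds_ratio_ci(189, 10845, 104, 10933)
print(f"OR = {theta:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Because heart attacks are rare in both groups, this odds ratio comes out very close to the relative risk for the same data.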

-          Relationship between the odds ratio and relative risk: odds ratio = relative risk × (1 − π2)/(1 − π1)

o   When the proportion of successes is close to zero in both groups, the fraction in the last term is approximately 1

o   The OR and RR then take similar values.

o   For some data sets, the relative risk cannot be calculated

§  Case-control study, where the marginal distribution of disease status is fixed by the sampling design

·         Two controls for each case

·         We might wish to compare ever-smokers with non-smokers in terms of the proportion who suffered the disease

o   These proportions refer to the conditional distribution of disease, given smoking status

§  We cannot estimate such proportions for this data set, because the numbers of cases and controls were fixed by the design

§  The study matched each case with two controls

·         We can compute proportions in the reverse direction

o   Conditional distribution of smoking status, given disease status

o   Use the odds ratio

§  The odds ratio takes the same value when it is defined using the conditional distribution of X given Y as it does using the distribution of Y given X, so it treats the variables symmetrically

§  The OR can therefore be estimated from the conditional distribution in either direction
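The two points above can be checked numerically: the odds ratio is the same whether it is computed from rows or from columns, and changing the number of controls per case leaves it unchanged while a naive "relative risk" computed from the rows shifts. The case-control counts below are hypothetical (rows = ever-smoker, non-smoker; columns = case, control), as are the function names. The last lines also check the identity OR = RR × (1 − π2)/(1 − π1) and the rare-outcome approximation.

```python
from fractions import Fraction

def odds_ratio(a, b, c, d):
    """Sample odds ratio a*d / (b*c) for the 2x2 table [[a, b], [c, d]]."""
    return Fraction(a * d, b * c)

def or_from_rows(a, b, c, d):
    # Odds of column 1 within each row, e.g. odds of being a case given smoking status.
    return Fraction(a, b) / Fraction(c, d)

def or_from_columns(a, b, c, d):
    # Odds of row 1 within each column, e.g. odds of being an ever-smoker given disease status.
    return Fraction(a, c) / Fraction(b, d)

def naive_relative_risk(a, b, c, d):
    # Ratio of row proportions in column 1; not a valid estimate when the
    # column totals (cases vs. controls) are fixed by the design.
    return Fraction(a, a + b) / Fraction(c, c + d)

# Hypothetical counts: rows = ever-smoker, non-smoker; columns = case, control.
two_controls = (60, 80, 40, 120)     # 100 cases, 200 controls
three_controls = (60, 120, 40, 180)  # same cases, 300 controls with the same smoking mix

for label, t in [("2 controls per case", two_controls), ("3 controls per case", three_controls)]:
    print(label, "| OR:", odds_ratio(*t), "| naive RR:", naive_relative_risk(*t))
# prints:
# 2 controls per case | OR: 9/4 | naive RR: 12/7
# 3 controls per case | OR: 9/4 | naive RR: 11/6

# Identity: OR = RR * (1 - p2) / (1 - p1), with p1, p2 the row proportions in column 1.
a, b, c, d = two_controls
p1, p2 = Fraction(a, a + b), Fraction(c, c + d)
assert naive_relative_risk(*two_controls) * (1 - p2) / (1 - p1) == odds_ratio(*two_controls)

# Rare outcome: when both probabilities are near zero, OR and RR nearly agree.
q1, q2 = Fraction(2, 1000), Fraction(1, 1000)
print("RR:", float(q1 / q2), "OR:", float((q1 / (1 - q1)) / (q2 / (1 - q2))))
# RR = 2.0, OR is about 2.002
```

Exact fractions make the equalities hold exactly rather than up to floating-point rounding.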

 
