Saturday, March 12, 2011

Zuur : Chapter 3 : Accessing Variables and Managing Subsets of data


 

Accessing Variables from a Data Frame

 

-          read.table → produces a data frame

-          use the names command immediately after read.table → to see which variables we are dealing with

-          str (structure) → reports the type of each variable in the data frame

o   integer, numeric, etc

-          how to access variables

o   data argument in a function

§  eg, linear regression function

·         lm(GSI ~ factor(Location) + factor(Year), data = squid)

§  not all functions support a data argument (eg, mean)

o   $ sign

§  eg, squid$GSI

o   select column

§  eg, squid[,6]

o   attach function

§  bad way to access variables

·         eg, attach(squid)

o   the attach command adds squid to the search path of R, so its variables can be used by name

·         problems arise if you attach a data set whose variable names also exist outside the data frame

·         use detach(squid) to remove it from the search path
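
The access methods above can be tried on a small stand-in for the squid data. This is only a sketch: the values below are invented for illustration and are not the book's data set; only the column names follow the notes. The last part shows one way the attach problem can appear, assuming an unrelated object called GSI already exists.

```r
# Toy stand-in for the squid data; all values are made up for illustration.
squid <- data.frame(
  Sample   = 1:6,
  Year     = c(1, 1, 1, 2, 2, 2),
  Month    = c(3, 1, 2, 3, 1, 2),
  Location = c(1, 2, 3, 4, 1, 2),
  Sex      = c(1, 2, 1, 2, 1, 2),
  GSI      = c(10.4, 9.2, 3.1, 5.6, 7.8, 2.5)
)

names(squid)   # variable names
str(squid)     # type of each variable

# 1. data argument: lm looks up GSI and Location inside squid
fit <- lm(GSI ~ factor(Location), data = squid)

# 2. $ sign: works with functions that have no data argument, such as mean
mean_gsi <- mean(squid$GSI)

# 3. column index: GSI is the sixth column
gsi_by_index <- squid[, 6]
identical(squid$GSI, gsi_by_index)

# 4. attach: an existing object with the same name hides the attached column
GSI <- c(0, 0)            # unrelated object that happens to be called GSI
attach(squid)
n_seen <- length(GSI)     # length of the unrelated object, not of squid$GSI
detach(squid)
```

Because the workspace is searched before attached data frames, GSI here refers to the unrelated object, which is the kind of silent mix-up that makes attach risky.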

 

 

Accessing Subsets of Data

 

-          to see different values of a variable       

o   unique(squid$Sex)

§  1 for male, 2 for female

o   to access all male data

§  Sel <- squid$Sex == 1

·         creates a vector of the same length as the original vector

·         values are TRUE where Sex equals 1, FALSE otherwise

·         Boolean vector

§  squidM <- squid[Sel, ]

·         selects the rows of squid for which Sel is TRUE

·         because we are selecting rows, Sel goes before the comma inside the square brackets

§  Example

·         to select locations 1, 2, 3 from four locations

o   all following give same result

§  squid[squid$Location == 1 | squid$Location == 2 | squid$Location == 3, ]

§  squid[squid$Location != 4, ]

§  squid[squid$Location < 4, ]

§  squid[squid$Location <= 3, ]

§  squid[squid$Location >= 1 & squid$Location <= 3, ]

·         to select male and location 1

o   squid[squid$Sex == 1 & squid$Location == 1, ]

·         to select male and location 1 or location 2

o   squidM.12 <- squid[squid$Sex == 1 & (squid$Location == 1 | squid$Location == 2), ]

-          Sorting data

o   ord1 <- order(squid$Month)

o   squid[ord1, ]

§  because we are reordering rows, ord1 goes before the comma
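
The subsetting and sorting steps above can be sketched on a toy data frame. The values are invented for illustration and are not the book's data; the column names are assumed to be capitalised as in the regression example earlier in these notes.

```r
# Toy stand-in for the squid data; all values are made up for illustration.
squid <- data.frame(
  Sample   = 1:6,
  Month    = c(3, 1, 2, 3, 1, 2),
  Location = c(1, 2, 3, 4, 1, 2),
  Sex      = c(1, 2, 1, 2, 1, 2),
  GSI      = c(10.4, 9.2, 3.1, 5.6, 7.8, 2.5)
)

unique(squid$Sex)            # distinct codes used for Sex

Sel <- squid$Sex == 1        # logical vector, one element per row
squidM <- squid[Sel, ]       # rows where Sel is TRUE, all columns

# Equivalent ways to keep locations 1 to 3 out of four
a <- squid[squid$Location == 1 | squid$Location == 2 | squid$Location == 3, ]
b <- squid[squid$Location != 4, ]
d <- squid[squid$Location <= 3, ]
e <- squid[squid$Location >= 1 & squid$Location <= 3, ]

# Males at location 1 or 2: the parentheses group the OR before the AND
squidM.12 <- squid[squid$Sex == 1 & (squid$Location == 1 | squid$Location == 2), ]

# Without parentheses, & is evaluated before |, so every row at
# location 2 is kept regardless of Sex
no_parens <- squid[squid$Sex == 1 & squid$Location == 1 | squid$Location == 2, ]

# Sort the rows by month
ord1 <- order(squid$Month)
squid_sorted <- squid[ord1, ]
```

Comparing nrow(squidM.12) with nrow(no_parens) shows why the parentheses matter in the last selection.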

 

 

 

Combining two datasets with a common identifier

 

-          squidmerged <- merge (squid1, squid2, by = "Sample")

o   option: all

§  default = FALSE

§  rows whose Sample value appears in only one of the two data sets are omitted; all = TRUE keeps them and fills the missing columns with NA
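
A minimal sketch of the effect of the all option, using two invented tables that share the identifier Sample (the values are illustrative only, not the book's data):

```r
# Two toy tables sharing the identifier Sample; values are made up.
squid1 <- data.frame(Sample = c(1, 2, 3, 4), GSI = c(10.4, 9.2, 3.1, 5.6))
squid2 <- data.frame(Sample = c(1, 2, 3, 5), Sex = c(1, 2, 1, 2))

# Default (all = FALSE): only samples present in both tables are kept
squidmerged <- merge(squid1, squid2, by = "Sample")
nrow(squidmerged)

# all = TRUE: every sample is kept, with NA where one table has no match
squidmerged_all <- merge(squid1, squid2, by = "Sample", all = TRUE)
squidmerged_all
```

Here sample 4 has no Sex and sample 5 has no GSI, so both disappear from the default merge and show up with NA when all = TRUE.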

 

 

Exporting Data

 

 

-          write.table(squidM, file = "C:\\malesquid.txt", sep = " ", quote = FALSE, append = FALSE, na = "NA")

o   quote = FALSE → avoids quote marks around character strings

o   na → the string written for missing values

o   append = FALSE → writes a new file (replacing any existing one) instead of adding to the end of it
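
The export call can be checked by writing a small data frame and reading it back. This sketch writes to a temporary file from tempfile() rather than the fixed path used above, and the data values are invented for illustration.

```r
# Toy data frame of males, including one missing GSI value (made up).
squidM <- data.frame(Sample = c(1, 3), Sex = c(1, 1), GSI = c(10.4, NA))

# Temporary file, used here instead of a fixed path on C:
out_file <- tempfile(fileext = ".txt")

write.table(squidM, file = out_file, sep = " ",
            quote = FALSE, append = FALSE, na = "NA")

# Inspect the raw text that was written
file_lines <- readLines(out_file)
file_lines

# Read it back; the row names written by default become row names again
check <- read.table(out_file, header = TRUE)
check
```

The missing value is written as the string given in na and is read back as NA.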

 

 

Recoding Categorical Variables

 

-          good programming practice: recode categorical variables as new nominal variables (factors) inside the data frame

-          example

o   squid$fLocation <- factor(squid$Location)

o   squid$fSex <- factor(squid$Sex)

§  two new variables created inside dataframe squid

§  nominal variables

-          relabel as M & F

o   squid$fSex <- factor(squid$Sex, levels = c(1,2), labels = c("M", "F"))

o   new variables (eg, fSex) can be used in functions

-          reorder the levels of nominal variables / factors

o   example

§  squid$fLocation <- factor(squid$Location, levels = c(2,3,1,4))

-          note, when selecting on a factor, the level label needs double quotes

o   eg

§  squidM <- squid[squid$fSex == "M", ]

§  use "1" instead of "M" if fSex was created without labels
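
The recoding steps can be sketched as follows. The data values are invented for illustration; the variable names follow the notes.

```r
# Toy stand-in for the squid data; values are made up for illustration.
squid <- data.frame(
  Location = c(1, 2, 3, 4, 1, 2),
  Sex      = c(1, 2, 1, 2, 1, 2)
)

# Recode Sex as a factor with readable labels
squid$fSex <- factor(squid$Sex, levels = c(1, 2), labels = c("M", "F"))
levels(squid$fSex)

# Recode Location as a factor with a custom level order
squid$fLocation <- factor(squid$Location, levels = c(2, 3, 1, 4))
levels(squid$fLocation)

# Select on a factor by its label, written in double quotes
squidM <- squid[squid$fSex == "M", ]
squidLoc1 <- squid[squid$fLocation == "1", ]
```

The original numeric columns stay in the data frame, so the factor versions can be dropped or rebuilt without losing information.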

 


 

 

 
