Accessing Variables from a Data Frame
- read.table → produces a data frame
- use names() immediately after read.table → to see which variables we are dealing with
- str() (structure) → shows the type of each variable in the data frame
o integer, numeric, etc.
- how to access variables
o data argument in a function
§ e.g., the linear regression function lm()
· lm(GSI ~ factor(Location) + factor(Year), data = squid)
§ not all functions support a data argument (e.g., mean)
o the $ operator
§ e.g., squid$GSI
o select a column by its index
§ e.g., squid[, 6]
o attach function
§ a bad way to access variables
· e.g., attach(squid)
o attach() adds squid to R's search path
· problem: if the attached data frame has variables whose names also exist outside it (e.g., in the workspace), a name may refer to the wrong object
· when finished, use detach(squid)
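The steps above can be tried end to end on a tiny stand-in for the squid data. This is a sketch only: the column names (Sample, Year, Month, Location, Sex, GSI) and all values are hypothetical, and read.table() reads from an inline `text` string instead of a file so that the example runs anywhere. The last part shows the attach() masking problem with a deliberately created workspace variable.

```r
# Hypothetical stand-in for the squid file; with real data you would call
# read.table(file = "...", header = TRUE) instead of using text =.
squid <- read.table(header = TRUE, text = "
Sample Year Month Location Sex GSI
1 1 3 1 2 10.44
2 1 1 3 2 9.83
3 1 2 2 2 9.74
4 1 1 1 1 9.31
5 1 5 2 1 8.99
6 2 4 4 1 8.77
7 2 2 2 1 8.26
8 2 6 3 2 7.40
")

names(squid)   # which variables are in the data frame
str(squid)     # type of each variable (integer, numeric, ...)

# 1. data argument: lm() looks up GSI, Location and Year inside squid
fit <- lm(GSI ~ factor(Location) + factor(Year), data = squid)

# 2. $ operator: needed for functions without a data argument, such as mean()
mean_gsi <- mean(squid$GSI)

# 3. column index: GSI happens to be column 6 in this toy data frame
same_column <- identical(squid$GSI, squid[, 6])

# 4. attach(): a workspace object called GSI sits earlier on the search
#    path than the attached copy of squid, so it masks squid's GSI
GSI <- 0
attach(squid)          # R reports that GSI is masked by .GlobalEnv
masked <- mean(GSI)    # uses the workspace GSI (0), not squid$GSI
detach(squid)
rm(GSI)
```

Note that `masked` is 0, not the mean GSI: R silently found the wrong object because of the search-path order, which is why the $ operator or a data argument is preferred.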
Accessing Subsets of Data
- to see the distinct values of a variable
o unique(squid$Sex)
§ 1 for male, 2 for female
o to access all male data
§ Sel <- squid$Sex == 1
· creates a vector of the same length as squid$Sex
· values are TRUE where Sex == 1, FALSE otherwise
· i.e., a logical (Boolean) vector
§ squidM <- squid[Sel, ]
· selects the rows of squid for which Sel is TRUE
· because we are selecting rows, Sel goes before the comma inside the square brackets; leaving the part after the comma empty keeps all columns
§ Example
· to select locations 1, 2 and 3 out of four locations
o all of the following give the same result
§ squid[squid$Location == 1 | squid$Location == 2 | squid$Location == 3, ]
§ squid[squid$Location != 4, ]
§ squid[squid$Location < 4, ]
§ squid[squid$Location <= 3, ]
§ squid[squid$Location >= 1 & squid$Location <= 3, ]
· to select males from location 1
o squid[squid$Sex == 1 & squid$Location == 1, ]
· to select males from location 1 or location 2
o squidM.12 <- squid[squid$Sex == 1 & (squid$Location == 1 | squid$Location == 2), ]
§ & binds more tightly than |, so the parentheses are needed
- Sorting data
o ord1 <- order(squid$Month)
o squid[ord1, ]
§ as we are reordering rows, ord1 goes before the comma
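A sketch of the subsetting and sorting steps above, again on a small hypothetical data frame (values invented for illustration). One female is deliberately placed at location 2 so that the location 1-or-2 query written without parentheses visibly returns a wrong row.

```r
# Hypothetical toy data (Sex: 1 = male, 2 = female)
squid <- data.frame(
  Sample   = 1:8,
  Month    = c(3, 1, 2, 1, 5, 4, 2, 6),
  Location = c(1, 3, 2, 1, 2, 4, 2, 3),
  Sex      = c(2, 2, 2, 1, 1, 1, 1, 2),
  GSI      = c(10.44, 9.83, 9.74, 9.31, 8.99, 8.77, 8.26, 7.40)
)

unique(squid$Sex)           # distinct codes, in order of first appearance: 2 1

Sel <- squid$Sex == 1       # logical vector, one element per row
squidM <- squid[Sel, ]      # rows where Sel is TRUE, all columns kept

# Five ways to keep locations 1-3; with codes 1-4 they select the same rows
loc123 <- list(
  squid[squid$Location == 1 | squid$Location == 2 | squid$Location == 3, ],
  squid[squid$Location != 4, ],
  squid[squid$Location < 4, ],
  squid[squid$Location <= 3, ],
  squid[squid$Location >= 1 & squid$Location <= 3, ]
)
all(sapply(loc123[-1], identical, loc123[[1]]))   # TRUE

# Males at location 1 or 2: group the | part, because & binds more tightly
squidM.12 <- squid[squid$Sex == 1 & (squid$Location == 1 | squid$Location == 2), ]
# Without parentheses this reads (male & location 1) | location 2,
# which wrongly includes the female at location 2
wrong <- squid[squid$Sex == 1 & squid$Location == 1 | squid$Location == 2, ]

# Sorting: order() returns row indices, which go before the comma
ord1 <- order(squid$Month)
squidSorted <- squid[ord1, ]
```

Here `squidM.12` holds samples 4, 5 and 7, while `wrong` also picks up sample 3, the female at location 2.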
Combining Two Datasets with a Common Identifier
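- squidmerged <- merge(squid1, squid2, by = "Sample")
o option: all
§ default: all = FALSE
§ only rows whose Sample value appears in both data frames are kept; rows without a match in the other file are omitted
§ all = TRUE keeps unmatched rows and fills the missing columns with NA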
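A small sketch of the two settings of `all`, using two hypothetical data frames (squid1, squid2 and their values are made up) that share the identifier Sample. Sample 4 appears only in squid1 and Sample 5 only in squid2.

```r
# Hypothetical data: two files measured on partly overlapping samples
squid1 <- data.frame(Sample = c(1, 2, 3, 4), GSI = c(10.44, 9.83, 9.74, 9.31))
squid2 <- data.frame(Sample = c(1, 2, 3, 5), Sex = c(2, 2, 2, 1))

# Default all = FALSE: keep only samples present in both data frames
squidmerged <- merge(squid1, squid2, by = "Sample")

# all = TRUE: keep every sample; columns missing for a sample become NA
squidall <- merge(squid1, squid2, by = "Sample", all = TRUE)
squidall
```

With the default, samples 4 and 5 disappear; with `all = TRUE` they are kept, with NA for Sex (sample 4) and GSI (sample 5).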
Exporting Data
- write.table(squidM, file = "C:\\malesquid.txt", sep = " ", quote = FALSE, append = FALSE, na = "NA")
o quote = FALSE → character strings are written without surrounding quote marks
o na → the string written for missing values
o append = FALSE → creates a new file (overwriting any existing file with that name)
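A sketch of the export step that writes to a temporary file from tempfile() rather than the C:\\ path above, so it runs on any system. The small squidM data frame, including its character column Site and its missing value, is hypothetical; reading the file back with readLines() shows the effect of quote = FALSE and na = "NA".

```r
# Hypothetical subset with a character column and one missing value
squidM <- data.frame(
  Sample = c(4, 5, 7),
  Site   = c("North", "South", NA),
  GSI    = c(9.31, 8.99, 8.26)
)

outfile <- tempfile(fileext = ".txt")
write.table(squidM, file = outfile, sep = " ",
            quote = FALSE, append = FALSE, na = "NA")

# Header line, then one line per row. By default write.table() also writes
# the row names as an extra first column on each data line.
lines <- readLines(outfile)
lines
```

Adding `row.names = FALSE` would drop that extra first column if it is not wanted.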
Recoding Categorical Variables
- good programming practice: create new variables in the data frame that hold the categorical variables recoded as nominal variables (factors)
- example
o squid$fLocation <- factor(squid$Location)
o squid$fSex <- factor(squid$Sex)
§ two new variables are created inside the data frame squid
§ both are nominal variables (factors)
- relabel as M & F
o squid$fSex <- factor(squid$Sex, levels = c(1, 2), labels = c("M", "F"))
o the new variables (e.g., fSex) can be used in functions
- reorder the levels of a nominal variable (factor)
o example
§ squid$fLocation <- factor(squid$Location, levels = c(2, 3, 1, 4))
- note: when selecting on factor levels, put the level in double quotes (levels are character strings)
o e.g., with the M/F labels defined above
§ squidM <- squid[squid$fSex == "M", ]
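A sketch of the recoding steps on a hypothetical toy data frame (values invented), checking the levels after each step and then selecting on a level. Because fSex is relabelled as M and F here, the selection compares against "M".

```r
# Hypothetical toy data (Sex: 1 = male, 2 = female)
squid <- data.frame(
  Location = c(1, 3, 2, 1, 2, 4, 2, 3),
  Sex      = c(2, 2, 2, 1, 1, 1, 1, 2)
)

# New factor variables stored alongside the original numeric codes
squid$fLocation <- factor(squid$Location)
squid$fSex <- factor(squid$Sex, levels = c(1, 2), labels = c("M", "F"))
levels(squid$fSex)          # "M" "F"

# Reorder the levels: location 2 becomes the first (reference) level
squid$fLocation <- factor(squid$Location, levels = c(2, 3, 1, 4))
levels(squid$fLocation)     # "2" "3" "1" "4"

# Factors work directly in functions, e.g. a frequency table by sex
table(squid$fSex)

# Select on a level: levels are character strings, hence the quotes
squidM <- squid[squid$fSex == "M", ]
```

Keeping the original numeric Sex and Location columns next to fSex and fLocation means the raw codes are still available if a different recoding is needed later.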