What is Data
Some R Essentials
- Starting R
o Command prompt: >
- Using R as a calculator
- Functions
o Many functions take extra arguments that change the default behaviour
§ log(10) - natural log (base e)
§ log(10,10) = 1
§ log(10, base=10) = 1
o Error messages
§ NaN - not a number, e.g. sqrt(-2)
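A minimal sketch of the points above, using only base R. The variable names are made up for illustration, and suppressWarnings() is added here only to keep the output quiet; it is not part of the original notes.

```r
# log() uses base e by default; an extra argument changes the base.
natural     <- log(10)             # natural log
by_position <- log(10, 10)         # base given as the second argument
by_name     <- log(10, base = 10)  # base given by name

print(natural)
print(by_position)
print(by_name)

# sqrt() of a negative number returns NaN (with a warning) instead of stopping.
bad <- suppressWarnings(sqrt(-2))
print(is.nan(bad))
```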
- Continuation prompt: + (indicates that more input is expected)
- Assignment
o x = 2 - assignment is quiet (no output)
o Built-in constants, e.g. pi
- Assignment with = versus <-
o Writing x = 2*x + 1 can cause confusion: is it an assignment or a mathematical equation?
o Alternative format
§ x <- 2*x + 1
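The two assignment forms can be compared with a short sketch. The variable area is a hypothetical name used only to show a built-in constant in use.

```r
x <- 2            # quiet: nothing is printed
x = 2 * x + 1     # same effect as x <- 2 * x + 1; the right side is evaluated first
print(x)          # x is now 5

area <- pi * x^2  # pi is a built-in constant
print(area)
```

The <- form makes it plain that the line is an instruction to store a value, not an equation to solve.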
- Acceptable variable names
o Can use
§ Letters, numbers, dots, underscores
o Name starts with
§ Letter or dot
· A leading dot cannot be followed by a number
o Names are case sensitive
- Using c() to enter data
o Data vector
o c() function
§ Eg
· whales = c(74,122,235,111,292,111,211,133,156,79)
§ Can also combine data vectors
· x = c(74,122,235,111,292)
· y = c(111,211,133,156,79)
· c(x,y) → combines x and y into a single vector
o Data vectors have a type
§ All values must have same type
§ Character strings are quoted with " or '
o Giving data vectors named entries
§ simpsons = c("Homer","Marge","Bart","Lisa","Maggie")
§ names(simpsons) = c("dad","mom","son","daughter1","daughter2")
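A short sketch of c(), combining vectors, and named entries, using the values from the notes. The vector mixed is a made-up example added to show what "same type" means in practice.

```r
whales <- c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)

# c() also combines existing vectors.
x <- c(74, 122, 235, 111, 292)
y <- c(111, 211, 133, 156, 79)
combined <- c(x, y)
print(identical(combined, whales))

# Named entries.
simpsons <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
names(simpsons) <- c("dad", "mom", "son", "daughter1", "daughter2")
print(simpsons)

# Mixing types coerces every value to a single type (here, character).
mixed <- c(1, "two")
print(class(mixed))
```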
o Using functions on a data vector
§ Once data is stored in a variable, functions can be used on it.
· Eg
o sum(whales)
o length(whales)
§ sum
§ length
§ mean
§ sort
§ min
§ max
§ range
§ diff
§ cumsum
o Vectorisation of functions
§ Eg
· whales + whales.fla
o First item in each vector added, second item in each vector added, etc
· whales - mean(whales)
o Mean is subtracted from each item
§ Example: calculating variance
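A sketch of vectorisation and of the variance calculation done by hand. The whales.fla values below are made up for illustration (the notes do not list them), and the result is checked against the built-in var(), which uses the same n - 1 divisor.

```r
whales     <- c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)
whales.fla <- c(80, 120, 230, 110, 290, 110, 210, 130, 150, 80)  # hypothetical values

total      <- whales + whales.fla    # added element by element
deviations <- whales - mean(whales)  # the mean is subtracted from every entry

# Sample variance: sum of squared deviations divided by n - 1.
n <- length(whales)
variance_by_hand <- sum(deviations^2) / (n - 1)

print(variance_by_hand)
print(var(whales))                   # built-in function for comparison
```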
o Help
§ help()
· help(mean)
· ?mean
· ?"mean"
§ help.search("mean") - searches the help pages for any mention of mean
§ apropos() - all documented functions and variables with mean in their names
· apropos("mean")
§ help.start()
§ example(mean)
o Editing
§ Arrow keys
§ Using data.entry() or edit() to edit data
· Spreadsheet like editor
o Creating structured data
§ Simple sequences
§ Arithmetic sequences
§ seq(1,9,by=2)
§ rep(1,10)
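The three ways of building structured data can be sketched as follows. The variable names are illustrative only.

```r
simple <- 1:10               # simple sequence from 1 to 10
odds   <- seq(1, 9, by = 2)  # arithmetic sequence: 1 3 5 7 9
ones   <- rep(1, 10)         # the value 1 repeated ten times

print(simple)
print(odds)
print(ones)
```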
Accessing data by using indices
- ebay = c(88.8, 88.3, 90.2, 93.5, 95.2, 94.7, 99.2, 99.4, 101.6)
- ebay[1] → returns 88.8
- ebay[length(ebay)] → 101.6
- Slicing
o ebay[1:4] → 88.8 88.3 90.2 93.5
- Negative indices
o If the index is -i, with i between 1 and n (the length of the vector), all but the ith value of the vector is returned
- Accessing by names
o When data vector has names – values can be accessed by names
§ x = 1:3
§ names(x) = c("one","two","three")
§ x["one"]
- Parentheses () → used to call functions
- Square brackets [] → used to index data vectors
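A sketch pulling the indexing forms together, using the ebay values from the notes. The helper variable names (first, last, and so on) are made up for illustration.

```r
ebay <- c(88.8, 88.3, 90.2, 93.5, 95.2, 94.7, 99.2, 99.4, 101.6)

first         <- ebay[1]             # 88.8
last          <- ebay[length(ebay)]  # 101.6
first_four    <- ebay[1:4]           # slicing: 88.8 88.3 90.2 93.5
without_first <- ebay[-1]            # negative index: everything but the first value

# Access by name.
x <- 1:3
names(x) <- c("one", "two", "three")
print(x["one"])
```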
- Assign values to data vector
o Can assign values to data vector element by element using indices
§ ebay[1] = 88.0
§ ebay[10:13] = c(97,99,102,101)
- Data recycling
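A sketch of assignment by index and of recycling. The recycling example uses made-up numbers, since the notes only name the idea: when two vectors of different lengths are combined, the shorter one is reused from its start.

```r
ebay <- c(88.8, 88.3, 90.2, 93.5, 95.2, 94.7, 99.2, 99.4, 101.6)

ebay[1] <- 88.0                     # replace a single value
ebay[10:13] <- c(97, 99, 102, 101)  # assigning past the end extends the vector
print(length(ebay))                 # 13

# Recycling: c(10, 20) is reused as 10 20 10 20.
recycled <- c(1, 2, 3, 4) + c(10, 20)
print(recycled)                     # 11 22 13 24
```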
- Logical values
o ebay > 100 → returns TRUE or FALSE for each value
o ebay[ebay > 100] → returns values greater than 100
o which(ebay > 100) → returns indices of values greater than 100
o ebay[c(9,12,13)] → returns the values at these indices
o sum(ebay > 100) → counts the values greater than 100
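The logical operations above can be sketched on the 13-value ebay vector, that is, the original nine values with the first replaced by 88.0 and four more appended, as in the assignment step of the notes.

```r
ebay <- c(88.0, 88.3, 90.2, 93.5, 95.2, 94.7, 99.2, 99.4, 101.6,
          97, 99, 102, 101)

above <- ebay > 100    # logical vector, one TRUE/FALSE per value
big   <- ebay[above]   # values greater than 100
idx   <- which(above)  # positions of those values
count <- sum(above)    # TRUE counts as 1, so this counts the matches

print(big)    # 101.6 102.0 101.0
print(idx)    # 9 12 13
print(count)  # 3
```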
- Creating logical vectors by conditions
o Logical operators: <, > , <= , >= , ==, !=
- Missing values
o NA → data not available
o is.na()
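A small sketch of missing values. The vector shots is made up for illustration, and the na.rm argument is an addition not mentioned in the notes; it is a standard argument of mean() and similar summary functions.

```r
shots <- c(3, NA, 5)   # hypothetical data with one missing value

missing <- is.na(shots)
print(missing)                    # FALSE TRUE FALSE

print(mean(shots))                # NA: a missing value propagates
clean_mean <- mean(shots, na.rm = TRUE)
print(clean_mean)                 # 4
```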
- Managing the work environment
o ls() → shows all objects that have been defined or loaded into the work environment
o objects()
§ Both list all objects in the current work environment
· What is the difference between the two functions?
o browseEnv() → uses a web browser to show the results
o remove objects
§ rm()
§ remove()
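A sketch of listing and removing objects. The names tmp_a and tmp_b are hypothetical and chosen so they are unlikely to clash with anything already in the workspace.

```r
tmp_a <- 1
tmp_b <- 2

# Both names now appear in the listing.
print(c("tmp_a", "tmp_b") %in% ls())

rm(tmp_a)                  # remove one object
print("tmp_a" %in% ls())   # FALSE
print("tmp_b" %in% ls())   # TRUE
```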
Reading In Other Data Sources
- using R's built in libraries and data sets
- library(package name)
o library(MASS)
o data(survey) → redundant for R versions >= 2.0.0
- require()
- load data set without loading package
o data(survey, package="MASS")
- accessing variables in data set
o data frame
o $
o attach()
o with()
§ Example
· library(MASS) → loads the package, which includes the geyser data set
· names(geyser)
· geyser$waiting → access the waiting variable in the geyser data set
§ with() performs the attach() and detach() steps in one call.
· with(data.frame, command)
o Examples
§ data(Sitka) → load data set
§ names(Sitka)
§ length(Sitka$tree) → length
§ with(Sitka,range(tree))
§ attach(Sitka)
§ summary(tree)
§ detach(Sitka)
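A sketch of the three ways of reaching a variable in a data frame, assuming the MASS package is installed (it ships with standard R distributions). The variable names n_wait, wait_range and wait_summary are made up for illustration.

```r
library(MASS)   # provides the geyser data set

print(names(geyser))

n_wait     <- length(geyser$waiting)        # $ access
wait_range <- with(geyser, range(waiting))  # with(): no attach/detach needed

attach(geyser)                              # attach(): variables visible by name
wait_summary <- summary(waiting)
detach(geyser)                              # always detach afterwards

print(n_wait)
print(wait_range)
print(wait_summary)
```

with() keeps the variable lookup local to one call, so there is nothing to clean up afterwards.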
- Using data sets that accompany this book
o install.packages(packagename)
o Once installed, package can be loaded with library()
§ And data sets accessed with data()
- Other methods of data entry
o Cut and paste
§ Works well with c() if data separated by commas
o scan() → reads in input until a blank line is entered
§ whales = scan()
· 1: 74 122 235 111 292 111 211 133 156 79
· 11:
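scan() with no arguments waits for typed input, so it cannot be shown in a script. A non-interactive sketch of the same idea uses the text argument of scan(), which is an addition to the notes; quiet = TRUE only suppresses the "Read 10 items" message.

```r
# Same values as typed at the scan() prompt, supplied as a string instead.
whales <- scan(text = "74 122 235 111 292 111 211 133 156 79", quiet = TRUE)

print(whales)
print(length(whales))   # 10
```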
o Reading data from formatted data sources
§ whale = scan(file="whale.txt")
o Tables of data can be read in with the read.table() function
§ read.table("whale.txt", header=TRUE)
o Specifying the file
§ read.table(file=file.choose()) → lets the user choose the file interactively
o If the file is not in the working directory, specify the path
§ "C:/R/data.txt"
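A self-contained sketch of read.table() with header=TRUE. Since whale.txt itself is not available here, the code writes a small temporary file first; the column names and years are made up for illustration.

```r
# Create a stand-in for a file such as "whale.txt".
path <- tempfile(fileext = ".txt")
writeLines(c("year count",
             "1990 74",
             "1991 122",
             "1992 235"), path)

# header = TRUE treats the first line as column names.
whale_table <- read.table(path, header = TRUE)
print(whale_table)
print(names(whale_table))

unlink(path)   # remove the temporary file
```

Passing a full path works the same way as passing a file name in the working directory.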
o Finding files from internet
GW Comments / Questions
- How to control number of decimal points displayed
- How to exit out of continuation prompt
o Clear console