Saturday, March 5, 2011

Verzani – Chapter 1 – Data


 

What is Data

 

Some R Essentials

 

-          Starting R

o   Command prompt: >

-          Using R as a calculator

-          Functions

o   Many functions take extra arguments that change their default behaviour

§  log(10)   → natural log (base e)

§  log(10,10) = 1

§  log(10, base=10) = 1

o   Error messages

§  NaN – not a number – e.g. sqrt(-2) returns NaN with a warning
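A minimal sketch of the calls above, assuming a standard R session. It shows that the second positional argument of log() is the base, and that sqrt() of a negative number gives NaN (R also prints a warning).

```r
# Natural log (base e) is the default
log(10)             # 2.302585
# The second positional argument is the base
log(10, 10)         # 1
# Same call with the argument named explicitly
log(10, base = 10)  # 1
# Square root of a negative number: NaN ("not a number"), plus a warning
sqrt(-2)
```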

-          Continuation prompt : + : indicating more input is expected

-          Assignment

o   x = 2     – assignment is quiet, no output

o   Built-in constants, e.g. pi

-          Assignment with = versus <-

o   The form x = 2*x + 1   → can cause confusion: is it an assignment or a math equation?

o   Alternative format

§  x <- 2*x + 1
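A short sketch of the two assignment styles. Both store a value silently; typing the variable name afterwards prints it.

```r
# "=" and "<-" both assign; neither prints anything
x = 2
x <- 2*x + 1   # read as "replace x with 2*x + 1", not as an equation
x              # typing the name prints the value: 5
```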

-          Acceptable variable names

o   Use

§  Letters / numbers / dots / underscores

o   Name starts with

§  Letter or dot

·         Cannot start with a dot followed by a number

o   Case is important
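A sketch of the naming rules, using made-up variable names for illustration. The invalid names are commented out because they are syntax errors.

```r
# Valid names: letters, numbers, dots and underscores
my.data <- 1
my_data <- 2
.hidden <- 3    # leading dot allowed; ls() hides such names by default
data2   <- 4
# Invalid names (syntax errors), so they are commented out:
# 2data <- 5    # starts with a number
# .2x   <- 6    # dot followed by a number
# Case matters: x and X are different variables
x <- 1
X <- 2
x == X          # FALSE
```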

-          Using c() to enter data

o   Data vector

o   c() function

§  E.g.

·         whales = c(74,122,235,111,292,111,211,133,156,79)

§  Can also combine data vectors

·         x = c(74,122,235,111,292)

·         y = c(111,211,133,156,79)

·         c(x,y)    → combines both into one vector

o   Data vectors have a type

§  All values must have same type

§  Character strings: quoted with " or '
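A sketch of combining the two halves of the whales data with c(), and of what happens when types are mixed: because a vector holds a single type, numbers are coerced to character strings.

```r
whales <- c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)
x <- c(74, 122, 235, 111, 292)
y <- c(111, 211, 133, 156, 79)
z <- c(x, y)            # combine two data vectors into one
identical(z, whales)    # TRUE
# Mixing numbers and strings coerces every value to character
c(1, "two", 3)          # "1" "two" "3"
typeof(c(1, "two", 3))  # "character"
```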

o   Giving data vectors named entries

§  simpsons = c("Homer","Marge","Bart","Lisa","Maggie")

§  names(simpsons) = c("dad","mom","son","daughter1","daughter2")
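A sketch of the named-vector example: once names are attached, an entry can be pulled out by its name as well as by position.

```r
simpsons <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
names(simpsons) <- c("dad", "mom", "son", "daughter1", "daughter2")
simpsons         # each value is printed under its name
simpsons["mom"]  # access an entry by name
```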

o   Using functions on a data vector

§  Once data is stored in a variable, can use functions on it.

·         E.g.

o   sum(whales)

o   length(whales)

§  sum

§  length

§  mean

§  sort

§  min

§  max

§  range

§  diff

§  cumsum
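A sketch applying each listed function to the whales data; the commented values follow from the ten numbers entered above.

```r
whales <- c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)
sum(whales)     # 1524
length(whales)  # 10
mean(whales)    # 152.4
sort(whales)    # values in increasing order
min(whales)     # 74
max(whales)     # 292
range(whales)   # 74 292 (min and max together)
diff(whales)    # differences between consecutive values
cumsum(whales)  # running totals
```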

o   Vectorisation of functions

§  E.g.

·         whales + whales.fla

o   First items of each vector added, second items added, etc.

·         whales - mean(whales)

o   Mean is subtracted from each value

§  Example – calculating variance
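A sketch of the variance example, relying on vectorisation: whales - mean(whales) subtracts the mean from every value at once. The sample variance divides the sum of squared deviations by n - 1, which is what the built-in var() computes. The names deviations and my.var are illustrative.

```r
whales <- c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)
# Vectorised: the mean is subtracted from every value
deviations <- whales - mean(whales)
# Sample variance: sum of squared deviations divided by n - 1
my.var <- sum(deviations^2) / (length(whales) - 1)
my.var
var(whales)   # built-in function, same value
```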

o   Help

§  help()

·         help(mean)

·         ?mean

·         ?"mean"

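§  help.search("mean")   – finds any mention of mean in the documentation

§  apropos()  – all functions and variables with mean in their names

·         apropos("mean")

§  help.start()   – opens the HTML help pages in a web browser

§  example(mean)   – runs the examples from the help page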

o   Editing

§  Arrow keys – recall and edit previous commands

§  Using data.entry() or edit() to edit data

·         Spreadsheet-like editor

o   Creating structured data

§  Simple sequences, e.g. 1:10

§  Arithmetic sequences

§  seq(1,9,by=2)

§  rep(1,10)   – repeated values
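A sketch of the three ways of building structured data listed above: the colon operator, seq() with a step size, and rep().

```r
1:10               # simple sequence 1, 2, ..., 10
seq(1, 9, by = 2)  # arithmetic sequence: 1 3 5 7 9
seq(10, 1)         # counts down when from > to
rep(1, 10)         # the value 1 repeated 10 times
```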

 

Accessing data by using indices

 

-          ebay = c(88.8, 88.3, 90.2, 93.5, 95.2, 94.7, 99.2, 99.4, 101.6)

-          ebay[1]  → returns 88.8

-          ebay[length(ebay)] → 101.6

-          Slicing

o   ebay[1:4]  → 88.8  88.3  90.2  93.5

-          Negative indices

o   A negative index drops a value: ebay[-i] returns all but the ith value (for i between 1 and n, the length of the vector)
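A sketch of positive, slice and negative indexing on the ebay vector above.

```r
ebay <- c(88.8, 88.3, 90.2, 93.5, 95.2, 94.7, 99.2, 99.4, 101.6)
ebay[1]             # first value: 88.8
ebay[length(ebay)]  # last value: 101.6
ebay[1:4]           # a slice: 88.8 88.3 90.2 93.5
ebay[-1]            # all but the first value
ebay[-(1:4)]        # all but the first four values
```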

-          Accessing by names

o   When a data vector has names, values can be accessed by name

§  x = 1:3

§  names(x) = c("one","two","three")

§  x["one"]
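A sketch of access by name; several names can be requested at once with a character vector.

```r
x <- 1:3
names(x) <- c("one", "two", "three")
x["one"]              # the value, printed with its name
x[c("one", "three")]  # several names at once
```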

-          Parentheses → functions

-          Square brackets → data vectors

-          Assign values to data vector

o   Can assign values to data vector element by element using indices

§  ebay[1] = 88.0

§  ebay[10:13] = c(97,99,102,101)   → extends the vector to 13 values
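A sketch of element-by-element assignment. Assigning to positions past the end of the vector lengthens it.

```r
ebay <- c(88.8, 88.3, 90.2, 93.5, 95.2, 94.7, 99.2, 99.4, 101.6)
ebay[1] <- 88.0                     # change a single value
ebay[10:13] <- c(97, 99, 102, 101)  # assigning past the end extends the vector
length(ebay)                        # 13
```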

-          Data recycling
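A sketch of data recycling: when two vectors of different lengths are combined in arithmetic, R reuses the shorter one until it matches the longer one. The values here are arbitrary.

```r
# The shorter vector is reused (recycled) to match the longer one
c(1, 2, 3, 4) + c(10, 20)  # 11 22 13 24
# A single number is recycled against every element
c(1, 2, 3) * 2             # 2 4 6
# If the longer length is not a multiple of the shorter, R still
# recycles but issues a warning
c(1, 2, 3) + c(10, 20)     # 11 22 13, with a warning
```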

-          Logical values

o   ebay > 100  → returns TRUE or FALSE for each value

o   ebay[ebay > 100] → returns the values greater than 100

o   which(ebay > 100)  → returns the indices of the values greater than 100

o   ebay[c(9,12,13)]  → returns the values at these indices

o   sum(ebay > 100)  → counts the values greater than 100
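A sketch of the logical-vector examples, using the 13-value ebay vector after the assignments above. sum() counts the matches because TRUE is treated as 1 and FALSE as 0.

```r
ebay <- c(88.0, 88.3, 90.2, 93.5, 95.2, 94.7, 99.2, 99.4, 101.6,
          97, 99, 102, 101)
ebay > 100          # TRUE/FALSE for every value
ebay[ebay > 100]    # only the values above 100
which(ebay > 100)   # their positions: 9 12 13
ebay[c(9, 12, 13)]  # the same values, selected by position
sum(ebay > 100)     # TRUE counts as 1: 3
```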

-          Creating logical vectors by conditions

o   Logical operators: <, > , <= , >= , ==, !=

-          Missing values

o   NA  → data not available

o   is.na()  → tests which values are NA
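A sketch of working with missing values, using a small made-up vector. Most summaries return NA when any value is missing; mean() accepts na.rm = TRUE to drop the NAs first.

```r
x <- c(0, 1, 2, NA, 3)
is.na(x)                 # FALSE FALSE FALSE TRUE FALSE
mean(x)                  # NA: a missing value propagates
mean(x, na.rm = TRUE)    # 1.5: NAs dropped before averaging
```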

-          Managing the work environment

o   ls()  → shows all objects that have been defined or loaded into the work environment

o   objects()

§  Both list all objects in the given work environment

·         What is the difference between the two functions? (None: objects() is an alternative name for ls())

o   browseEnv()  → uses a web browser to show the objects

o   Remove objects

§  rm()

§  remove()
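A sketch of listing and removing objects; x and y are throwaway names used only for illustration.

```r
x <- 1
y <- 2
ls()                          # includes "x" and "y"
rm(x)                         # remove one object
exists("x", inherits = FALSE) # FALSE once x has been removed
remove(y)                     # remove() does the same job as rm()
```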

 

 

Reading In Other Data Sources

 

-          Using R's built-in packages and data sets

-          library(package name)

o   library(MASS)

o   data(survey)   → redundant for R versions >= 2.0.0, since the data sets load with the package

-          require() – also loads a package

-          load data set without loading package

o   data(survey, package="MASS")
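A sketch of both ways of getting at the survey data set, assuming the MASS package is installed (it is normally distributed with R as a recommended package).

```r
# Load only the survey data set, without attaching MASS
data(survey, package = "MASS")
dim(survey)      # number of rows and columns of the data frame
# Or attach the whole package; its data sets then become available
library(MASS)    # require(MASS) also works and returns TRUE/FALSE
```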

-          Accessing variables in a data set

o   data frame

o   $

o   attach()

o   with()

§  Example

·         library(MASS)     → loads the package, which includes the geyser data set

·         names(geyser)

·         geyser$waiting  → accesses the waiting variable in the geyser data set

§  with() performs the attach() and detach() steps in one command.

·         with(data.frame, command)

o   Examples

§  data(Sitka)   → loads the data set

§  names(Sitka)

§  length(Sitka$tree)   → number of values

§  with(Sitka, range(tree))

§  attach(Sitka)

§  summary(tree)

§  detach(Sitka)
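A sketch comparing the three ways of reaching a variable inside a data frame, assuming MASS is installed. $ and with() give the same result; attach() makes the columns visible by name until detach() is called.

```r
library(MASS)                 # provides the geyser and Sitka data sets
names(geyser)                 # variable names in the data frame
mean(geyser$waiting)          # $ picks one variable out of the data frame
with(geyser, mean(waiting))   # same result, without typing geyser$
attach(Sitka)                 # make Sitka's columns visible by name
summary(tree)
detach(Sitka)                 # detach when finished
```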

-          Using data sets that accompany this book

o   install.packages("packagename")

o   Once installed, the package can be loaded with library()

§  and its data sets accessed with data()

-          Other methods of data entry

o   Cut and paste

§  Works well with c() if data separated by commas

o   scan()  → reads in input until a blank line is entered

§  whales = scan()

·         1: 74 122 235 111 292 111 211 133 156 79

·         11:
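Typing at the scan() prompt is interactive, so this sketch swaps in scan()'s text argument to read the same numbers from a string. Apart from the source of input, the result is the same numeric vector.

```r
# Interactive use: type the values at the 1: prompt, then a blank line
# whales <- scan()
# Non-interactive equivalent: read the same numbers from a string
whales <- scan(text = "74 122 235 111 292 111 211 133 156 79")
length(whales)   # 10
```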

o   Reading data from formatted data sources

§  whale = scan(file="whale.txt")

o   Tables of data can be read in with the read.table() function

§  read.table("whale.txt", header=TRUE)

o   Specifying the file

§  read.table(file=file.choose())    → lets the user choose the file interactively

o   If the file is not in the working directory, specify the path

§  "C:/R/data.txt"
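A self-contained sketch of read.table() with header=TRUE. Since whale.txt is not available here, it first writes a hypothetical two-column file to a temporary location; the column names id and count are made up for illustration.

```r
# Write a small hypothetical file so the example runs anywhere
path <- tempfile(fileext = ".txt")
writeLines(c("id count",
             "1 74",
             "2 122",
             "3 235"), path)
# header = TRUE: the first line holds the column names
whale <- read.table(path, header = TRUE)
whale
names(whale)    # "id" "count"
```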

o   Finding files on the internet

 

 

 

 

 

 

 

GW Comments / Questions

 

-          How to control the number of decimal places displayed?

-          How to exit the continuation prompt?

o   Clear console
