Thursday, November 5, 2015

R Basics 8 - Tips and Traps

General
Trap: R error messages are often not helpful
Tip: use traceback() to understand errors

Object coercion
Trap: R objects are often silently coerced to another class/type as/when needed.
Examples: 
c(1, TRUE)
Tip: inspect objects with 
str(x)
mode(x)
class(x)
typeof(x)
dput(x)
attributes(x)
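As a small illustration of the coercion trap (the variable names x and y are only for this sketch), mixing types in c() quietly promotes everything to the most general type, and the inspection functions reveal what happened:

```r
# Mixing types in c() silently coerces to the most general type.
x <- c(1, TRUE)        # the logical TRUE becomes the number 1
y <- c(1, "a", TRUE)   # everything becomes a character string

typeof(x)   # "double"
typeof(y)   # "character"
str(y)      # compact summary of the structure of y
```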

Factors (special case of coercion)
Trap: Factors cause more bug-hunting grief than just about anything else in R, especially when string and integer vectors and data.frame columns are coerced to factors.
Tip: learn about factors and how to use them
Tip: explicitly test with is.factor(df$col)
Tip: use stringsAsFactors=FALSE argument when you create a data frame from file

Trap: maths doesn't work on numeric factors and they are tricky to convert back.
Tip: try as.numeric(as.character(factor))
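A minimal sketch of why the double conversion is needed (f is just an illustrative factor): as.numeric() on its own returns the internal integer codes, not the values printed on screen.

```r
f <- factor(c("10", "20", "10", "30"))

codes  <- as.numeric(f)                 # internal codes: 1 2 1 3
values <- as.numeric(as.character(f))   # original values: 10 20 10 30
```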

Trap: appending rows to a data frame with factor columns is tricky.
Tip: make sure the row to be appended is presented to rbind() as a data.frame, and not as a vector or a list (which only works sometimes)
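A hedged sketch of the data.frame approach (the column names and values are made up for illustration): when the new row is itself a data.frame, rbind() merges the factor levels rather than producing NA.

```r
df <- data.frame(name = c("a", "b"), n = c(1, 2), stringsAsFactors = TRUE)

# Build the new row as a one-row data.frame with the same column names.
new_row <- data.frame(name = "c", n = 3, stringsAsFactors = TRUE)

df2 <- rbind(df, new_row)
levels(df2$name)   # "a" "b" "c"
nrow(df2)          # 3
```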

Trap: the combine function c() will let you combine different factors into a vector of integer codes (probably garbage). Since R 4.1.0, c() returns a factor with the combined levels instead.
Tip: convert factors to string or integers (as appropriate) before combining.

Garbage in the workspace
Trap: R can save your workspace at the end of each session (by default it asks) and reloads the saved workspace at the start of the next session.
Before you know it, you can have heaps of variables lurking in your workspace that are affecting your calculations.
Tip: use ls() to check on lurking variables
Tip: clean up with rm(list=ls(all=TRUE))
Tip: use search() to check on attached packages (library() with no arguments lists the installed packages)
Tip: avoid saving workspaces, start R with the --no-save --no-restore arguments

The 1:0 sequence in for-loops
Trap: for(x in 1:length(y)) fails on a zero-length vector. It will loop twice: first setting x to 1, then to 0.
Tip: use for(x in seq_len(length(y)))
Trap: for(x in y) loops over the values of y, not its indexes
Tip: use for(x in seq_along(y)) when you need the indexes
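The difference is easy to see with an empty vector. In this sketch the counters bad and good are only there to record how many times each loop body runs:

```r
y <- c()   # NULL, so length(y) is 0

bad <- 0
for (x in 1:length(y)) bad <- bad + 1     # 1:0 is c(1, 0), so this runs twice

good <- 0
for (x in seq_along(y)) good <- good + 1  # seq_along(y) is empty, so this never runs

bad    # 2
good   # 0
```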

Space out your code and use brackets
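Trap: x<-5 assigns 5 to x; it is not the comparison x < -5
Tip: x <- 5 (assignment) or x < -5 (comparison)
Trap: 1:n-1 means (1:n)-1, that is 0 to n-1
Tip: 1:(n-1)
Trap: 2^2:9 means (2^2):9, that is 4 to 9
Tip: 2^(2:9)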
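These precedence rules can be checked directly. A short sketch, with n set to 5 purely for illustration:

```r
n <- 5

a <- 1:n-1     # parsed as (1:n) - 1  ->  0 1 2 3 4
b <- 1:(n-1)   # 1 2 3 4
p <- 2^2:9     # parsed as (2^2):9    ->  4 5 6 7 8 9
q <- 2^(2:9)   # 4 8 16 32 64 128 256 512

x <- 3
x<-5                  # assignment: x is now 5
is_small <- x < -5    # comparison: FALSE
```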

Vectors and vector recycling
Trap: most objects in R are vectors. R does not have scalars (just length=1 vectors).
Many functions work on entire vectors at once.

Tip: In R, for-loops are often the inefficient and inelegant solution. Take the time to learn the 'apply' family of functions and the plyr package.
Trap: Maths with vectors of different lengths will work, with the shorter vector recycled (R only warns when the longer length is not a multiple of the shorter)
c(1,2,3) + c(10,20)   # 11 22 13, with a warning
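A brief sketch of recycling in both the warned and the silent case. suppressWarnings() is used here only so the example runs quietly; without it, the first line warns that the longer length is not a multiple of the shorter.

```r
# Lengths 3 and 2: c(10, 20) is recycled to 10, 20, 10 (normally with a warning).
a <- suppressWarnings(c(1, 2, 3) + c(10, 20))   # 11 22 13

# Lengths 4 and 2: recycled to 10, 20, 10, 20 with no warning at all.
b <- c(1, 2, 3, 4) + c(10, 20)                  # 11 22 13 24
```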

Vectors need the c() function
wrong: mean(1,2,3)
correct: mean(c(1,2,3))
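The wrong form does not raise an error, which is what makes it a trap. In this sketch, only the first argument is treated as data; the others are matched to other parameters of mean():

```r
wrong   <- mean(1, 2, 3)      # 1: only the first argument is averaged
correct <- mean(c(1, 2, 3))   # 2
```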

Use the correct Boolean operator
Tip: | and & are vectorised - use with ifelse() (| and & are also used with indexes to subset)
Tip: || and && are not vectorised - use with if
Trap: || and && use lazy (short-circuit) evaluation; | and & evaluate both sides in full
Trap: == tests equality; = is assignment
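A minimal sketch of the two operator families (the values are illustrative). The vectorised & pairs naturally with ifelse(), while && lets an if condition stop early before evaluating something that would otherwise cause an error:

```r
x <- c(-1, 2, -3)

# & works element by element, so it suits ifelse() and subsetting.
labels <- ifelse(x > 0 & x < 5, "in", "out")   # "out" "in" "out"
kept   <- x[x > 0 | x < -2]                    # 2 -3

# && stops at the first FALSE, so y > 0 is never evaluated when y is NULL.
y <- NULL
msg <- if (!is.null(y) && y > 0) "positive" else "missing or not positive"
```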

Equality testing with numbers
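Trap: == and != test exact in/equality (after coercion to a common type), not near in/equality
as.double(8) == as.integer(8) is TRUE, but 0.1 + 0.2 == 0.3 is FALSE
Tip: isTRUE(all.equal(x,y)) tests near equality
Tip: identical(x,y) is more fussy (it also compares type and attributes)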
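The three kinds of comparison can be put side by side. A short sketch using the standard floating-point example:

```r
e1 <- as.double(8) == as.integer(8)            # TRUE: values compared after coercion
e2 <- identical(as.double(8), as.integer(8))   # FALSE: the types differ

e3 <- 0.1 + 0.2 == 0.3                         # FALSE: floating-point rounding
e4 <- isTRUE(all.equal(0.1 + 0.2, 0.3))        # TRUE: equal within a tolerance
```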

Think hard about NA, NaN and NULL
Trap: NA and NaN are valid values.
Trap: many functions fail or return NA by default on NA input
Tip: many functions take na.rm=TRUE
Tip: is.na(x) is the vectorised test for NA
Trap: x == NA is not the same as is.na(x); it always returns NA
Trap: x == NULL is not the same as is.null(x); it returns logical(0)
Trap: is.numeric(NaN) returns TRUE
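A quick sketch of how these values behave in practice (x is an illustrative vector):

```r
x <- c(1, NA, 3)

cmp  <- x == NA               # NA NA NA: comparing with NA never gives TRUE
miss <- is.na(x)              # FALSE TRUE FALSE

m1 <- mean(x)                 # NA
m2 <- mean(x, na.rm = TRUE)   # 2

n1 <- NULL == 1               # logical(0), not FALSE
n2 <- is.null(NULL)           # TRUE

is.numeric(NaN)               # TRUE
is.na(NaN)                    # TRUE: NaN also counts as missing
```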

Indexing ([], [[]], $)
Tip: Objects are indexed from 1 to N.
Trap: many subtle differences in indexing for vectors, lists, matrices, arrays and data.frames.
Return types vary depending on the object being indexed and the indexing method.
Tip: take the time to learn the differences
Trap: the zero-index fails silently
Trap: negative indexes return all but those
Trap: NA is a valid Boolean index
Trap: mismatched Boolean indexes work
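Each of these indexing traps can be reproduced on a small named vector. A sketch, with x and lst as illustrative objects:

```r
x <- c(a = 10, b = 20, c = 30)

zero <- x[0]                      # empty vector, no error
neg  <- x[-1]                     # everything except the first element: 20 30
na_i <- x[c(TRUE, NA, FALSE)]     # 10 NA: an NA index yields an NA element
rec  <- x[c(TRUE, FALSE)]         # index recycled to TRUE FALSE TRUE: 10 30

# [ ] keeps the container type; [[ ]] and $ extract the element itself.
lst <- list(a = 1:3)
class(lst["a"])     # "list"
class(lst[["a"]])   # "integer"
```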

Coding Practice
Tip: liberally use stopifnot() on function entry to verify argument validity
Tip: <- for assignment; = for list names
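As a sketch of the stopifnot() tip, here is a hypothetical helper (scale_to_unit is an invented name, not from any package) that checks its argument on entry and fails early with a clear message:

```r
# Hypothetical helper: rescale a numeric vector to the range 0..1.
scale_to_unit <- function(x) {
  # Verify the argument before doing any work.
  stopifnot(is.numeric(x), length(x) > 0, !anyNA(x))
  (x - min(x)) / (max(x) - min(x))
}

scaled <- scale_to_unit(c(2, 4, 6))   # 0.0 0.5 1.0

# Bad input stops at the first failed check instead of producing garbage.
result <- try(scale_to_unit("a"), silent = TRUE)
inherits(result, "try-error")         # TRUE
```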
