General
Trap: R error messages are often not helpful.
Tip: use traceback() to see the call stack that led to an error.

Object coercion
Trap: R objects are often silently coerced to another class/type as and when needed. Example: c(1, TRUE)
Tip: inspect objects with str(x), mode(x), class(x), typeof(x), dput(x) and attributes(x).

Factors (a special case of coercion)
Trap: factors cause more bug-hunting grief than just about anything else in R, especially when string and integer vectors and data.frame columns are coerced to factors.
Tip: learn about factors and how to use them.
Tip: explicitly test with is.factor(df$col).
Tip: use the stringsAsFactors=FALSE argument when you create a data frame from a file (this is the default from R 4.0.0 onwards).
Trap: maths doesn't work on numeric factors, and they are tricky to convert back.
Tip: try as.numeric(as.character(f)), where f is the factor.
Trap: appending rows to a data frame with factor columns is tricky.
Tip: make sure the row to be appended is presented to rbind() as a data.frame, and not as a vector or a list (which works only sometimes).
Trap: in versions of R before 4.1.0, the combine function c() will let you combine different factors into a vector of integer codes (probably garbage).
Tip: convert factors to strings or integers (as appropriate) before combining.

Garbage in the workspace
Trap: R can save your workspace at the end of each session and reload the saved workspace at the start of the next one. Before you know it, you can have heaps of variables lurking in your workspace that affect your calculations.
Tip: use ls() to check for lurking variables.
Tip: clean up with rm(list=ls(all=TRUE)).
Tip: use search() to check on attached packages (library() with no arguments lists the installed ones).
Tip: avoid saving workspaces; start R with the --no-save --no-restore arguments.

The 1:0 sequence in for-loops
Trap: for(x in 1:length(y)) fails on a zero-length vector. It will loop twice: first setting x to 1, then to 0.
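A minimal sketch of the 1:length(y) trap, assuming a hypothetical empty vector y (the variable names are only for illustration). It records which values the loop variable takes with each style of loop:

```r
# y is a hypothetical zero-length vector used only for illustration
y <- numeric(0)

# 1:length(y) is 1:0, i.e. the vector c(1, 0), so the body runs twice
bad_visits <- c()
for (x in 1:length(y)) {
  bad_visits <- c(bad_visits, x)
}

# seq_along(y) is integer(0), so the body never runs
good_visits <- c()
for (x in seq_along(y)) {
  good_visits <- c(good_visits, x)
}

print(bad_visits)   # 1 0
print(good_visits)  # NULL
```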
Tip: use for(x in seq_len(length(y))).
Trap: for(x in y) loops over the values of y, not its indexes.
Tip: use for(x in seq_along(y)) when you need the index.

Space out your code and use brackets
Trap: x<-5 assigns 5 to x; it is not the comparison x < -5.
Tip: write x < -5 for the comparison and x <- -5 to assign minus five.
Trap: 1:n-1 is parsed as (1:n)-1.
Tip: 1:(n-1)
Trap: 2^2:9 is parsed as (2^2):9.
Tip: 2^(2:9)

Vectors and vector recycling
Trap: most objects in R are vectors. R does not have scalars (just length-1 vectors). Many functions work on entire vectors at once.
Tip: in R, for-loops are often the inefficient and inelegant solution. Take the time to learn the 'apply' family of functions and the plyr package.
Trap: maths with vectors of different lengths will work, with the shorter vector recycled: c(1,2,3) + c(10,20)

Vectors need the c() function
Wrong: mean(1,2,3)
Correct: mean(c(1,2,3))

Use the correct Boolean operator
Tip: | and & are vectorised: use them with ifelse() (| and & are also used with indexes to subset).
Tip: || and && are not vectorised: use them with if.
Trap: || and && use lazy (short-circuit) evaluation; | and & evaluate both sides in full.
Trap: == is Boolean equality; = is assignment.

Equality testing with numbers
Trap: == and != test exact in/equality after coercion to a common type, so as.double(8) == as.integer(8) is TRUE, but floating-point values that look equal can compare as unequal.
Tip: isTRUE(all.equal(x,y)) tests near equality.
Tip: identical(x,y) is more fussy (it also compares types and attributes).

Think hard about NA, NaN and NULL
Trap: NA and NaN are valid values.
Trap: many functions return NA (or fail) by default when given NA input.
Tip: many functions take na.rm=TRUE.
Tip: use is.na(x) for a vectorised test for NA.
Trap: x == NA is not the same as is.na(x).
Trap: x == NULL is not the same as is.null(x).
Trap: is.numeric(NaN) returns TRUE.

Indexing ([], [[]], $)
Tip: objects are indexed from 1 to N.
Trap: there are many subtle differences in indexing for vectors, lists, matrices, arrays and data.frames. Return types vary depending on the object being indexed and the indexing method.
Tip: take the time to learn the differences.
Trap: the zero-index fails silently.
Trap: negative indexes return all but those elements.
Trap: NA is a valid Boolean index.
Trap: mismatched Boolean indexes work (the shorter index is recycled).

Coding practice
Tip: liberally use stopifnot() on function entry to verify argument validity.
Tip: use <- for assignment and = for list names.
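To make the numeric-factor trap from the Factors section concrete, here is a small sketch. The factor f is made up for illustration; it stands in for a column of numbers that was read in as text and turned into a factor.

```r
# f is a hypothetical factor holding numbers that arrived as strings
f <- factor(c("10", "20", "10", "5"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(f)

# Converting to character first recovers the original numbers
values <- as.numeric(as.character(f))

print(levels(f))  # "10" "20" "5"  (levels are sorted as strings)
print(codes)      # 1 2 1 3
print(values)     # 10 20 10 5
```

Note that the codes look like plausible data, which is why this bug is easy to miss.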
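The operator-precedence, recycling and c() traps can be checked in a few lines. This is only a sketch; n and the result names are hypothetical and chosen for illustration.

```r
n <- 5

# The colon operator binds tighter than minus, but looser than ^
a <- 1:n-1      # parsed as (1:n) - 1, giving 0 1 2 3 4
b <- 1:(n-1)    # 1 2 3 4
p <- 2^2:9      # parsed as (2^2):9, giving 4 5 6 7 8 9
q <- 2^(2:9)    # 4 8 16 32 64 128 256 512

# Recycling: the shorter vector is reused (R also issues a warning here,
# which is suppressed so the sketch runs quietly)
r <- suppressWarnings(c(1, 2, 3) + c(10, 20))   # 11 22 13

# mean() treats only its first argument as data
m_wrong <- mean(1, 2, 3)      # 1
m_right <- mean(c(1, 2, 3))   # 2

print(a); print(b); print(p); print(q); print(r)
print(m_wrong); print(m_right)
```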
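The NA, NULL and equality traps are easiest to believe once you see them. A short sketch follows, with a made-up vector x; the result names are hypothetical.

```r
x <- c(1, NA, 3)

eq_na  <- x == NA               # NA NA NA: comparing with NA is always NA
is_na  <- is.na(x)              # FALSE TRUE FALSE
m_na   <- mean(x)               # NA
m_ok   <- mean(x, na.rm = TRUE) # 2

eq_null  <- x == NULL           # logical(0), not TRUE or FALSE
nan_num  <- is.numeric(NaN)     # TRUE

# Floating-point results that "should" be equal may not be
exact  <- sqrt(2)^2 == 2                    # FALSE
near   <- isTRUE(all.equal(sqrt(2)^2, 2))   # TRUE

# == coerces types before comparing; identical() does not
loose  <- as.double(8) == as.integer(8)             # TRUE
strict <- identical(as.double(8), as.integer(8))    # FALSE

print(eq_na); print(is_na); print(eq_null)
print(c(exact = exact, near = near, loose = loose, strict = strict))
```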
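Finally, a sketch of the four indexing traps from the Indexing section, using a made-up numeric vector v. None of these lines raises an error, which is exactly the problem.

```r
v <- c(10, 20, 30, 40)

zero_idx <- v[0]                          # numeric(0): no error, no element
neg_idx  <- v[-1]                         # 20 30 40: everything but the first
na_idx   <- v[c(TRUE, NA, FALSE, TRUE)]   # 10 NA 40: the NA index yields NA
short    <- v[c(TRUE, FALSE)]             # 10 30: the index is recycled

print(zero_idx); print(neg_idx); print(na_idx); print(short)
```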
Thursday, November 5, 2015
R Basics 8 - Tips and Traps