Functions in R are called closures.
# Don't be deceived by the curly brackets;
# R is much more like Lisp than C or Java.
# Defining problems in terms of function
# calls and their lazy, delayed evaluation
# (variable resolution) is R's big feature.
Standard form (for named functions)
plus <- function(x, y) {x+y}
plus(5,6)
# return() not needed - last value returned
# Optional curly brackets with 1-line fns:
x.to.y <- function(x,y) return(x^y)
Returning values
# return() - can use to aid readability and to exit part way through a function
# invisible() - returns values that do not print if not assigned.
# Trap: return() is a function, not a statement. The brackets are needed.
Anonymous functions
# Often used in arguments to functions:
v <- 1:9; cube <- sapply(v, function(x) x^3)
Arguments are passed by value
# Effectively arguments are copied, and any changes made to the argument within the function do not affect the caller's copy.
# Trap: arguments are not typed and your function could be passed anything!
# Upfront argument checking advised!
Arguments passed by position or name
b <- function(cat, dog, cow) cat+dog+cow
b(1,2,3)
b(cow=3, cat=1, dog=2)
# Trap: not all arguments need to be passed
f <- function(x) missing(x); f(); f('here')
# match.arg() - argument partial matching
Default arguments
# Default arguments can be specified
x2y.1 <- function(x, y=2) {x^y}
x2y.2 <- function(x, y=x) {x^y}
x2y.2(3)
x2y.2(2,3)
The dots argument (...) is a catch-all
f <- function(...) {
  # simple way to access dots arguments
  dots <- list(...)
}
x <- f(5); dput(x)
g <- function(...) {
  dots <- substitute(list(...))[-1]
  dots.names <- sapply(dots, deparse)
}
x <- g(a,b,c); dput(x) # -> c("a", "b", "c")
# dots can be passed to another function:
h <- function(x, ...) g(...)
x <- h(a, b, c)
Function environment
# When a function is called a new environment (frame) is created for it.
# These frames are found in the call stack.
# The first frame is the global environment.
# The following function reaches back into the call stack.
called.by <- function() { # returns string
  if(length(sys.parents()) <= 2) return('.GlobalEnv')
  deparse(sys.call(sys.parent(2)))
}
g <- function(...) { called.by() }
f <- function(...) g(...); f(a,2)
Variable scope and unbound variables
# Within a function, variables are resolved in the local frame first,
# then in terms of super-functions (when a function is defined inside a function), then in terms of the global environment.
h <- function(x) { x+a }
a <- 5
h(5)
k <- function(x) { a <- 100; h(x) }
k(10)
Super assignment
# x <<- y ignores the local x, and looks up the super-environments for an x to replace
accumulator <- function() {
  a <- 0
  function(x) { a <<- a + x }
}
acc <- accumulator()
acc(1)
acc(5)
acc(2)
Operator and replacement functions
`+`(4,5) # -> 9 - operators are just fns
`%plus%` <- function(a,b) {a+b}
# FUN(x) <- v is parsed as: x <- `FUN<-`(x, value=v)
"cap<-" <- function(x, value) ifelse(x>value, value, x)
x <- c(1,10,100); cap(x) <- 9
Exceptions
tryCatch(print('pass'), error=function(e) print('bad'), finally=print('done'))
tryCatch(stop('fail'), error=function(e) print('bad'), finally=print('done'))
Useful language reflection functions
exists(); get(); assign() - for variables
substitute() bquote() eval() do.call() parse() deparse() quote() enquote()
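The advice on upfront argument checking and the mention of match.arg() can be sketched together. This is only an illustration: the function name scale.by and its parameters k and method are hypothetical and do not come from the cheat sheet.

```r
# Hypothetical helper: validate the arguments before doing any work.
scale.by <- function(x, k = 2, method = c("multiply", "divide")) {
  # stop early if the caller passes something of the wrong type or length
  stopifnot(is.numeric(x), is.numeric(k), length(k) == 1)
  # match.arg() resolves partial matches; the first choice is the default
  method <- match.arg(method)
  if (method == "multiply") x * k else x / k
}

r1 <- scale.by(1:3)                     # default method: multiply by 2
r2 <- scale.by(c(2, 4), method = "div") # "div" partially matches "divide"
r3 <- tryCatch(scale.by("a"), error = function(e) "rejected")
```

Because arguments are untyped, the stopifnot() call is the only thing that stops scale.by("a") from failing later with a less helpful message.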
Wednesday, November 11, 2015
R Basics 9 - Writing Functions
Thursday, November 5, 2015
R Basics 8 - Tips and Traps
General
Trap: R error messages are not helpful
Tip: use traceback() to understand errors
Object coercion
Trap: R objects are often silently coerced to another class/type as/when needed. Example: c(1, TRUE)
Tip: inspect objects with str(x) mode(x) class(x) typeof(x) dput(x) attributes(x)
Factors (special case of coercion)
Trap: Factors cause more bug-hunting grief than just about anything else in R, especially when string and integer vectors and data.frame cols are coerced to factor.
Tip: learn about factors and how to use them
Tip: explicitly test with is.factor(df$col)
Tip: use the stringsAsFactors=FALSE argument when you create a data frame from file
Trap: maths doesn't work on numeric factors and they are tricky to convert back.
Tip: try as.numeric(as.character(factor))
Trap: appending rows to a data frame with factor columns is tricky.
Tip: make sure the row to be appended is presented to rbind() as a data.frame, and not as a vector or a list (which works sometimes)
Trap: the combine function c() will let you combine different factors into a vector of integer codes (probably garbage). This applies to R versions before 4.1.0.
Tip: convert factors to strings or integers (as appropriate) before combining.
Garbage in the workspace
Trap: R saves your workspace at the end of each session and reloads the saved workspace at the start of the next session. Before you know it, you can have heaps of variables lurking in your workspace that are impacting on your calculations.
Tip: use ls() to check on lurking variables
Tip: clean up with rm(list=ls(all=TRUE))
Tip: use search() to check on loaded packages (library() lists the installed ones)
Tip: avoid saving workspaces, start R with the --no-save --no-restore arguments
The 1:0 sequence in for-loops
Trap: for(x in 1:length(y)) fails on the zero-length vector. It will loop twice: first setting x to 1, then to 0.
Tip: use for(x in seq_len(length(y)))
Trap: for(x in y) iterates over the values of y, not its indexes
Tip: for(x in seq_along(y))
Space out your code and use brackets
Trap: x<-5 assigns 5 to x; it does not compare x with -5
Tip: x < -5
Trap: 1:n-1 is (1:n)-1
Tip: 1:(n-1)
Trap: 2^2:9 is (2^2):9
Tip: 2^(2:9)
Vectors and vector recycling
Trap: most objects in R are vectors. R does not have scalars (just length=1 vectors). Many Fns work on entire vectors at once.
Tip: In R, for-loops are often the inefficient and inelegant solution. Take the time to learn the various 'apply' family of functions and the plyr package.
Trap: Math with different length vectors will work, with the shortest vector recycled: c(1,2,3) + c(10,20)
Vectors need the c() operator
wrong: mean(1,2,3)
correct: mean(c(1,2,3))
Use the correct Boolean operator
Tip: | and & are vectorised - use with ifelse() (| and & are also used with indexes to subset)
Tip: || and && are not vectorised - use with if
Trap: || and && use lazy (short-circuit) evaluation; | and & use full evaluation
Trap: == (Boolean equality) versus = (assignment)
Equality testing with numbers
Trap: == and != test for exact in/equality after type coercion: as.double(8) == as.integer(8) is TRUE, but floating-point rounding can make nearly equal numbers compare as unequal
Tip: isTRUE(all.equal(x,y)) tests near equality
Tip: identical(x,y) is more fussy
Think hard about NA, NaN and NULL
Trap: NA and NaN are valid values.
Trap: many Fns fail by default on NA input
Tip: many functions take na.rm=TRUE
Tip: vector test for NA: is.na(x)
Trap: x == NA is not the same as is.na(x)
Trap: x == NULL is not the same as is.null(x)
Trap: is.numeric(NaN) returns TRUE
Indexing ([], [[]], $)
Tip: Objects are indexed from 1 to N.
Trap: many subtle differences in indexing for vectors, lists, matrices, arrays and data.frames. Return types vary depending on the object being indexed and the indexing method.
Tip: take the time to learn the differences
Trap: the zero-index fails silently
Trap: negative indexes return all but those
Trap: NA is a valid Boolean index
Trap: mismatched Boolean indexes work
Coding Practice
Tip: liberally use stopifnot() on function entry to verify argument validity
Tip: <- for assignment; = for list names
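Two of the traps above are easy to reproduce at the console. The following is a small sketch with made-up variable names (y, n.bad, n.good, a), intended only to show the behaviour.

```r
# Trap: 1:length(y) on an empty vector is 1:0, i.e. c(1, 0)
y <- numeric(0)
n.bad <- 0
for (i in 1:length(y)) n.bad <- n.bad + 1     # body runs twice
n.good <- 0
for (i in seq_along(y)) n.good <- n.good + 1  # body never runs

# Trap: == is exact, so floating-point rounding shows through
a <- 0.1 + 0.2
exact <- a == 0.3                    # FALSE
near  <- isTRUE(all.equal(a, 0.3))   # TRUE, equal within a tolerance
```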
R Basics 7 - Factors
Factors
- A one-dimensional array of categorical (unordered) or ordinal (ordered) data.
- Indexed from 1 to N. Not fixed length.
- Named factors are possible (but rare).
- The hidden/unexpected coercion of objects to a factor is a key source of bugs.
Why use Factors
- Specifying a non-alphabetical order
- Some statistical functions treat cat/ordinal data differently from continuous data.
- Deep ggplot2 code depends on it
Create
Example 1 - unordered
> sex.v <- c('M', 'F', 'F', 'M', 'F'); sex.v
[1] "M" "F" "F" "M" "F"
> sex.f <- factor(sex.v); sex.f
[1] M F F M F
Levels: F M
> sex.w <- as.character(sex.f); sex.w
[1] "M" "F" "F" "M" "F"
Example 2 - ordered (small, medium, large)
> size.v <- c('S','L', 'M', 'L', 'S', 'M'); size.v
[1] "S" "L" "M" "L" "S" "M"
> size1.f <- factor(size.v, ordered = TRUE); size1.f
[1] S L M L S M
Levels: L < M < S
Example 3 - ordered, where we set the order
> size.lvls <- c('S','M','L')
> size2.f <- factor(size.v, levels=size.lvls); size2.f
[1] S L M L S M
Levels: S M L
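Setting levels= fixes the order of the levels, but the printed result in Example 3 (Levels: S M L, without < signs) is still an unordered factor. A minimal sketch of the ordered variant follows; the name size3.f is introduced here for illustration only.

```r
size.v <- c('S', 'L', 'M', 'L', 'S', 'M')
# levels= sets the order, ordered=TRUE makes the factor ordinal
size3.f <- factor(size.v, levels = c('S', 'M', 'L'), ordered = TRUE)
ord   <- is.ordered(size3.f)        # TRUE
cmp   <- size3.f[1] < size3.f[2]    # S < L under the chosen order
codes <- as.integer(size3.f)        # integer codes follow the level order
```

With ordered = TRUE the comparison operators follow the level order S < M < L rather than the alphabetical order seen in Example 2.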
Example 4 - ordered with levels and labels
> levels <- c(1,2,3,99)
> labels <- c('love', 'neutral', 'hate', NA); labels
[1] "love" "neutral" "hate" NA
> data.v <- c(1,2,3,99,1,2,1,2,99); data.v
[1] 1 2 3 99 1 2 1 2 99
> data.f <- factor(data.v, levels=levels, labels=labels); data.f
[1] love neutral hate <NA> love neutral love neutral <NA>
Levels: love neutral hate <NA>
Example 5 - using the cut function to group
> i <- 1:50 + rnorm(50,0,5)
> k <- cut(i,5); k
[1] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (10.2,21.1]
[9] (-0.859,10.2] (10.2,21.1] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (10.2,21.1] (-0.859,10.2] (10.2,21.1]
[17] (10.2,21.1] (21.1,32.1] (21.1,32.1] (10.2,21.1] (-0.859,10.2] (10.2,21.1] (10.2,21.1] (21.1,32.1]
[25] (21.1,32.1] (10.2,21.1] (32.1,43] (21.1,32.1] (21.1,32.1] (21.1,32.1] (32.1,43] (32.1,43]
[33] (32.1,43] (21.1,32.1] (21.1,32.1] (32.1,43] (32.1,43] (43,54.1] (32.1,43] (43,54.1]
[41] (32.1,43] (32.1,43] (32.1,43] (43,54.1] (43,54.1] (43,54.1] (43,54.1] (43,54.1]
[49] (43,54.1] (43,54.1]
Levels: (-0.859,10.2] (10.2,21.1] (21.1,32.1] (32.1,43] (43,54.1]
Basic information about a factor
> dim(f)
NULL
> is.factor(f)
[1] TRUE
> is.atomic(f)
[1] TRUE
> is.vector(f)
[1] FALSE
> is.list(f)
[1] FALSE
> is.recursive(f)
[1] FALSE
> length(f)
[1] 24
> names(f)
NULL
> mode(f)
[1] "numeric"
> class(f)
[1] "factor"
> typeof(f)
[1] "integer"
> is.ordered(f)
[1] FALSE
> unclass(f)
[1] 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1
attr(,"levels")
[1] "4" "3" "2" "1"
> cat(f)
4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1
> print(f)
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
Levels: 4 3 2 1
> str(f)
Factor w/ 4 levels "4","3","2","1": 4 3 2 1 4 3 2 1 4 3 ...
> dput(f)
structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L), .Label = c("4", "3", "2", "1"), class = "factor")
> head(f)
[1] 1 2 3 4 1 2
Levels: 4 3 2 1
Indexing: much like atomic vectors
- [x] selects a factor for the cell/range x
- [[x]] selects a length=1 factor for the single cell index x (rarely used)
- The $ operator is invalid with factors
Factor arithmetic & Boolean comparisons
- factors cannot be added, multiplied, etc.
- same-type factors are equality testable
> x <- sex.f[1] == sex.f[2]; x
[1] FALSE
- ordered factors can be order compared
> z <- size1.f[1] < size1.f[2]; z
[1] FALSE
Managing the enumeration (levels)
> f <- factor(letters[1:3]); f
[1] a b c
Levels: a b c
> levels(f)
[1] "a" "b" "c"
> levels(f)[1]
[1] "a"
> any(levels(f) %in% c('a', 'b', 'c'))
[1] TRUE
# add new levels
> levels(f)[length(levels(f))+1] <- 'z'; f
[1] a b c
Levels: a b c z
> levels(f) <- c(levels(f), 'AA'); f
[1] a b c
Levels: a b c z AA
# reorder levels
> f <- factor(f, levels(f)[c(4,1:3,5)]); f
[1] a b c
Levels: z a b c AA
# change/rename levels
> levels(f)[1] <- 'xx'; f
[1] a b c
Levels: xx a b c AA
> levels(f)[levels(f) %in% 'AA'] <- 'BB'; f
[1] a b c
Levels: xx a b c BB
# delete (drop) unused levels
> f <- f[drop=TRUE]
> f
[1] a b c
Levels: a b c
Adding an element to a factor
> f <- factor(letters[1:10]); f
[1] a b c d e f g h i j
Levels: a b c d e f g h i j
> f[length(f) + 1] <- 'a'; f
[1] a b c d e f g h i j a
Levels: a b c d e f g h i j
> f <- factor(c(as.character(f), 'zz')); f
[1] a b c d e f g h i j a zz
Levels: a b c d e f g h i j zz
Merging/combining factors
> a <- factor(1:10); a
[1] 1 2 3 4 5 6 7 8 9 10
Levels: 1 2 3 4 5 6 7 8 9 10
> b <- factor(letters[a]); b
[1] a b c d e f g h i j
Levels: a b c d e f g h i j
> union <- factor(c(as.character(a), as.character(b))); union
[1] 1 2 3 4 5 6 7 8 9 10 a b c d e f g h i j
Levels: 1 10 2 3 4 5 6 7 8 9 a b c d e f g h i j
> cross <- interaction(a,b); cross
[1] 1.a 2.b 3.c 4.d 5.e 6.f 7.g 8.h 9.i 10.j
100 Levels: 1.a 2.a 3.a 4.a 5.a 6.a 7.a 8.a 9.a 10.a 1.b 2.b 3.b 4.b 5.b 6.b 7.b 8.b 9.b 10.b 1.c 2.c 3.c 4.c 5.c ... 10.j
Using factors within data frames
# FUN stands for a summary function, e.g. mean
df$x <- reorder(df$f, df$x, FUN, order=TRUE)
by(df$x, df$f, FUN)
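A concrete version of the reorder() idea may help. The sketch below uses a made-up data frame, stores the reordered factor back in its own column, and uses tapply() rather than by() for the group summary, so it is an illustration of the technique and not the original snippet.

```r
# Made-up data: a factor column f and a numeric column x
df <- data.frame(f = factor(c('a', 'b', 'c', 'a', 'b', 'c')),
                 x = c(3, 1, 2, 5, 1, 2))

# Order the levels of f by the group mean of x (ascending)
df$f <- reorder(df$f, df$x, FUN = mean)
lv <- levels(df$f)                 # group means: a = 4, b = 1, c = 2

# Group summary, reported in the new level order
means <- tapply(df$x, df$f, mean)
```

Plotting functions that respect level order would then draw the groups as b, c, a.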
Traps
1 Strings loaded from a file are converted to factors (use read.table or read.csv with stringsAsFactors=FALSE)
2 Numbers from a file are factorised. Convert back with as.numeric(levels(f))[as.integer(f)]
3 One factor (enumeration) cannot be meaningfully compared with another
4 NAs (missing data) in factors and levels can cause problems
5 Adding a row to a data frame, which adds a new level to a column factor.
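Trap 2 is worth seeing once. In this small sketch (the values are invented), converting a factor of number-like strings directly gives the level codes, not the numbers.

```r
f <- factor(c('10', '20', '10', '5'))     # levels sort as strings: "10" "20" "5"
codes <- as.numeric(f)                    # level codes, not the values
vals1 <- as.numeric(as.character(f))      # go via the labels
vals2 <- as.numeric(levels(f))[as.integer(f)]  # same result, converts each level once
```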
R Basics 6 - Matrices and Arrays
Context
Matrices and arrays are an extension of R's atomic vectors.
Atomic vectors contain values (not objects). They hold a contiguous set of values, all of which are the same basic type. There are six types of atomic vector: logical, integer, numeric, complex, character and raw.
Importantly: atomic vectors have no dimension attribute.
Matrices and arrays are effectively vectors with a dimension attribute. Matrices are two-dimensional (tabular) objects, containing values all of the same type (unlike data frames). Arrays are multi-dimensional objects (typically with three plus dimensions), with values all of the same type.
Matrix versus data.frame
In a matrix, every column, and every cell, is of the same basic atomic type. In a data.frame each column can be of a different type (e.g. numeric, character, factor). Data frames are best with messy data, and for variables of mixed modes.
Matrix creation
## generalCase <- matrix(data=NA, nrow=1, ncol=1, byrow=FALSE, dimnames=NULL)
> M <- matrix(c(2,1,3,4,5,6), nrow=3, byrow=TRUE); M
     [,1] [,2]
[1,]    2    1
[2,]    3    4
[3,]    5    6
> b <- matrix(c(0, -1, 4)); b
     [,1]
[1,]    0
[2,]   -1
[3,]    4
> I <- diag(3); I
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
> D <- diag(c(1,2,3)); D
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
> d <- diag(M); d
[1] 2 4
Basic information about a matrix
> dim(M)
[1] 3 2
> class(M)
[1] "matrix"
> is.matrix(M)
[1] TRUE
> is.array(M)
[1] TRUE
> is.atomic(M)
[1] TRUE
> is.vector(M)
[1] FALSE
> is.list(M)
[1] FALSE
> is.factor(M)
[1] FALSE
> is.recursive(M)
[1] FALSE
> nrow(M)
[1] 3
> ncol(M)
[1] 2
> length(M)
[1] 6
> rownames(M)
NULL
> colnames(M)
NULL
Matrix manipulation
> M <- matrix(c(2,1,3,4,5,6), nrow=3, byrow=TRUE); M
     [,1] [,2]
[1,]    2    1
[2,]    3    4
[3,]    5    6
> N <- matrix(c(6,5,4,3,2,1), nrow=3, byrow=TRUE); N
     [,1] [,2]
[1,]    6    5
[2,]    4    3
[3,]    2    1
> newM <- cbind(M, N); newM
     [,1] [,2] [,3] [,4]
[1,]    2    1    6    5
[2,]    3    4    4    3
[3,]    5    6    2    1
> newM <- rbind(M, N); newM
     [,1] [,2]
[1,]    2    1
[2,]    3    4
[3,]    5    6
[4,]    6    5
[5,]    4    3
[6,]    2    1
> v <- c(M); v
[1] 2 3 5 1 4 6
> df <- data.frame(M); df
  X1 X2
1  2  1
2  3  4
3  5  6
Matrix multiplication
> M
     [,1] [,2]
[1,]    2    1
[2,]    3    4
[3,]    5    6
> N
     [,1] [,2]
[1,]    6    5
[2,]    4    3
[3,]    2    1
> InnerProduct <- M %*% t(N); InnerProduct
     [,1] [,2] [,3]
[1,]   17   11    5
[2,]   38   24   10
[3,]   60   38   16
> OuterProduct <- M %o% N; OuterProduct
, , 1, 1
     [,1] [,2]
[1,]   12    6
[2,]   18   24
[3,]   30   36
, , 2, 1
     [,1] [,2]
[1,]    8    4
[2,]   12   16
[3,]   20   24
, , 3, 1
     [,1] [,2]
[1,]    4    2
[2,]    6    8
[3,]   10   12
, , 1, 2
     [,1] [,2]
[1,]   10    5
[2,]   15   20
[3,]   25   30
, , 2, 2
     [,1] [,2]
[1,]    6    3
[2,]    9   12
[3,]   15   18
, , 3, 2
     [,1] [,2]
[1,]    2    1
[2,]    3    4
[3,]    5    6
> CrossProduct <- crossprod(M, N); CrossProduct
     [,1] [,2]
[1,]   34   24
[2,]   34   23
> M * N
     [,1] [,2]
[1,]   12    5
[2,]   12   12
[3,]   10    6
Matrix maths
> rowMeans(M)
[1] 1.5 3.5 5.5
> colMeans(M)
[1] 3.333333 3.666667
> rowSums(M)
[1] 3 7 11
> colSums(M)
[1] 10 11
> t <- t(M); t
     [,1] [,2] [,3]
[1,]    2    3    5
[2,]    1    4    6
> inverse <- solve(diag(c(1,2,3))); inverse
     [,1] [,2]      [,3]
[1,]    1  0.0 0.0000000
[2,]    0  0.5 0.0000000
[3,]    0  0.0 0.3333333
> e <- eigen(diag(c(1,2,3))); e
$values
[1] 3 2 1
$vectors
     [,1] [,2] [,3]
[1,]    0    0    1
[2,]    0    1    0
[3,]    1    0    0
> d <- det(diag(c(1,2,3))); d
[1] 6
Matrix indexing [row, col] [[row, col]]
# [[ for single cell selection; [ for multi cell selection
# indexed by positive numbers: these ones
# indexed by negative numbers: not these
# indexed by logical atomic vector: in/out
# named rows/cols can be indexed by name
# M[i] or M[[i]] is vector-like indexing
# $ operator is invalid for atomic vectors
# M[r,] - row r
# M[,c] - column c
Arrays
A three dimensional array created in two steps:
> A <- 1:8;A
[1] 1 2 3 4 5 6 7 8
> dim(A) <- c(2,2,2);A
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
A matrix is a special case of array. Matrices are arrays with two dimensions.
> M <- array(1:9, dim=c(3,3));M
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
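The matrix indexing rules listed earlier in this section are stated without examples, so here is a brief sketch. The row and column names (r1..r3, a, b) are invented for the illustration.

```r
M <- matrix(c(2, 1, 3, 4, 5, 6), nrow = 3, byrow = TRUE,
            dimnames = list(c('r1', 'r2', 'r3'), c('a', 'b')))

row2  <- M[2, ]                 # one row, returned as a named vector
colb  <- M[, 'b']               # one column, selected by name
keep  <- M[2, , drop = FALSE]   # drop=FALSE keeps the 1 x 2 matrix shape
rest  <- M[-1, ]                # negative index: all rows but the first
big   <- M[M[, 'a'] > 2, ]      # logical index on rows
fifth <- M[5]                   # vector-like indexing runs down the columns

A <- array(1:8, dim = c(2, 2, 2))
cell <- A[2, 1, 2]              # row 2, column 1 of the second slice
```

Note that a single row or column comes back as a plain vector unless drop = FALSE is given, which is a common source of surprises in code that expects a matrix.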
R Basics 5 - Data Frames
Create data frame
- The R way of doing spreadsheets
- Internally, a data.frame is a list of equal length vectors or factors.
- Observations in rows; Variables in cols
> empty <-data.frame()
> c1 <- 1:10
> c2 <- letters[1:10]
> df <- data.frame(col1=c1, col2=c2)
> df
col1 col2
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
7 7 g
8 8 h
9 9 i
10 10 j
Import from and export to file
d2 <- read.csv('fileName.csv', header = TRUE)
library(gdata);
d2 <- read.xls('file.xls')
write.csv(df, file='fileName.csv')
library(xtable); print(xtable(df), type='html')
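A round trip through a CSV file can be sketched with base R only. The data frame df.out and the temporary path are hypothetical, and stringsAsFactors=FALSE is passed explicitly so the result does not depend on the R version.

```r
df.out <- data.frame(col1 = 1:3, col2 = c('a', 'b', 'c'),
                     stringsAsFactors = FALSE)
path <- tempfile(fileext = '.csv')                 # scratch file for the example
write.csv(df.out, file = path, row.names = FALSE)  # omit the row-name column
df.in <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
unlink(path)                                       # tidy up
```

Without row.names = FALSE, write.csv() adds a leading column of row names that read.csv() reads back as an extra variable.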
Basic information about the data frame
> is.data.frame(df)
[1] TRUE
> class(df)
[1] "data.frame"
> nrow(df)
[1] 10
> ncol(df)
[1] 2
> colnames(df);
[1] "col1" "col2"
> rownames(df);
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
Referencing cells [row, col] [[r, c]]
## [[ for single cell selection;
# [ for multi cell selection;
> vec <- df[[5,2]]; vec
[1] e
Levels: a b c d e f g h i j
> newDF <- df[1:5, 1:2]; newDF
col1 col2
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
> df[[2, 'col1']]
[1] 2
> df[3:5, c('col1', 'col2')]
col1 col2
3 3 c
4 4 d
5 5 e
Referencing rows [r, ]
# returns a data frame (and not a vector!)
> row.1 <- df[1,]; row.1
col1 col2
1 1 a
> row.n <- df[nrow(df),]; row.n
col1 col2
10 10 j
> vrow <- as.numeric(as.vector(df[1,])); vrow
[1] 1 1
> vrow <- as.character(as.vector(df[1,])); vrow
[1] "1" "1"
Referencing columns [,c] [d] [[d]] $col
> names(df) <- c('num','cats')
> col.vec <- df$cats; col.vec
[1] a b c d e f g h i j
Levels: a b c d e f g h i j
> # returns vector
> col.vec <- df[, 'cats'] ; col.vec
[1] a b c d e f g h i j
Levels: a b c d e f g h i j
> # a is int or string
> col.vec <- df[ , 2]; col.vec
[1] a b c d e f g h i j
Levels: a b c d e f g h i j
> # returns a vector
> col.vec <- df[['cats']]; col.vec
[1] a b c d e f g h i j
Levels: a b c d e f g h i j
> # returns 1 col df
> frog.df <- df['cats']
> # returns 1 col df
> first.df <- df[1]; first.df
num
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
> first.col <- df[,1]; first.col
[1] 1 2 3 4 5 6 7 8 9 10
> # returns a vector
> last.col <- df[,ncol(df)]; last.col
[1] a b c d e f g h i j
Levels: a b c d e f g h i j
Adding rows
# The right way ... (both args are DFs)
df <- rbind(df, data.frame(num=1, cats='A')); df
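As a small sketch of why both arguments should be data frames, the example below appends a row whose factor value is not yet a level. The names df.a and df.b are made up for the illustration.

```r
df.a <- data.frame(num = 1:2, cats = factor(c('a', 'b')))
# The new row is itself a data frame with matching column names
df.b <- rbind(df.a, data.frame(num = 3L, cats = factor('z')))
lv <- levels(df.b$cats)   # the new level is appended to the existing ones
n  <- nrow(df.b)
```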
Adding columns
> df$newCol <- rep(NA, nrow(df)); df
col1 col2 newCol
1 1 a NA
2 2 b NA
3 3 c NA
4 4 d NA
5 5 e NA
6 6 f NA
7 7 g NA
8 8 h NA
9 9 i NA
10 10 j NA
> #Copy a column
> df[, 'copyofCol'] <- 1:nrow(df); df
col1 col2 newCol copyofCol
1 1 a NA 1
2 2 b NA 2
3 3 c NA 3
4 4 d NA 4
5 5 e NA 5
6 6 f NA 6
7 7 g NA 7
8 8 h NA 8
9 9 i NA 9
10 10 j NA 10
> names(df) <- c('x','cats','newCol','y')
> df$y.percent.pf.x <- df$y/sum(df$x)*100; df
x cats newCol y y.percent.pf.x
1 1 a NA 1 1.818182
2 2 b NA 2 3.636364
3 3 c NA 3 5.454545
4 4 d NA 4 7.272727
5 5 e NA 5 9.090909
6 6 f NA 6 10.909091
7 7 g NA 7 12.727273
8 8 h NA 8 14.545455
9 9 i NA 9 16.363636
10 10 j NA 10 18.181818
> df <-cbind(col=rep('a',nrow(df)), df); df
col x cats newCol y y.percent.pf.x
1 a 1 a NA 1 1.818182
2 a 2 b NA 2 3.636364
3 a 3 c NA 3 5.454545
4 a 4 d NA 4 7.272727
5 a 5 e NA 5 9.090909
6 a 6 f NA 6 10.909091
7 a 7 g NA 7 12.727273
8 a 8 h NA 8 14.545455
9 a 9 i NA 9 16.363636
10 a 10 j NA 10 18.181818
> df <- cbind(df,col=rep('b',nrow(df))); df
col x cats newCol y y.percent.pf.x col
1 a 1 a NA 1 1.818182 b
2 a 2 b NA 2 3.636364 b
3 a 3 c NA 3 5.454545 b
4 a 4 d NA 4 7.272727 b
5 a 5 e NA 5 9.090909 b
6 a 6 f NA 6 10.909091 b
7 a 7 g NA 7 12.727273 b
8 a 8 h NA 8 14.545455 b
9 a 9 i NA 9 16.363636 b
10 a 10 j NA 10 18.181818 b
> df$c3 <- with(df, x*y); df
col x cats newCol y y.percent.pf.x col c3
1 a 1 a NA 1 1.818182 b 1
2 a 2 b NA 2 3.636364 b 4
3 a 3 c NA 3 5.454545 b 9
4 a 4 d NA 4 7.272727 b 16
5 a 5 e NA 5 9.090909 b 25
6 a 6 f NA 6 10.909091 b 36
7 a 7 g NA 7 12.727273 b 49
8 a 8 h NA 8 14.545455 b 64
9 a 9 i NA 9 16.363636 b 81
10 a 10 j NA 10 18.181818 b 100
> # Trap: transform() needs a named argument (col4 = x+y); with <- no column is added
> transform(df, col4 <- x+y)
col x cats newCol y y.percent.pf.x col c3
1 a 1 a NA 1 1.818182 b 1
2 a 2 b NA 2 3.636364 b 4
3 a 3 c NA 3 5.454545 b 9
4 a 4 d NA 4 7.272727 b 16
5 a 5 e NA 5 9.090909 b 25
6 a 6 f NA 6 10.909091 b 36
7 a 7 g NA 7 12.727273 b 49
8 a 8 h NA 8 14.545455 b 64
9 a 9 i NA 9 16.363636 b 81
10 a 10 j NA 10 18.181818 b 100
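The transform() output above shows no col4 column because the new column was written with <- rather than =. A minimal sketch of the difference, using a small made-up data frame d:

```r
d <- data.frame(x = 1:3, y = 4:6)
d2 <- transform(d, col4 = x + y)    # named argument: col4 is added
d3 <- transform(d, col4 <- x + y)   # unnamed argument: d comes back unchanged
n2 <- names(d2)
n3 <- names(d3)
```

transform() returns a new data frame in both cases, so the result has to be assigned for the change to persist.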
Set column names # same for rownames()
> colnames(df) <- c('date', 'alpha', 'beta'); df
date alpha beta NA NA NA NA NA
1 a 1 a NA 1 1.818182 b 1
2 a 2 b NA 2 3.636364 b 4
3 a 3 c NA 3 5.454545 b 9
4 a 4 d NA 4 7.272727 b 16
5 a 5 e NA 5 9.090909 b 25
6 a 6 f NA 6 10.909091 b 36
7 a 7 g NA 7 12.727273 b 49
8 a 8 h NA 8 14.545455 b 64
9 a 9 i NA 9 16.363636 b 81
10 a 10 j NA 10 18.181818 b 100
> colnames(df)[1] <- 'new.name'; df
new.name alpha beta NA NA NA NA NA
1 a 1 a NA 1 1.818182 b 1
2 a 2 b NA 2 3.636364 b 4
3 a 3 c NA 3 5.454545 b 9
4 a 4 d NA 4 7.272727 b 16
5 a 5 e NA 5 9.090909 b 25
6 a 6 f NA 6 10.909091 b 36
7 a 7 g NA 7 12.727273 b 49
8 a 8 h NA 8 14.545455 b 64
9 a 9 i NA 9 16.363636 b 81
10 a 10 j NA 10 18.181818 b 100
> colnames(df)[colnames(df) %in% c('a', 'b')] <- c('x', 'y'); df
new.name alpha beta NA NA NA NA NA
1 a 1 a NA 1 1.818182 b 1
2 a 2 b NA 2 3.636364 b 4
3 a 3 c NA 3 5.454545 b 9
4 a 4 d NA 4 7.272727 b 16
5 a 5 e NA 5 9.090909 b 25
6 a 6 f NA 6 10.909091 b 36
7 a 7 g NA 7 12.727273 b 49
8 a 8 h NA 8 14.545455 b 64
9 a 9 i NA 9 16.363636 b 81
10 a 10 j NA 10 18.181818 b 100
Selecting Multiple Rows
> firstTenRows <- df[1:10,]; firstTenRows
new.name alpha beta NA NA NA NA NA
1 a 1 a NA 1 1.818182 b 1
2 a 2 b NA 2 3.636364 b 4
3 a 3 c NA 3 5.454545 b 9
4 a 4 d NA 4 7.272727 b 16
5 a 5 e NA 5 9.090909 b 25
6 a 6 f NA 6 10.909091 b 36
7 a 7 g NA 7 12.727273 b 49
8 a 8 h NA 8 14.545455 b 64
9 a 9 i NA 9 16.363636 b 81
10 a 10 j NA 10 18.181818 b 100
> everythingButRowTwo <- df[-2,]; everythingButRowTwo
new.name alpha beta NA NA NA NA NA
1 a 1 a NA 1 1.818182 b 1
3 a 3 c NA 3 5.454545 b 9
4 a 4 d NA 4 7.272727 b 16
5 a 5 e NA 5 9.090909 b 25
6 a 6 f NA 6 10.909091 b 36
7 a 7 g NA 7 12.727273 b 49
8 a 8 h NA 8 14.545455 b 64
9 a 9 i NA 9 16.363636 b 81
10 a 10 j NA 10 18.181818 b 100
> sub <- df[(df$x >5 & df$y<5), ]; sub
[1] new.name alpha beta <NA> <NA> <NA> <NA> <NA>
<0 rows> (or 0-length row.names)
> sub <- subset(df, x>5 & y<5); sub
[1] new.name alpha beta <NA> NA.1 NA.2 NA.3 NA.4
<0 rows> (or 0-length row.names)
> notLastRow <- head(df, -1); notLastRow
new.name alpha beta NA NA NA NA NA
1 a 1 a NA 1 1.818182 b 1
2 a 2 b NA 2 3.636364 b 4
3 a 3 c NA 3 5.454545 b 9
4 a 4 d NA 4 7.272727 b 16
5 a 5 e NA 5 9.090909 b 25
6 a 6 f NA 6 10.909091 b 36
7 a 7 g NA 7 12.727273 b 49
8 a 8 h NA 8 14.545455 b 64
9 a 9 i NA 9 16.363636 b 81
> df[-nrow(df),]
new.name alpha beta NA NA NA NA NA
1 a 1 a NA 1 1.818182 b 1
2 a 2 b NA 2 3.636364 b 4
3 a 3 c NA 3 5.454545 b 9
4 a 4 d NA 4 7.272727 b 16
5 a 5 e NA 5 9.090909 b 25
6 a 6 f NA 6 10.909091 b 36
7 a 7 g NA 7 12.727273 b 49
8 a 8 h NA 8 14.545455 b 64
9 a 9 i NA 9 16.363636 b 81
Selecting multiple columns
> df <- df[,c(1,2,3,4,5)]; df
col x cats newCol y
1 a 1 a NA 1
2 a 2 b NA 2
3 a 3 c NA 3
4 a 4 d NA 4
5 a 5 e NA 5
6 a 6 f NA 6
7 a 7 g NA 7
8 a 8 h NA 8
9 a 9 i NA 9
10 a 10 j NA 10
> names(df) <- c('col1', 'col2', 'col3')
> df <- df[,c('col1','col2')];df
col1 col2
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 a 7
8 a 8
9 a 9
10 a 10
df <- df[,-1]; df
# drop col1 and col3
df <- df[,-c(1,3)]
> df <- df[,!(colnames(df) %in% c('notThis','norThis'))]
> df
col1 col2
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 a 7
8 a 8
9 a 9
10 a 10
Replace column elements by row selection
> df
col1 col2
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 a 7
8 a 8
9 a 9
10 a 10
> # Trap: a misspelt column name (col31) silently selects nothing
> df[df$col31 == 'a', 'col2'] <- 1
> df
col1 col2
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 a 7
8 a 8
9 a 9
10 a 10
> df[df$col1 == 'a', 'col2'] <- 1
> df
col1 col2
1 a 1
2 a 1
3 a 1
4 a 1
5 a 1
6 a 1
7 a 1
8 a 1
9 a 1
10 a 1
Missing data(NA)
# detect anywhere in df
> any(is.na(df))
[1] TRUE
> # anywhere in col
> any(is.na(df$newCol))
[1] FALSE
> # selecting only the rows without missing values in a column
> df2 <- df[!is.na(df$newCol),]; df2
col1 col2 newCol col
1 a NA 0 0
2 a NA 0 0
3 a NA 0 0
4 a NA 0 0
5 a NA 0 0
6 a NA 0 0
7 a NA 0 0
8 a NA 0 0
9 a NA 0 0
10 a NA 0 0
> # replacing NAs with something else
> df[is.na(df)] <- 0; df
col1 col2 newCol col
1 a 0 0 0
2 a 0 0 0
3 a 0 0 0
4 a 0 0 0
5 a 0 0 0
6 a 0 0 0
7 a 0 0 0
8 a 0 0 0
9 a 0 0 0
10 a 0 0 0
> df$col2[is.na(df$col2)] <- 0; df
col1 col2 newCol col
1 a 0 0 0
2 a 0 0 0
3 a 0 0 0
4 a 0 0 0
5 a 0 0 0
6 a 0 0 0
7 a 0 0 0
8 a 0 0 0
9 a 0 0 0
10 a 0 0 0
> df$col2 <- ifelse(is.na(df$col2), 0, df$col2); df
col1 col2 newCol col
1 a 0 0 0
2 a 0 0 0
3 a 0 0 0
4 a 0 0 0
5 a 0 0 0
6 a 0 0 0
7 a 0 0 0
8 a 0 0 0
9 a 0 0 0
10 a 0 0 0
df <- orig[!is.na(orig$series), c('Date', 'series')]
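The missing-data idioms above can be tried on a small self-contained example. The data frame d and its columns are invented for this sketch.

```r
d <- data.frame(a = c(1, NA, 3), b = c('x', 'y', NA),
                stringsAsFactors = FALSE)
has.na   <- any(is.na(d))        # is there an NA anywhere in the data frame?
per.col  <- colSums(is.na(d))    # count of NAs in each column
complete <- d[!is.na(d$a), ]     # keep rows where column a is present
d$a[is.na(d$a)] <- 0             # replace NAs in one column only
```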
Traps
1 for loops on possibly empty df's, use: for(i in seq_len(nrow(df)))
2 columns coerced to factors, avoid with the argument stringsAsFactors=FALSE
3 confusing row numbers and rows with numbered names (hint: avoid row names)
4 although rbind() accepts vectors and lists; this can fail with factor cols
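Trap 3 shows up as soon as a data frame is subset, because the original row names are kept. A small sketch with made-up data:

```r
df <- data.frame(v = c(10, 20, 30, 40))
sub <- df[df$v > 15, , drop = FALSE]   # rows 2, 3, 4 of the original
rn <- rownames(sub)                    # still "2" "3" "4"
by.pos  <- sub[2, 'v']                 # second row by position
by.name <- sub['2', 'v']               # row whose name is "2"
```

A numeric index means position and a character index means name, so the two look-ups return different rows even though both use a 2.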
Wednesday, November 4, 2015
R Basics 4 - Lists
Context: R has two types of vector
Atomic vectors contain values
These values are all of the same type.
They are arranged contiguously.
Atomic vectors cannot contain objects.
There are six types of atomic vector: raw, logical, integer, numeric, complex and character.
Recursive vectors contain objects
R has two types of recursive vector: list, expression.
Lists
- At top level: 1-dim indexed object that contains objects (not values)
- Indexed from 1 to length(list)
- Contents can be of different types
- Lists can contain the NULL object
- Deeply nested lists of lists possible
- Can be arbitrarily extended (not fixed)
List creation: usually using list()
> l1 <- list('cat', 5, 1:10, FALSE);l1
[[1]]
[1] "cat"
[[2]]
[1] 5
[[3]]
[1] 1 2 3 4 5 6 7 8 9 10
[[4]]
[1] FALSE
> l2 <- list ( x='dog', y=5+2i, z=3:8 );l2
$x
[1] "dog"
$y
[1] 5+2i
$z
[1] 3 4 5 6 7 8
> l3 <- c(l1, l2);l3
[[1]]
[1] "cat"
[[2]]
[1] 5
[[3]]
[1] 1 2 3 4 5 6 7 8 9 10
[[4]]
[1] FALSE
$x
[1] "dog"
$y
[1] 5+2i
$z
[1] 3 4 5 6 7 8
> l4 <- list(l1, l2);l4
[[1]]
[[1]][[1]]
[1] "cat"
[[1]][[2]]
[1] 5
[[1]][[3]]
[1] 1 2 3 4 5 6 7 8 9 10
[[1]][[4]]
[1] FALSE
[[2]]
[[2]]$x
[1] "dog"
[[2]]$y
[1] 5+2i
[[2]]$z
[1] 3 4 5 6 7 8
> l5 <- as.list( c(1,2,3));l5
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
> origL <- l4
> inserVorL <- l5
> position <- 3
> l6 <- append(origL, inserVorL, position);l6
[[1]]
[[1]][[1]]
[1] "cat"
[[1]][[2]]
[1] 5
[[1]][[3]]
[1] 1 2 3 4 5 6 7 8 9 10
[[1]][[4]]
[1] FALSE
[[2]]
[[2]]$x
[1] "dog"
[[2]]$y
[1] 5+2i
[[2]]$z
[1] 3 4 5 6 7 8
[[3]]
[1] 1
[[4]]
[1] 2
[[5]]
[1] 3
Basic information about lists
> dim(l)
NULL
> is.list(l)
[1] TRUE
> is.vector(l)
[1] TRUE
> is.recursive(l)
[1] TRUE
> is.atomic(l)
[1] FALSE
> is.factor(l)
[1] FALSE
> length(l)
[1] 5
> names(l)
NULL
> mode(l)
[1] "list"
> class(l)
[1] "list"
> typeof(l)
[1] "list"
> attributes(l)
NULL
The contents of a list
> print(l)
[[1]]
[[1]][[1]]
[1] "cat"
[[1]][[2]]
[1] 5
[[1]][[3]]
[1] 1 2 3 4 5 6 7 8 9 10
[[1]][[4]]
[1] FALSE
[[2]]
[[2]]$x
[1] "dog"
[[2]]$y
[1] 5+2i
[[2]]$z
[1] 3 4 5 6 7 8
[[3]]
[1] 1
[[4]]
[1] 2
[[5]]
[1] 3
> str(l)
List of 5
$ :List of 4
..$ : chr "cat"
..$ : num 5
..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ : logi FALSE
$ :List of 3
..$ x: chr "dog"
..$ y: cplx 5+2i
..$ z: int [1:6] 3 4 5 6 7 8
$ : num 1
$ : num 2
$ : num 3
> dput(l)
list(list("cat", 5, 1:10, FALSE), structure(list(x = "dog", y = 5+2i,
z = 3:8), .Names = c("x", "y", "z")), 1, 2, 3)
> head(l)
[[1]]
[[1]][[1]]
[1] "cat"
[[1]][[2]]
[1] 5
[[1]][[3]]
[1] 1 2 3 4 5 6 7 8 9 10
[[1]][[4]]
[1] FALSE
[[2]]
[[2]]$x
[1] "dog"
[[2]]$y
[1] 5+2i
[[2]]$z
[1] 3 4 5 6 7 8
[[3]]
[1] 1
[[4]]
[1] 2
[[5]]
[1] 3
> tail(l)
[[1]]
[[1]][[1]]
[1] "cat"
[[1]][[2]]
[1] 5
[[1]][[3]]
[1] 1 2 3 4 5 6 7 8 9 10
[[1]][[4]]
[1] FALSE
[[2]]
[[2]]$x
[1] "dog"
[[2]]$y
[1] 5+2i
[[2]]$z
[1] 3 4 5 6 7 8
[[3]]
[1] 1
[[4]]
[1] 2
[[5]]
[1] 3
Trap: cat(x) does not work with lists
Indexing [ versus [[ versus $
- Use [ to get/set multiple items at once
Note: [ always returns a list
- Use [[ and $ to get/set a specific item
- $ only works with named list items
all same: $name $"name" $'name' $`name`
- indexed by positive numbers: these ones
- indexed by negative numbers: not these
- indexed by logical atomic vector: in/out
an empty index l[] returns the list
Tip: When using lists, most of the time you want to index with [[ or $, and avoid [
Indexing examples: one-dimension get
> j <- list(a='cat', b=5, c=FALSE)
> x <- j$a;x
[1] "cat"
> x <- j[['a']];x
[1] "cat"
> x <- j['a'];x
$a
[1] "cat"
> x <- j[[1]];x
[1] "cat"
> x <- j[1];x
$a
[1] "cat"
Indexing examples: set operations
- Start with example data
l <- list(x='a', y='b', z='c', t='d')
- Next use [[ or $ because specific selection
> l[[6]] <- 'new';
> names(l)[5] <- 'w'
> l$w <- 'new-W'
> l[['w']] <- 'dog'
- Change named values: note order ignored
> l[names(l) %in% c('t', 'x')] <- c(1,2)
> l
$x
[1] 1
$y
[1] "b"
$z
[1] "c"
$t
[1] 2
$w
[1] "dog"
[[6]]
[1] "new"
Indexing example: multi-dimension get
- Indexing evaluated from left to right
- Let's start with some example data...
> i <- c('aa', 'bb', 'cc')
> j <- list(a='cat', b=5, c=FALSE)
> k <- list(i, j);k
[[1]]
[1] "aa" "bb" "cc"
[[2]]
[[2]]$a
[1] "cat"
[[2]]$b
[1] 5
[[2]]$c
[1] FALSE
> k[[1]]
[1] "aa" "bb" "cc"
> k[[2]]
$a
[1] "cat"
$b
[1] 5
$c
[1] FALSE
> k[1]
[[1]]
[1] "aa" "bb" "cc"
> k[2]
[[1]]
[[1]]$a
[1] "cat"
[[1]]$b
[1] 5
[[1]]$c
[1] FALSE
> x <- k[[1]][[1]];x
[1] "aa"
> x <- k[[1]][[2]];x
[1] "bb"
> x <- k[1][1][1][1][1];x
[[1]]
[1] "aa" "bb" "cc"
> x <- k[1][2];x
[[1]]
NULL
> x <- k[[2]][1];x
$a
[1] "cat"
List manipulation
1 Arithmetic operators cannot be applied to lists (as content types can vary)
2 Use the apply family of functions (lapply, sapply) to apply a function to each element in a list:
> x <- list(a=1, b=month.abb, c=letters)
> lapply(x, FUN=length)
$a
[1] 1
$b
[1] 12
$c
[1] 26
> sapply(x, FUN=length)
a b c
1 12 26
y <- list(a=1, b=3, c=3, c=4)
sapply(y, FUN=function(x,p) x^p, p=2)
sapply(y, FUN=function(x,p) x^p, p=2:3)
3 Use unlist to convert list to vector
> unlist(x)
a b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 c1 c2 c3 c4 c5 c6 c7
"1" "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec" "a" "b" "c" "d" "e" "f" "g"
c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21 c22 c23 c24 c25 c26
"h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
unlist() won't unlist non-atomic elements
4 Remove NULL objects from a list
> z <- list(a=1:9, b=letters, c=NULL)
> zNoNull <- Filter(Negate(is.null), z)
> zNoNull
$a
[1] 1 2 3 4 5 6 7 8 9
$b
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
5 Use named lists to return multiple values
6 Factor index treated as integer
decode with v[as.character(f)]
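Points 5 and 6 have no example, so here is a short sketch of both. The helper name describe and the sample values are hypothetical.

```r
# 5: a named list lets one function return several values
describe <- function(x) list(n = length(x), mean = mean(x), range = range(x))
d <- describe(c(2, 4, 9))      # access the parts with d$n, d$mean, d$range

# 6: a factor used as an index is treated as its integer codes
v <- c(a = 1, b = 2, c = 3)
f <- factor(c('c', 'a'), levels = c('c', 'a'))   # codes are 1, 2
by.code  <- v[f]                 # picks positions 1 and 2
by.label <- v[as.character(f)]   # picks the elements named c and a
```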
Tuesday, November 3, 2015
R Basics 3 - Atomic Vectors
Atomic vectors:
- An object with contiguous, indexed values
- Indexed from 1 to length(vector)
- All values of the same basic atomic type
- Vectors do not have a dimension attribute
- Has a fixed length once created
Six basic atomic types:
- logical
- integer
- numeric
- complex
- character
- raw
No scalars
In R, the basic types are always in a vector. Scalars are just length=1 vectors.
Creation (length determined at creation)
#Default value vectors of length=4
#Using the sequence operator
> i <- 1:5; i
[1] 1 2 3 4 5
> j <- 1.4:6.4; j
[1] 1.4 2.4 3.4 4.4 5.4 6.4
> k <- seq(from=0, to =1, by=0.1);k
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
#Using the c() function
> l <- c(TRUE,FALSE); l
[1] TRUE FALSE
> n <- c(1.3, 7, 7/20); n
[1] 1.30 7.00 0.35
> z <- c(1+2i, 2, -3+4i); z
[1] 1+2i 2+0i -3+4i
#Other things
> v1 <- c(a=1,b=2,c=3); v1
a b c
1 2 3
> v2 <- rep(NA, 3); v2
[1] NA NA NA
> v3 <- c(v1, v2); v3
a b c
1 2 3 NA NA NA
> v4 <- append(1:5, 2:10, after=5); v4
[1] 1 2 3 4 5 2 3 4 5 6 7 8 9 10
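vector(mode, length) fills a new vector with a type-specific default value. The sketch below shows the defaults for several modes; the list name defaults is just for illustration.

```r
defaults <- list(
  logical   = vector(mode = "logical",   length = 2),  # FALSE FALSE
  integer   = vector(mode = "integer",   length = 2),  # 0 0
  numeric   = vector(mode = "numeric",   length = 2),  # 0 0
  character = vector(mode = "character", length = 2),  # "" ""
  complex   = vector(mode = "complex",   length = 2)   # 0+0i 0+0i
)
str(defaults)

# The shorthand constructors give the same results
identical(numeric(2), defaults$numeric)       # TRUE
identical(character(2), defaults$character)   # TRUE
```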
Conversion
> as.vector(v4)
[1] 1 2 3 4 5 2 3 4 5 6 7 8 9 10
> as.logical(v4)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> as.integer(v4)
[1] 1 2 3 4 5 2 3 4 5 6 7 8 9 10
> as.numeric(v4)
[1] 1 2 3 4 5 2 3 4 5 6 7 8 9 10
> as.character(v4)
[1] "1" "2" "3" "4" "5" "2" "3" "4" "5" "6" "7" "8" "9" "10"
> unlist(l)
[1] TRUE FALSE
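The as.*() conversions above all succeed cleanly because v4 holds small whole numbers. A short sketch of some less obvious cases, using made-up inputs:

```r
as.integer(3.9)           # 3: truncates toward zero, does not round
as.integer(-3.9)          # -3
as.logical(c(0, 1, 2))    # FALSE TRUE TRUE: any non-zero number is TRUE
as.numeric("1e3")         # 1000

# Text that is not a number becomes NA (R also raises a warning,
# suppressed here so the example runs quietly)
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)                # TRUE
```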
Basic information about atomic vectors
> v <- v4 # the examples below use the 14-element vector built above
> dim(v)
NULL
> is.atomic(v)
[1] TRUE
> is.vector(v)
[1] TRUE
> is.list(v)
[1] FALSE
> is.factor(v)
[1] FALSE
> is.recursive(v)
[1] FALSE
> length(v)
[1] 14
> names(v)
NULL
> mode(v)
[1] "numeric"
> class(v)
[1] "integer"
> typeof(v)
[1] "integer"
> attributes(v)
NULL
> is.numeric(v);
[1] TRUE
> is.character(v);
[1] FALSE
Trap: lists are vectors (but not atomic)
Trap: arrays and matrices are atomic, but is.vector() returns FALSE for them (they carry a dim attribute)
Tip: use (is.vector(v) && is.atomic(v)) to test for an atomic vector
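The two traps and the tip can be verified with a small sketch. The helper name is_atomic_vector is hypothetical, chosen here only to wrap the combined test.

```r
l <- list(1, "a")
m <- matrix(1:4, nrow = 2)
v <- 1:3

is.vector(l)   # TRUE:  a list is a vector ...
is.atomic(l)   # FALSE: ... but not an atomic one
is.atomic(m)   # TRUE:  a matrix is atomic ...
is.vector(m)   # FALSE: ... but its dim attribute stops it being a plain vector

# Hypothetical helper combining both checks
is_atomic_vector <- function(x) is.vector(x) && is.atomic(x)
is_atomic_vector(v)   # TRUE
is_atomic_vector(l)   # FALSE
is_atomic_vector(m)   # FALSE
```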
The content of a vector
> cat(v)
1 2 3 4 5 2 3 4 5 6 7 8 9 10
> print(v)
[1] 1 2 3 4 5 2 3 4 5 6 7 8 9 10
> str(v)
int [1:14] 1 2 3 4 5 2 3 4 5 6 ...
> dput(v)
c(1L, 2L, 3L, 4L, 5L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L)
> head(v)
[1] 1 2 3 4 5 2
> tail(v)
[1] 5 6 7 8 9 10
Indexing: [ and [[ but not $
- [x] selects a vector for the cell/range x
- [[x]] selects a length=1 vector for the single cell index x (rarely used)
- The $ operator is invalid for atomic vectors
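A minimal sketch of the three operators on a small named vector (the vector here is made up, and separate from the v used in the transcript):

```r
v <- c(a = 1L, b = 2L, c = 3L)

v["b"]           # length-1 vector that keeps its name
v[["b"]]         # 2: a single value with the name dropped
v[c("a", "c")]   # [ accepts several indices, [[ accepts only one

# $ raises an error on atomic vectors; capture the message instead of stopping
dollar <- tryCatch(v$b, error = function(e) conditionMessage(e))
print(dollar)
```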
Index by positive numbers
> v[c(1,2,3)]
[1] 1 2 3
> v[1:2]
[1] 1 2
> v[[7]]
[1] 3
> v[which(v == 'M')]
integer(0)
Index by negative numbers
> v[-1] #get all but the first element
[1] 2 3 4 5 2 3 4 5 6 7 8 9 10
> v[-length(v)]
[1] 1 2 3 4 5 2 3 4 5 6 7 8 9
> v[-c(1,3,5,7,9)]
[1] 2 4 2 4 6 7 8 9 10
Index by name (only with named vectors)
> names(v)[1:3] = c('alpha', 'beta','z')
> v[['alpha']]
[1] 1
> v[['beta']]
[1] 2
> v[c('alpha','beta')]
alpha beta
1 2
> v[!(names(v) %in% c('c', 'b'))]
alpha beta z <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
1 2 3 4 5 2 3 4 5 6 7 8 9 10
Sorting
> upsorted = sort(v); upsorted;
alpha beta <NA> z <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
1 2 2 3 3 4 4 5 5 6 7 8 9 10
> v[order(v)]
alpha beta <NA> z <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
1 2 2 3 3 4 4 5 5 6 7 8 9 10
> d = sort(v, decreasing = TRUE);d
<NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> z <NA> beta <NA> alpha
10 9 8 7 6 5 5 4 4 3 3 2 2 1
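sort() returns the sorted values, while order() returns the index positions that would sort them, which is why v[order(v)] matches sort(v) above. A small sketch with made-up data:

```r
x <- c(b = 30, a = 10, c = 20)

ord <- order(x)               # 2 3 1: positions of the smallest to largest value
identical(sort(x), x[ord])    # TRUE

order(x, decreasing = TRUE)   # 1 3 2

# order() can also sort by something other than the values, e.g. the names
by_name <- x[order(names(x))]
print(by_name)
```

Because order() returns positions, the same permutation can be reused to rearrange a second vector of the same length in step with the first.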
Raw vectors
> s <- charToRaw('raw');
> r <- as.raw(c(114,97,119))
> print(r)
[1] 72 61 77
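Raw vectors print as hexadecimal bytes, so 114, 97 and 119 appear as 72, 61 and 77. A short sketch of converting back and forth:

```r
s <- charToRaw("raw")
r <- as.raw(c(114, 97, 119))

identical(s, r)   # TRUE: both hold the bytes for "r", "a", "w"
as.integer(r)     # 114 97 119: back to decimal
rawToChar(r)      # "raw": back to a character string
```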