Wednesday, November 11, 2015

R Basics 9 - Writing Functions

Functions in R are called closures.
# Don't be deceived by the curly brackets;
# R is much more like Lisp than C or Java.
# Defining problems in terms of function
# calls and their lazy, delayed evaluation
# (variable resolution) is R's big feature.

Standard form (for named functions)
plus <- function(x, y) {x+y}
plus(5,6)
# return() not needed - last value returned
# Optional curly brackets with 1-line fns:
x.to.y <- function(x,y) return (x^y)

Returning values
# return() - can be used to aid readability and to exit part way through a function
# invisible() - returns a value that does not print unless it is assigned
# Trap: return() is a function, not a statement. The brackets are needed.
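A minimal sketch of both ideas (safe.sqrt and double.quietly are made-up names for illustration):

```r
# return() gives an early exit; otherwise the last value is returned
safe.sqrt <- function(x) {
  if (!is.numeric(x) || x < 0) return(NA_real_)  # exit part way through
  sqrt(x)                                         # last value is returned
}

# invisible() returns a value without printing it at the console
double.quietly <- function(x) {
  invisible(x * 2)
}

safe.sqrt(16)           # [1] 4
safe.sqrt(-1)           # [1] NA
double.quietly(5)       # prints nothing
y <- double.quietly(5)  # but the value is still returned: y is 10
```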

Anonymous functions
# Often used as arguments to functions:
v <- 1:9;
cube <- sapply(v, function(x) x^3)

Arguments are passed by value
# Effectively arguments are copied, and any changes made to the argument within the function do not affect the caller's copy.
# Trap: arguments are not typed and your function could be passed anything!
# Upfront argument checking advised!
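A small sketch, assuming a hypothetical add.one() function: it changes its own copy of the argument, the caller's vector is untouched, and stopifnot() rejects bad input up front.

```r
add.one <- function(v) {
  stopifnot(is.numeric(v))  # upfront argument check
  v[1] <- v[1] + 1          # modifies the local copy only
  v
}

orig <- c(1, 2, 3)
changed <- add.one(orig)
orig      # [1] 1 2 3  (caller's copy untouched)
changed   # [1] 2 2 3
# add.one("a") stops with an error raised by stopifnot()
```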

Arguments passed by position or name
b <- function(cat, dog, cow) cat+dog+cow
b(1,2,3)
b(cow=3, cat =1, dog=2)
# Trap: not all arguments need to be passed
f <- function(x) missing(x); f(); f('here')
# match.arg() - matches an argument against a set of choices (partial matching allowed)
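A sketch of missing() and match.arg() working together; describe() is a hypothetical function, not part of base R.

```r
describe <- function(x, type = c("mean", "median")) {
  if (missing(x)) return("no data")  # was x supplied at all?
  type <- match.arg(type)            # default: the first choice, "mean"
  switch(type, mean = mean(x), median = median(x))
}

describe()                     # [1] "no data"
describe(c(1, 2, 10))          # [1] 4.333333
describe(c(1, 2, 10), "med")   # partial match to "median": [1] 2
# describe(1, "mode") stops with an error from match.arg()
```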

Default arguments
# Default arguments can be specified 
x2y.1 <- function(x, y=2) {x^y}
x2y.2 <- function(x, y=x) {x^y}
x2y.2(3)
x2y.2(2,3)

The dots argument (...) is a catch-all
f <- function(...) {
  # simple way to access dots arguments
  dots <- list(...)
  dots
}
x <- f(5)
dput(x)   # list(5)
g <- function(...) {
  # the unevaluated arguments, turned into their names
  dots <- substitute(list(...))[-1]
  dots.names <- sapply(dots, deparse)
  dots.names
}

x <- g(a, b, c)
dput(x)   # c("a", "b", "c")
# dots can be passed to another function:
h <- function(x, ...) g(...)
x <- h(a, b, c)

Function environment
# When a function is called, a new environment (frame) is created for it.
# These frames are found in the call stack. The first frame is the global environment.
# The next function reaches back into the call stack.
called.by <- function() {
  # returns (as a string) the call that invoked our caller
  if(length(sys.parents()) <= 2) return('.GlobalEnv')
  deparse(sys.call(sys.parent(2)))
}
g <- function(...) { called.by() }
f <- function(...) g(...)
f(a, 2)

Variable scope and unbound variables
# Within a function, variables are resolved in the local frame first,
# then in the environment where the function was defined (e.g. the enclosing function, when a function is defined inside a function), then in the global environment (and then attached packages).
h <- function(x) { x+a }
a <- 5
h(5)
k <- function(x) { a <- 100; h(x) }
k(10)   # 15: h uses the global a, not k's local a
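A closure makes the lookup rule visible: the free variable is found where the function was defined, not where it is called (make.adder and add.ten are hypothetical names).

```r
make.adder <- function(n) {
  function(x) x + n   # n is found in make.adder's frame
}
add.ten <- make.adder(10)
n <- 1000             # a global n does not interfere
add.ten(5)            # [1] 15
```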

Super assignment
# x <<- y ignores the local x, and looks up the enclosing environments for an x to replace
# (if no x is found, one is created in the global environment)
accumulator <- function() {
  a <- 0
  function(x) {
    a <<- a + x
    a
  }
}
acc <- accumulator()
acc(1)   # 1
acc(5)   # 6
acc(2)   # 8

Operator and replacement functions
`+`(4,5) # -> 9 
# operators are just functions
`%plus%` <- function(a,b) {a+b}
# FUN(x) <- v is parsed as: x <- `FUN<-`(x, value = v)
"cap<-" <- function(x, value) 
ifelse(x>value, value, x)
x <- c(1,10,100);
cap(x) <- 9
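Using the operator and replacement functions defined above (repeated here so the snippet runs on its own):

```r
`%plus%` <- function(a, b) a + b
3 %plus% 4            # [1] 7

`cap<-` <- function(x, value) ifelse(x > value, value, x)
x <- c(1, 10, 100)
cap(x) <- 9           # evaluated as x <- `cap<-`(x, value = 9)
x                     # [1] 1 9 9

sapply(1:3, `+`, 10)  # an operator passed like any function: [1] 11 12 13
```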

Exceptions
tryCatch(print('pass'), error=function(e) print('bad'), finally=print('done'))
tryCatch(stop('fail'), error=function(e) print('bad'), finally=print('done'))
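A sketch with separate warning and error handlers; safe.as.numeric() is a hypothetical helper, and conditionMessage() pulls the text out of the condition object.

```r
safe.as.numeric <- function(s) {
  tryCatch(
    as.numeric(s),
    warning = function(w) {
      message("warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("error caught: ", conditionMessage(e))
      NA_real_
    },
    finally = message("done")   # runs in every case
  )
}

safe.as.numeric("42")     # [1] 42
safe.as.numeric("forty")  # coercion warning -> warning handler -> NA
safe.as.numeric(list)     # a function cannot be coerced -> error handler -> NA
```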

Useful language reflection functions
exists()
get()
assign() - these three work with variables by name
substitute()
bquote()
eval()
do.call()
parse()
deparse()
quote()
enquote()
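A quick tour of several of these, assuming a fresh session (v, e, n and f are throwaway names):

```r
assign("v", 1:3)               # same as v <- 1:3
exists("v")                    # [1] TRUE
get("v")                       # [1] 1 2 3

e <- quote(v * 2)              # an unevaluated expression
eval(e)                        # [1] 2 4 6
deparse(e)                     # [1] "v * 2"
eval(parse(text = "sum(v)"))   # [1] 6

n <- 10
bquote(v + .(n))               # .() splices in the value: v + 10
do.call("paste", list("a", "b", sep = "-"))   # [1] "a-b"

f <- function(arg) substitute(arg)
f(x + y)                       # the expression x + y, unevaluated
```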

Thursday, November 5, 2015

R Basics 8 - Tips and Traps

General
Trap: R error messages are often not helpful
Tip: use traceback() to understand errors

Object coercion
Trap: R objects are often silently coerced to another class/type as/when needed.
Examples: 
c(1, TRUE)
Tip: inspect objects with 
str(x)
mode(x)
class(x)
typeof(x)
dput(x)
attributes(x)
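Two common silent coercions, with the inspection functions confirming the result:

```r
x <- c(1, TRUE)   # logical silently coerced to numeric
x                 # [1] 1 1
class(x)          # [1] "numeric"

y <- c(1, "a")    # number silently coerced to character
y                 # [1] "1" "a"
typeof(y)         # [1] "character"

str(c(1L, 2.5))   # num [1:2] 1 2.5  (integer promoted to double)
```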

Factors (special case of coercion)
Trap: Factors cause more bug-hunting grief than just about anything else in R, especially when string and integer vectors and data.frame cols are coerced to factor.
Tip: learn about factors and how to use them
Tip: explicitly test with is.factor(df$col)
Tip: use the stringsAsFactors=FALSE argument when you create a data frame from a file (this became the default in R 4.0.0)

Trap: maths doesn't work on numeric factors and they are tricky to convert back.
Tip: try as.numeric(as.character(factor))
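Why the as.character() step matters (the values are made up):

```r
f <- factor(c("10", "5", "10", "20"))
levels(f)                             # [1] "10" "20" "5"  (sorted as strings)
as.numeric(f)                         # [1] 1 3 1 2  (the internal codes, not the values!)
as.numeric(as.character(f))           # [1] 10  5 10 20
as.numeric(levels(f))[as.integer(f)]  # [1] 10  5 10 20  (converts each level only once)
```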

Trap: appending rows to a data frame with factor columns is tricky.
Tip: make sure the row to be appended is presented to rbind() as a data.frame, and not as a vector or a list (which only works sometimes)

Trap: the combine function c() will let you combine different factors into a vector of integer codes (probably garbage). Since R 4.1.0, c() on factors returns a factor instead.
Tip: convert factors to string or integers (as appropriate) before combining.

Garbage in the workspace
Trap: R saves your workspace at the end of each session and reloads the saved workspace at the start of the next session.
Before you know it, you can have heaps of variables lurking in your workspace that are impacting on your calculations.
Tip: use ls() to check on lurking variables
Tip: clean up with rm(list=ls(all=TRUE))
Tip: search() to check on attached packages
Tip: avoid saving workspaces; start R with the --no-save --no-restore arguments

The 1:0 sequence in for-loops
Trap: for(x in 1:length(y)) fails on a zero-length vector. It will loop twice: first setting x to 1, then to 0.
Tip: use for(x in seq_len(length(y)))
Trap: for(x in y) gives you the values of y, not their positions
Tip: for(x in seq_along(y)) when you need the positions
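The 1:0 trap on an empty vector, and the safe alternatives:

```r
y <- numeric(0)                   # a zero-length vector
for (i in 1:length(y)) print(i)   # Trap: prints 1 then 0
for (i in seq_along(y)) print(i)  # correct: the loop body never runs
seq_len(length(y))                # integer(0)

y <- c(5, 7)
seq_along(y)                      # [1] 1 2
```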

Space out your code and use brackets
Trap: x<-5 (assignment of 5, or the comparison x < -5?)
Tip: x <- 5 for assignment; x < -5 for comparison
Trap: 1:n-1
Tip: 1:(n-1)
Trap: 2^2:9
Tip: 2^(2:9)
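What each form actually evaluates to:

```r
n <- 5
1:n-1      # [1] 0 1 2 3 4    (: binds tighter than -)
1:(n-1)    # [1] 1 2 3 4
2^2:9      # [1] 4 5 6 7 8 9  (^ binds tighter than :)
2^(2:9)    # [1]   4   8  16  32  64 128 256 512
x <- 3
x < -5     # [1] FALSE  (a comparison, thanks to the spaces)
x<-5       # an assignment: x is now 5
```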

Vectors and vector recycling
Trap: most objects in R are vectors. R does not have scalars (just length=1 vectors).
Many functions work on entire vectors at once.

Tip: In R, for-loops are often the inefficient and inelegant solution. Take the time to learn the various 'apply' family of functions and the plyr package. 
Trap: maths with vectors of different lengths still works: the shorter vector is recycled (R warns only when the longer length is not a multiple of the shorter)
c(1,2,3) + c(10,20)
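Recycling in action; the second line also emits a warning because 3 is not a multiple of 2:

```r
c(1, 2, 3, 4) + c(10, 20)     # [1] 11 22 13 24  (shorter vector recycled)
c(1, 2, 3) + c(10, 20)        # [1] 11 22 13, plus a "longer object length" warning
c(1, 2, 3) * 2                # [1] 2 4 6  (a "scalar" is a length-1 vector)
sapply(1:3, function(i) i^2)  # [1] 1 4 9  (apply-family instead of a for-loop)
```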

Vectors need the c() function
wrong: mean(1,2,3) (silently returns 1: the 2 and 3 are taken as other arguments)
correct: mean(c(1,2,3))

Use the correct Boolean operator
Tip: | and & are vectorised - use with ifelse() (| and & are also used with indexes to subset)
Tip: || and && are not vectorised - use with if
Trap: || and && short-circuit (lazy evaluation); | and & always evaluate both sides
Trap: == (Boolean equality) versus = (assignment)
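Element-wise versus short-circuit operators; with & instead of && the last line would hand if() a zero-length condition and stop with an error:

```r
x <- c(1, 5, 10)
x > 2 & x < 8                  # [1] FALSE  TRUE FALSE  (element-wise)
x[x < 2 | x > 8]               # [1]  1 10  (logical index subsets)
ifelse(x > 4, "big", "small")  # [1] "small" "big"   "big"

# && and || take length-1 values and short-circuit:
y <- NULL
if (!is.null(y) && y > 0) "positive" else "skip"  # "skip": y > 0 is never evaluated
```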

Equality testing with numbers
Trap: == and != test for exact (in)equality, which floating-point rounding can defeat
as.double(8) == as.integer(8) is TRUE
Tip: isTRUE(all.equal(x,y)) tests near equality
Tip: identical(x,y) is more fussy (it also compares types)
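The classic floating-point case, plus the double/integer comparison:

```r
0.1 + 0.2 == 0.3                        # [1] FALSE (floating-point rounding)
isTRUE(all.equal(0.1 + 0.2, 0.3))       # [1] TRUE  (equal within a tolerance)
as.double(8) == as.integer(8)           # [1] TRUE  (== compares values)
identical(as.double(8), as.integer(8))  # [1] FALSE (the types differ)
```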

Think hard about NA, NaN and NULL
Trap: NA and NaN are valid values.
Trap: many Fns fail by default on NA input
Tip: many functions take na.rm=TRUE
Tip: use is.na() to test a vector for NA
Trap: x == NA is not the same as is.na(x)
Trap: x == NULL is not the same as is.null(x)
Trap: is.numeric(NaN) returns TRUE
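Each of these traps at the console:

```r
x <- c(1, NA, 3)
mean(x)                 # [1] NA
mean(x, na.rm = TRUE)   # [1] 2
x == NA                 # [1] NA NA NA  (any comparison with NA gives NA)
is.na(x)                # [1] FALSE  TRUE FALSE
z <- NULL
z == NULL               # logical(0)  (not TRUE!)
is.null(z)              # [1] TRUE
is.numeric(NaN)         # [1] TRUE
is.na(NaN)              # [1] TRUE  (is.na() also flags NaN)
```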

Indexing ([], [[]], $)
Tip: Objects are indexed from 1 to N.
Trap: many subtle differences in indexing for vectors, lists, matrices, arrays and data.frames.
Return types vary depending on the object being indexed and the indexing method.
Tip: take the time to learn the differences
Trap: the zero-index fails silently (x[0] returns an empty vector)
Trap: negative indexes return all but those
Trap: NA is a valid Boolean index
Trap: mismatched-length Boolean indexes work (they are recycled)
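Each trap on a three-element vector:

```r
x <- c(10, 20, 30)
x[0]                    # numeric(0): no error, just an empty result
x[-1]                   # [1] 20 30  (all but the first)
x[c(TRUE, NA, FALSE)]   # [1] 10 NA  (an NA index yields NA)
x[c(TRUE, FALSE)]       # [1] 10 30  (short logical index is recycled)
x[5]                    # [1] NA     (out of range, still no error)
```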

Coding Practice
Tip: liberally use stopifnot() on function entry to verify argument validity
Tip: <- for assignment; = for naming function arguments and list elements

R Basics 7 - Factors

Factors
- A one-dimensional array of categorical (unordered) or ordinal (ordered) data.
- Indexed from 1 to N. Not fixed length.
- Named factors are possible (but rare).
- The hidden/unexpected coercion of objects to factors is a key source of bugs.

Why use Factors
- Specifying a non-alphabetical order
- Some statistical functions treat categorical/ordinal data differently from continuous data.
- Deep ggplot2 code depends on it

Create 
Example 1 - unordered
> sex.v <- c('M', 'F', 'F', 'M', 'F'); sex.v
[1] "M" "F" "F" "M" "F"
> sex.f <- factor(sex.v); sex.f
[1] M F F M F
Levels: F M
> sex.w <- as.character(sex.f); sex.w
[1] "M" "F" "F" "M" "F"

Example 2 - ordered (small, medium, large)
> size.v <- c('S','L', 'M', 'L', 'S', 'M'); size.v
[1] "S" "L" "M" "L" "S" "M"
> size1.f <- factor(size.v, ordered = TRUE); size1.f
[1] S L M L S M
Levels: L < M < S
Example 3 - ordered, where we set the order
> size.lvls <- c('S','M','L')
> size2.f <- factor(size.v, levels=size.lvls, ordered=TRUE); size2.f
[1] S L M L S M
Levels: S < M < L
 
Example 4 - recoding with levels and labels (99 becomes NA)
> levels <- c(1,2,3,99)
> labels <- c('love', 'neutral', 'hate', NA); labels
[1] "love"    "neutral" "hate"    NA       
> data.v <- c(1,2,3,99,1,2,1,2,99); data.v
[1]  1  2  3 99  1  2  1  2 99
> data.f <- factor(data.v, levels=levels, labels=labels); data.f
[1] love    neutral hate    <NA>    love    neutral love    neutral <NA>   
Levels: love neutral hate <NA>
Example 5 - using the cut function to group
> i <- 1:50 + rnorm(50,0,5)
> k <- cut(i,5); k
 [1] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (10.2,21.1]  
 [9] (-0.859,10.2] (10.2,21.1]   (-0.859,10.2] (-0.859,10.2] (-0.859,10.2] (10.2,21.1]   (-0.859,10.2] (10.2,21.1]  
[17] (10.2,21.1]   (21.1,32.1]   (21.1,32.1]   (10.2,21.1]   (-0.859,10.2] (10.2,21.1]   (10.2,21.1]   (21.1,32.1]  
[25] (21.1,32.1]   (10.2,21.1]   (32.1,43]     (21.1,32.1]   (21.1,32.1]   (21.1,32.1]   (32.1,43]     (32.1,43]    
[33] (32.1,43]     (21.1,32.1]   (21.1,32.1]   (32.1,43]     (32.1,43]     (43,54.1]     (32.1,43]     (43,54.1]    
[41] (32.1,43]     (32.1,43]     (32.1,43]     (43,54.1]     (43,54.1]     (43,54.1]     (43,54.1]     (43,54.1]    
[49] (43,54.1]     (43,54.1]    
Levels: (-0.859,10.2] (10.2,21.1] (21.1,32.1] (32.1,43] (43,54.1]

Basic information about a factor
> f <- factor(rep(1:4, 6), levels=4:1)
> dim(f)
NULL
> is.factor(f)
[1] TRUE
> is.atomic(f)
[1] TRUE
> is.vector(f)
[1] FALSE
> is.list(f)
[1] FALSE
> is.recursive(f)
[1] FALSE
> length(f)
[1] 24
> names(f)
NULL
> mode(f)
[1] "numeric"
> class(f)
[1] "factor"
> typeof(f)
[1] "integer"
> is.ordered(f)
[1] FALSE
> unclass(f)
 [1] 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1
attr(,"levels")
[1] "4" "3" "2" "1"
> cat(f)
4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1
> print(f)
 [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
Levels: 4 3 2 1
> str(f)
 Factor w/ 4 levels "4","3","2","1": 4 3 2 1 4 3 2 1 4 3 ...
> dput(f)
structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L, 4L, 
3L, 2L, 1L, 4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L), .Label = c("4", 
"3", "2", "1"), class = "factor")
> head(f)
[1] 1 2 3 4 1 2
Levels: 4 3 2 1

Indexing: much like atomic vectors
- [x] selects a factor for the cell/range x
- [[x]] selects a length=1 factor for the single cell index x (rarely used)
- The $ operator is invalid with factors

Factor arithmetic & Boolean comparisons
- factors cannot be added, multiplied, etc.
- factors with the same levels can be tested for equality
> x <- sex.f[1] == sex.f[2]; x
[1] FALSE
- ordered factors can be compared with <, >, etc.
> z <- size1.f[1] < size1.f[2]; z
[1] FALSE
Managing the enumeration (levels)
> f <- factor(letters[1:3]); f
[1] a b c
Levels: a b c
> levels(f)
[1] "a" "b" "c"
> levels(f)[1]
[1] "a"
> any(levels(f) %in% c('a', 'b', 'c'))
[1] TRUE
> # add new levels
> levels(f)[length(levels(f))+1] <- 'z'; f
[1] a b c
Levels: a b c z
> levels(f) <- c(levels(f), 'AA'); f
[1] a b c
Levels: a b c z AA
> # reorder levels
> f <- factor(f, levels(f)[c(4,1:3,5)]); f
[1] a b c
Levels: z a b c AA
> # change/rename levels
> levels(f)[1] <- 'xx'; f
[1] a b c
Levels: xx a b c AA
> levels(f)[levels(f) %in% 'AA'] <- 'BB'; f
[1] a b c
Levels: xx a b c BB
> # delete (drop) unused levels
> f <- f[drop=TRUE]; f
[1] a b c
Levels: a b c

Adding an element to a factor
> f <- factor(letters[1:10]); f
 [1] a b c d e f g h i j
Levels: a b c d e f g h i j
> f[length(f) + 1] <- 'a'; f
 [1] a b c d e f g h i j a
Levels: a b c d e f g h i j
> f <- factor(c(as.character(f), 'zz')); f
 [1] a  b  c  d  e  f  g  h  i  j  a  zz
Levels: a b c d e f g h i j zz

Merging/combining factors
> a <- factor(1:10); a
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 2 3 4 5 6 7 8 9 10
> b <- factor(letters[a]); b
 [1] a b c d e f g h i j
Levels: a b c d e f g h i j
> union <- factor(c(as.character(a), as.character(b))); union
 [1] 1  2  3  4  5  6  7  8  9  10 a  b  c  d  e  f  g  h  i  j
Levels: 1 10 2 3 4 5 6 7 8 9 a b c d e f g h i j
> cross <- interaction(a, b); cross
 [1] 1.a  2.b  3.c  4.d  5.e  6.f  7.g  8.h  9.i  10.j
100 Levels: 1.a 2.a 3.a 4.a 5.a 6.a 7.a 8.a 9.a 10.a 1.b 2.b 3.b 4.b 5.b 6.b 7.b 8.b 9.b 10.b 1.c 2.c 3.c 4.c 5.c ... 10.j
Using factors within data frames
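df$f <- reorder(df$f, df$x, FUN, order=TRUE)   # reorder the levels of factor f by FUN (e.g. mean) of x
by(df$x, df$f, FUN)                             # apply FUN to x within each level of f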
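A small sketch of reorder() with made-up data; tapply() is used here (instead of by()) to show the group means in the new level order.

```r
df <- data.frame(grp = factor(c("a", "a", "b", "b", "c", "c")),
                 val = c(5, 7, 1, 2, 9, 11))
levels(df$grp)                       # [1] "a" "b" "c"  (alphabetical)
df$grp <- reorder(df$grp, df$val, FUN = mean)
levels(df$grp)                       # [1] "b" "a" "c"  (group means 1.5, 6, 10)
tapply(df$val, df$grp, mean)         # group means, in the new level order
```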
Traps
1 Strings loaded from a file converted to factors (use read.table or read.csv with stringsAsFactors=FALSE)
2 Numbers from a file factorised. Convert back with as.numeric(levels(f))[as.integer(f)]
3 One factor (enumeration) cannot be meaningfully compared with another
4 NAs (missing data) in factors and levels can cause problems
5 Adding a row to a data frame, which adds a new level to a column factor.

R Basics 6 - Matrices and Arrays

Context

Matrices and arrays are an extension of R's atomic vectors.
Atomic vectors contain values (not objects).
They hold a contiguous set of values, all of which are the same basic type. There are six types of atomic vector: logical, integer, numeric, complex, character and raw.

Importantly: atomic vectors have no dimension attribute.
Matrices and arrays are effectively vectors with a dimension attribute. 
Matrices are two-dimensional (tabular) objects, containing values all of the same type (unlike data frames).
Arrays are multi-dimensional objects (typically with three or more dimensions), with values all of the same type.

Matrix versus data.frame
In a matrix, every column, and every cell is of the same basic atomic type. 
In a data.frame each column can be of a different type (e.g. numeric, character, factor). Data frames are best with messy data, and for variables of mixed types.

Matrix creation
## generalCase <- matrix(data=NA, nrow=1, ncol=1, byrow=FALSE, dimnames=NULL)
> M <- matrix(c(2,1,3,4,5,6), nrow=3, byrow=TRUE); M
     [,1] [,2]
[1,]    2    1
[2,]    3    4
[3,]    5    6
> b <- matrix(c(0, -1, 4)); b
     [,1]
[1,]    0
[2,]   -1
[3,]    4
> I <- diag(3); I
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
> D <- diag(c(1,2,3)); D
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
> d <- diag(M); d
[1] 2 4

Basic information about a matrix
> dim(M)
[1] 3 2
> class(M)
[1] "matrix"
> is.matrix(M)
[1] TRUE
> is.array(M)
[1] TRUE
> is.atomic(M)
[1] TRUE
> is.vector(M)
[1] FALSE
> is.list(M)
[1] FALSE
> is.factor(M)
[1] FALSE
> is.recursive(M)
[1] FALSE
> nrow(M)
[1] 3
> ncol(M)
[1] 2
> length(M)
[1] 6
> rownames(M)
NULL
> colnames(M)
NULL

Matrix manipulation
> M <- matrix(c(2,1,3,4,5,6), nrow=3, byrow=TRUE); M
     [,1] [,2]
[1,]    2    1
[2,]    3    4
[3,]    5    6
> N <- matrix(c(6,5,4,3,2,1), nrow=3, byrow=TRUE); N
     [,1] [,2]
[1,]    6    5
[2,]    4    3
[3,]    2    1
> newM <- cbind(M, N); newM
     [,1] [,2] [,3] [,4]
[1,]    2    1    6    5
[2,]    3    4    4    3
[3,]    5    6    2    1
> newM <- rbind(M, N); newM
     [,1] [,2]
[1,]    2    1
[2,]    3    4
[3,]    5    6
[4,]    6    5
[5,]    4    3
[6,]    2    1
> v <- c(M); v
[1] 2 3 5 1 4 6
> df <- data.frame(M); df
  X1 X2
1  2  1
2  3  4
3  5  6

Matrix multiplication
> M
     [,1] [,2]
[1,]    2    1
[2,]    3    4
[3,]    5    6
> N
     [,1] [,2]
[1,]    6    5
[2,]    4    3
[3,]    2    1
> InnerProduct <- M %*% t(N); InnerProduct
     [,1] [,2] [,3]
[1,]   17   11    5
[2,]   38   24   10
[3,]   60   38   16
> OuterProduct <- M %o% N; OuterProduct
, , 1, 1

     [,1] [,2]
[1,]   12    6
[2,]   18   24
[3,]   30   36

, , 2, 1

     [,1] [,2]
[1,]    8    4
[2,]   12   16
[3,]   20   24

, , 3, 1

     [,1] [,2]
[1,]    4    2
[2,]    6    8
[3,]   10   12

, , 1, 2

     [,1] [,2]
[1,]   10    5
[2,]   15   20
[3,]   25   30

, , 2, 2

     [,1] [,2]
[1,]    6    3
[2,]    9   12
[3,]   15   18

, , 3, 2

     [,1] [,2]
[1,]    2    1
[2,]    3    4
[3,]    5    6

> CrossProduct <- crossprod(M, N); CrossProduct
     [,1] [,2]
[1,]   34   24
[2,]   34   23
> M * N
     [,1] [,2]
[1,]   12    5
[2,]   12   12
[3,]   10    6
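Checking the relationships between these products on the same M and N: crossprod(M, N) is t(M) %*% N (not a vector cross product), %o% is outer(), and * is element-wise.

```r
M <- matrix(c(2, 1, 3, 4, 5, 6), nrow = 3, byrow = TRUE)
N <- matrix(c(6, 5, 4, 3, 2, 1), nrow = 3, byrow = TRUE)

all.equal(crossprod(M, N), t(M) %*% N)  # [1] TRUE
all.equal(M %o% N, outer(M, N))         # [1] TRUE
dim(M %o% N)                            # [1] 3 2 3 2
(M * N)[1, 1] == M[1, 1] * N[1, 1]      # [1] TRUE: * works cell by cell
# M %*% N fails: non-conformable arguments (3x2 times 3x2)
```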

Matrix maths
> rowMeans(M)
[1] 1.5 3.5 5.5
> colMeans(M)
[1] 3.333333 3.666667
> rowSums(M)
[1]  3  7 11
> colSums(M)
[1] 10 11
> t <- t(M);t
     [,1] [,2] [,3]
[1,]    2    3    5
[2,]    1    4    6
> inverse <- solve(diag(c(1,2,3))); inverse
     [,1] [,2]      [,3]
[1,]    1  0.0 0.0000000
[2,]    0  0.5 0.0000000
[3,]    0  0.0 0.3333333
> e <- eigen(diag(c(1,2,3))); e
$values
[1] 3 2 1

$vectors
     [,1] [,2] [,3]
[1,]    0    0    1
[2,]    0    1    0
[3,]    1    0    0

> d <- det(diag(c(1,2,3))); d
[1] 6

Matrix indexing [row, col] [[row, col]]
# [[ for single cell selection; [ for multi cell selection
# indexed by positive numbers: these ones
# indexed by negative numbers: not these
# indexed by logical atomic vector: in/out
# named rows/cols can be indexed by name
# M[i] or M[[i]] is vector-like indexing
# $ operator is invalid for atomic vectors
# M[r,] selects row r
# M[,c] selects column c
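The rules above on a small named matrix (the dimnames are invented for the example):

```r
M <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b", "c")))
M[2, 3]               # [1] 6: a single cell
M["r1", ]             # a row by name, as a named vector: 1 3 5
M[, "b"]              # a column by name: 3 4
M[, -1]               # all columns but the first (still a matrix)
M[M > 3]              # [1] 4 5 6: logical indexing, vector-like
M[5]                  # [1] 5: vector-like (column-major) indexing
M[, 2, drop = FALSE]  # keep the result as a 2x1 matrix
```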

Arrays
A three dimensional array created in two steps:
> A <- 1:8;A
[1] 1 2 3 4 5 6 7 8
> dim(A) <- c(2,2,2);A
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8
A matrix is a special case of array. Matrices are arrays with two dimensions.
> M <- array(1:9, dim=c(3,3));M
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

R Basics 5 - Data Frames

Create data frame
- The R way of doing spreadsheets
- Internally, a data.frame is a list of equal length vectors or factors.
- Observations in rows; Variables in cols
> empty <-data.frame()
> c1 <- 1:10
> c2 <- letters[1:10]
> df <- data.frame(col1=c1, col2=c2)
> df
   col1 col2
1     1    a
2     2    b
3     3    c
4     4    d
5     5    e
6     6    f
7     7    g
8     8    h
9     9    i
10   10    j

Import from and export to file
d2 <- read.csv('fileName.csv', header = TRUE)
library(gdata);
d2 <- read.xls('file.xls')
write.csv(df, file='fileName.csv')
library(xtable)
print(xtable(df), type='html')

Basic information about the data frame
> is.data.frame(df)
[1] TRUE
> class(df)
[1] "data.frame"
> nrow(df)
[1] 10
> ncol(df)
[1] 2
> colnames(df);
[1] "col1" "col2"
> rownames(df);
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Referencing cells [row, col] [[r, c]]
## [[ for single cell selection;
# [ for multi cell selection;
> vec <- df[[5,2]]; vec
[1] e
Levels: a b c d e f g h i j
> newDF <- df[1:5, 1:2]; newDF
  col1 col2
1    1    a
2    2    b
3    3    c
4    4    d
5    5    e
> df[[2, 'col1']]
[1] 2
> df[3:5, c('col1', 'col2')]
  col1 col2
3    3    c
4    4    d
5    5    e

Referencing rows [r, ]
# returns a data frame (and not a vector!)
> row.1 <- df[1,]; row.1
  col1 col2
1    1    a
> row.n <- df[nrow(df),]; row.n
   col1 col2
10   10    j
> vrow <- as.numeric(as.vector(df[1,])); vrow
[1] 1 1
> vrow <- as.character(as.vector(df[1,])); vrow
[1] "1" "1"

Referencing columns [,c] [d] [[d]] $col
> names(df) <- c('num','cats')
> col.vec <- df$cats; col.vec
 [1] a b c d e f g h i j
Levels: a b c d e f g h i j
> # returns vector
> col.vec <- df[, 'cats'] ; col.vec
 [1] a b c d e f g h i j
Levels: a b c d e f g h i j
> # the column index can be a number or a name
> col.vec <- df[ , 2]; col.vec
 [1] a b c d e f g h i j
Levels: a b c d e f g h i j
> # returns a vector
> col.vec <- df[['cats']]; col.vec
 [1] a b c d e f g h i j
Levels: a b c d e f g h i j
> # returns 1 col df
> frog.df <- df['cats']
> # returns 1 col df
> first.df <- df[1]; first.df
   num
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  10
> first.col <- df[,1]; first.col
 [1]  1  2  3  4  5  6  7  8  9 10
> # returns a vector
> last.col <- df[,ncol(df)]; last.col
 [1] a b c d e f g h i j
Levels: a b c d e f g h i j

Adding rows
# The right way ... (both args are DFs)
df <- rbind(df, data.frame(num=1, cats='A')); df

Adding columns
> df$newCol <- rep(NA, nrow(df)); df
   col1 col2 newCol
1     1    a     NA
2     2    b     NA
3     3    c     NA
4     4    d     NA
5     5    e     NA
6     6    f     NA
7     7    g     NA
8     8    h     NA
9     9    i     NA
10   10    j     NA
> #Copy a column
> df[, 'copyofCol'] <- 1:nrow(df); df
   col1 col2 newCol copyofCol
1     1    a     NA         1
2     2    b     NA         2
3     3    c     NA         3
4     4    d     NA         4
5     5    e     NA         5
6     6    f     NA         6
7     7    g     NA         7
8     8    h     NA         8
9     9    i     NA         9
10   10    j     NA        10
> names(df) <- c('x','cats','newCol','y')
> df$y.percent.pf.x <- df$y/sum(df$x)*100; df
    x cats newCol  y y.percent.pf.x
1   1    a     NA  1       1.818182
2   2    b     NA  2       3.636364
3   3    c     NA  3       5.454545
4   4    d     NA  4       7.272727
5   5    e     NA  5       9.090909
6   6    f     NA  6      10.909091
7   7    g     NA  7      12.727273
8   8    h     NA  8      14.545455
9   9    i     NA  9      16.363636
10 10    j     NA 10      18.181818
> df <-cbind(col=rep('a',nrow(df)), df); df
   col  x cats newCol  y y.percent.pf.x
1    a  1    a     NA  1       1.818182
2    a  2    b     NA  2       3.636364
3    a  3    c     NA  3       5.454545
4    a  4    d     NA  4       7.272727
5    a  5    e     NA  5       9.090909
6    a  6    f     NA  6      10.909091
7    a  7    g     NA  7      12.727273
8    a  8    h     NA  8      14.545455
9    a  9    i     NA  9      16.363636
10   a 10    j     NA 10      18.181818
> df <- cbind(df,col=rep('b',nrow(df))); df
   col  x cats newCol  y y.percent.pf.x col
1    a  1    a     NA  1       1.818182   b
2    a  2    b     NA  2       3.636364   b
3    a  3    c     NA  3       5.454545   b
4    a  4    d     NA  4       7.272727   b
5    a  5    e     NA  5       9.090909   b
6    a  6    f     NA  6      10.909091   b
7    a  7    g     NA  7      12.727273   b
8    a  8    h     NA  8      14.545455   b
9    a  9    i     NA  9      16.363636   b
10   a 10    j     NA 10      18.181818   b
> df$c3 <- with(df, x*y); df
   col  x cats newCol  y y.percent.pf.x col  c3
1    a  1    a     NA  1       1.818182   b   1
2    a  2    b     NA  2       3.636364   b   4
3    a  3    c     NA  3       5.454545   b   9
4    a  4    d     NA  4       7.272727   b  16
5    a  5    e     NA  5       9.090909   b  25
6    a  6    f     NA  6      10.909091   b  36
7    a  7    g     NA  7      12.727273   b  49
8    a  8    h     NA  8      14.545455   b  64
9    a  9    i     NA  9      16.363636   b  81
10   a 10    j     NA 10      18.181818   b 100
> transform(df, col4 <- x+y)   # Trap: with <- no column is added; use transform(df, col4 = x+y)
   col  x cats newCol  y y.percent.pf.x col  c3
1    a  1    a     NA  1       1.818182   b   1
2    a  2    b     NA  2       3.636364   b   4
3    a  3    c     NA  3       5.454545   b   9
4    a  4    d     NA  4       7.272727   b  16
5    a  5    e     NA  5       9.090909   b  25
6    a  6    f     NA  6      10.909091   b  36
7    a  7    g     NA  7      12.727273   b  49
8    a  8    h     NA  8      14.545455   b  64
9    a  9    i     NA  9      16.363636   b  81
10   a 10    j     NA 10      18.181818   b 100

Set column names # same for rownames()
> colnames(df) <- c('date', 'alpha', 'beta'); df   # Trap: too few names leaves the other columns named NA
   date alpha beta NA NA        NA NA  NA
1     a     1    a NA  1  1.818182  b   1
2     a     2    b NA  2  3.636364  b   4
3     a     3    c NA  3  5.454545  b   9
4     a     4    d NA  4  7.272727  b  16
5     a     5    e NA  5  9.090909  b  25
6     a     6    f NA  6 10.909091  b  36
7     a     7    g NA  7 12.727273  b  49
8     a     8    h NA  8 14.545455  b  64
9     a     9    i NA  9 16.363636  b  81
10    a    10    j NA 10 18.181818  b 100
> colnames(df)[1] <- 'new.name'; df
   new.name alpha beta NA NA        NA NA  NA
1         a     1    a NA  1  1.818182  b   1
2         a     2    b NA  2  3.636364  b   4
3         a     3    c NA  3  5.454545  b   9
4         a     4    d NA  4  7.272727  b  16
5         a     5    e NA  5  9.090909  b  25
6         a     6    f NA  6 10.909091  b  36
7         a     7    g NA  7 12.727273  b  49
8         a     8    h NA  8 14.545455  b  64
9         a     9    i NA  9 16.363636  b  81
10        a    10    j NA 10 18.181818  b 100
> colnames(df)[colnames(df) %in% c('a', 'b')] <- c('x', 'y'); df   # no names match, so nothing changes
   new.name alpha beta NA NA        NA NA  NA
1         a     1    a NA  1  1.818182  b   1
2         a     2    b NA  2  3.636364  b   4
3         a     3    c NA  3  5.454545  b   9
4         a     4    d NA  4  7.272727  b  16
5         a     5    e NA  5  9.090909  b  25
6         a     6    f NA  6 10.909091  b  36
7         a     7    g NA  7 12.727273  b  49
8         a     8    h NA  8 14.545455  b  64
9         a     9    i NA  9 16.363636  b  81
10        a    10    j NA 10 18.181818  b 100

Selecting Multiple Rows
> firstTenRows <- df[1:10,]; firstTenRows
   new.name alpha beta NA NA        NA NA  NA
1         a     1    a NA  1  1.818182  b   1
2         a     2    b NA  2  3.636364  b   4
3         a     3    c NA  3  5.454545  b   9
4         a     4    d NA  4  7.272727  b  16
5         a     5    e NA  5  9.090909  b  25
6         a     6    f NA  6 10.909091  b  36
7         a     7    g NA  7 12.727273  b  49
8         a     8    h NA  8 14.545455  b  64
9         a     9    i NA  9 16.363636  b  81
10        a    10    j NA 10 18.181818  b 100
> everthingButRowTwo <- df[-2,]; everthingButRowTwo
   new.name alpha beta NA NA        NA NA  NA
1         a     1    a NA  1  1.818182  b   1
3         a     3    c NA  3  5.454545  b   9
4         a     4    d NA  4  7.272727  b  16
5         a     5    e NA  5  9.090909  b  25
6         a     6    f NA  6 10.909091  b  36
7         a     7    g NA  7 12.727273  b  49
8         a     8    h NA  8 14.545455  b  64
9         a     9    i NA  9 16.363636  b  81
10        a    10    j NA 10 18.181818  b 100
> sub <- df[(df$x > 5 & df$y < 5), ]; sub
[1] new.name alpha    beta     <NA>     <NA>     <NA>     <NA>     <NA>  
<0 rows> (or 0-length row.names)
> sub <- subset(df, x>5 & y<5); sub
[1] new.name alpha    beta     <NA>     NA.1     NA.2     NA.3     NA.4  
<0 rows> (or 0-length row.names)
> notLastRow <- head(df, -1); notLastRow
  new.name alpha beta NA NA        NA NA NA
1        a     1    a NA  1  1.818182  b  1
2        a     2    b NA  2  3.636364  b  4
3        a     3    c NA  3  5.454545  b  9
4        a     4    d NA  4  7.272727  b 16
5        a     5    e NA  5  9.090909  b 25
6        a     6    f NA  6 10.909091  b 36
7        a     7    g NA  7 12.727273  b 49
8        a     8    h NA  8 14.545455  b 64
9        a     9    i NA  9 16.363636  b 81
> df[-nrow(df),]
  new.name alpha beta NA NA        NA NA NA
1        a     1    a NA  1  1.818182  b  1
2        a     2    b NA  2  3.636364  b  4
3        a     3    c NA  3  5.454545  b  9
4        a     4    d NA  4  7.272727  b 16
5        a     5    e NA  5  9.090909  b 25
6        a     6    f NA  6 10.909091  b 36
7        a     7    g NA  7 12.727273  b 49
8        a     8    h NA  8 14.545455  b 64
9        a     9    i NA  9 16.363636  b 81

Selecting multiple columns
> df <- df[,c(1,2,3,4,5)]; df
   col  x cats newCol  y
1    a  1    a     NA  1
2    a  2    b     NA  2
3    a  3    c     NA  3
4    a  4    d     NA  4
5    a  5    e     NA  5
6    a  6    f     NA  6
7    a  7    g     NA  7
8    a  8    h     NA  8
9    a  9    i     NA  9
10   a 10    j     NA 10
> names(df) <- c('col1', 'col2', 'col3')
> df <- df[,c('col1','col2')];df
   col1 col2
1     a    1
2     a    2
3     a    3
4     a    4
5     a    5
6     a    6
7     a    7
8     a    8
9     a    9
10    a   10
# drop the first column:  df <- df[,-1]
# drop col1 and col3:     df <- df[,-c(1,3)]
# drop columns by name:
> df <- df[,!(colnames(df) %in% c('notThis','norThis'))]
> df
   col1 col2
1     a    1
2     a    2
3     a    3
4     a    4
5     a    5
6     a    6
7     a    7
8     a    8
9     a    9
10    a   10

Replace column elements by row selection
> df
   col1 col2
1     a    1
2     a    2
3     a    3
4     a    4
5     a    5
6     a    6
7     a    7
8     a    8
9     a    9
10    a   10
> df[df$col31 == 'a', 'col2'] <- 1   # typo: df$col31 is NULL, so nothing is replaced
> df
   col1 col2
1     a    1
2     a    2
3     a    3
4     a    4
5     a    5
6     a    6
7     a    7
8     a    8
9     a    9
10    a   10
> df[df$col1 == 'a', 'col2'] <- 1
> df
   col1 col2
1     a    1
2     a    1
3     a    1
4     a    1
5     a    1
6     a    1
7     a    1
8     a    1
9     a    1
10    a    1

Missing data(NA)
# detect anywhere in df
> any(is.na(df))
[1] TRUE
> # anywhere in col
> any(is.na(df$newCol))
[1] FALSE
> # keep only the rows where newCol is not missing
> df2 <- df[!is.na(df$newCol),]; df2
   col1 col2 newCol col
1     a   NA      0   0
2     a   NA      0   0
3     a   NA      0   0
4     a   NA      0   0
5     a   NA      0   0
6     a   NA      0   0
7     a   NA      0   0
8     a   NA      0   0
9     a   NA      0   0
10    a   NA      0   0
> # replacing NAs with something else
> df[is.na(df)] <- 0; df
   col1 col2 newCol col
1     a    0      0   0
2     a    0      0   0
3     a    0      0   0
4     a    0      0   0
5     a    0      0   0
6     a    0      0   0
7     a    0      0   0
8     a    0      0   0
9     a    0      0   0
10    a    0      0   0
> df$col2[is.na(df$col2)] <- 0; df
   col1 col2 newCol col
1     a    0      0   0
2     a    0      0   0
3     a    0      0   0
4     a    0      0   0
5     a    0      0   0
6     a    0      0   0
7     a    0      0   0
8     a    0      0   0
9     a    0      0   0
10    a    0      0   0
> df$col2 <- ifelse(is.na(df$col2), 0, df$col2); df
   col1 col2 newCol col
1     a    0      0   0
2     a    0      0   0
3     a    0      0   0
4     a    0      0   0
5     a    0      0   0
6     a    0      0   0
7     a    0      0   0
8     a    0      0   0
9     a    0      0   0
10    a    0      0   0
# keep the rows where series is not missing, and only the Date and series columns
df <- orig[!is.na(orig$series), c('Date', 'series')]
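A self-contained sketch of the same NA techniques on a small made-up data frame; complete.cases() (from the stats package, attached by default) is an extra not used above.

```r
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA), stringsAsFactors = FALSE)
colSums(is.na(df))          # NA count per column: x 1, y 1
df[!is.na(df$x), ]          # drop the rows where x is missing
df[complete.cases(df), ]    # keep only rows with no NA in any column
df$x[is.na(df$x)] <- 0      # replace the NAs in one column
df$x                        # [1] 1 0 3
```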

Traps
1 for loops on possibly empty data frames: use for(i in seq_len(nrow(df)))
2 columns coerced to factors: avoid with the argument stringsAsFactors=FALSE
3 confusing row numbers with rows that have numbered names (hint: avoid row names)
4 although rbind() accepts vectors and lists, this can fail with factor cols

Wednesday, November 4, 2015

R Basics 4 - Lists

Context: R has two types of vector

Atomic vectors contain values
These values are all of the same type.
They are arranged contiguously.
Atomic vectors cannot contain objects. 
There are six types of atomic vector: raw, logical, integer, numeric, complex and character.

Recursive vectors contain objects
R has two types of recursive vector: list and expression.

Lists
- At top level: 1-dim indexed object that contains objects (not values)
- Indexed from 1 to length(list)
- Contents can be of different types
- Lists can contain the NULL object
- Deeply nested lists of lists are possible
- Can be arbitrarily extended (not fixed)

List creation: usually using list()
> l1 <- list('cat', 5, 1:10, FALSE);l1
[[1]]
[1] "cat"

[[2]]
[1] 5

[[3]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[4]]
[1] FALSE

> l2 <- list ( x='dog', y=5+2i, z=3:8 );l2
$x
[1] "dog"

$y
[1] 5+2i

$z
[1] 3 4 5 6 7 8

> l3 <- c(l1, l2);l3
[[1]]
[1] "cat"

[[2]]
[1] 5

[[3]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[4]]
[1] FALSE

$x
[1] "dog"

$y
[1] 5+2i

$z
[1] 3 4 5 6 7 8

> l4 <- list(l1, l2);l4
[[1]]
[[1]][[1]]
[1] "cat"

[[1]][[2]]
[1] 5

[[1]][[3]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[1]][[4]]
[1] FALSE


[[2]]
[[2]]$x
[1] "dog"

[[2]]$y
[1] 5+2i

[[2]]$z
[1] 3 4 5 6 7 8


> l5 <- as.list( c(1,2,3));l5
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

> origL <- l4
> inserVorL <- l5
> position <- 3
> l6 <- append(origL, inserVorL, position);l6
[[1]]
[[1]][[1]]
[1] "cat"

[[1]][[2]]
[1] 5

[[1]][[3]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[1]][[4]]
[1] FALSE


[[2]]
[[2]]$x
[1] "dog"

[[2]]$y
[1] 5+2i

[[2]]$z
[1] 3 4 5 6 7 8


[[3]]
[1] 1

[[4]]
[1] 2

[[5]]
[1] 3

Basic information about lists
> l <- l6
> dim(l)
NULL
> is.list(l)
[1] TRUE
> is.vector(l)
[1] TRUE
> is.recursive(l)
[1] TRUE
> is.atomic(l)
[1] FALSE
> is.factor(l)
[1] FALSE
> length(l)
[1] 5
> names(l)
NULL
> mode(l)
[1] "list"
> class(l)
[1] "list"
> typeof(l)
[1] "list"
> attributes(l)
NULL

The contents of a list
> print(l)
[[1]]
[[1]][[1]]
[1] "cat"

[[1]][[2]]
[1] 5

[[1]][[3]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[1]][[4]]
[1] FALSE


[[2]]
[[2]]$x
[1] "dog"

[[2]]$y
[1] 5+2i

[[2]]$z
[1] 3 4 5 6 7 8


[[3]]
[1] 1

[[4]]
[1] 2

[[5]]
[1] 3

> str(l)
List of 5
 $ :List of 4
  ..$ : chr "cat"
  ..$ : num 5
  ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
  ..$ : logi FALSE
 $ :List of 3
  ..$ x: chr "dog"
  ..$ y: cplx 5+2i
  ..$ z: int [1:6] 3 4 5 6 7 8
 $ : num 1
 $ : num 2
 $ : num 3
> dput(l)
list(list("cat", 5, 1:10, FALSE), structure(list(x = "dog", y = 5+2i, 
    z = 3:8), .Names = c("x", "y", "z")), 1, 2, 3)
> head(l)
[[1]]
[[1]][[1]]
[1] "cat"

[[1]][[2]]
[1] 5

[[1]][[3]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[1]][[4]]
[1] FALSE


[[2]]
[[2]]$x
[1] "dog"

[[2]]$y
[1] 5+2i

[[2]]$z
[1] 3 4 5 6 7 8


[[3]]
[1] 1

[[4]]
[1] 2

[[5]]
[1] 3

> tail(l)
[[1]]
[[1]][[1]]
[1] "cat"

[[1]][[2]]
[1] 5

[[1]][[3]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[1]][[4]]
[1] FALSE


[[2]]
[[2]]$x
[1] "dog"

[[2]]$y
[1] 5+2i

[[2]]$z
[1] 3 4 5 6 7 8


[[3]]
[1] 1

[[4]]
[1] 2

[[5]]
[1] 3

Trap: cat(x) does not work with lists in general; use print(x) or str(x) instead

Indexing [ versus [[ versus $
- Use [ to get/set multiple items at once
Note: [ always returns a list
- Use [[ and $ to get/set a specific item
- $ only works with named list items
all same: $name $"name" $'name' $`name`
- positive numeric index: keep these items
- negative numeric index: drop these items
- logical index: TRUE keeps an item, FALSE drops it
an empty index l[] returns the whole list
Tip: when working with lists, most of the time you want to index with [[ or $ and avoid [
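A short sketch that checks these rules with stopifnot(); the list j is illustrative data.

```r
j <- list(a = "cat", b = 5, c = FALSE)

# [ returns a sub-list, even when only one item is selected
stopifnot(is.list(j[1]), is.list(j["a"]))

# [[ and $ return the item itself
stopifnot(identical(j[[1]], "cat"), identical(j$a, "cat"))

# Positive numbers keep, negative numbers drop, logicals select in/out
stopifnot(identical(names(j[c(1, 3)]), c("a", "c")))
stopifnot(identical(names(j[-1]), c("b", "c")))
stopifnot(identical(names(j[c(TRUE, FALSE, TRUE)]), c("a", "c")))

# An empty index returns the whole list
stopifnot(identical(j[], j))
```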

Indexing examples: one-dimension get
> j <- list(a='cat', b=5, c=FALSE)
> x <- j$a;x
[1] "cat"
> x <- j[['a']];x
[1] "cat"
> x <- j['a'];x
$a
[1] "cat"
> x <- j[[1]];x
[1] "cat"
> x <- j[1];x
$a
[1] "cat"

Indexing examples: set operations
- Start with some example data
> l <- list(x='a', y='b', z='c', t='d')
- Next, use [[ or $ because each assignment selects a single item
> l[[6]] <- 'new';
> names(l)[5] <- 'w'
> l$w <- 'new-W'
> l[['w']] <- 'dog'
- Change named values: the order of names in the %in% vector is ignored; values are assigned in list order
> l[names(l) %in% c('t', 'x')] <- c(1,2)
> l
$x
[1] 1

$y
[1] "b"

$z
[1] "c"

$t
[1] 2

$w
[1] "dog"

[[6]]
[1] "new"
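Setting values has a related trap: assigning NULL with [[ or $ deletes the item rather than storing NULL. A minimal sketch with illustrative data:

```r
l <- list(x = "a", y = "b", z = "c")

# Assigning NULL with $ (or [[) removes the item
l$y <- NULL
stopifnot(identical(names(l), c("x", "z")))

# To keep the item but give it a NULL value, assign list(NULL) with [
l["z"] <- list(NULL)
stopifnot(length(l) == 2, is.null(l$z))
```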

Indexing example: multi-dimension get
- Indexing evaluated from left to right
- Let's start with some example data...
> i <- c('aa', 'bb', 'cc')
> j <- list(a='cat', b=5, c=FALSE)
> k <- list(i, j);k
[[1]]
[1] "aa" "bb" "cc"

[[2]]
[[2]]$a
[1] "cat"

[[2]]$b
[1] 5

[[2]]$c
[1] FALSE


> k[[1]]
[1] "aa" "bb" "cc"
> k[[2]]
$a
[1] "cat"

$b
[1] 5

$c
[1] FALSE

> k[1]
[[1]]
[1] "aa" "bb" "cc"

> k[2]
[[1]]
[[1]]$a
[1] "cat"

[[1]]$b
[1] 5

[[1]]$c
[1] FALSE

> x <- k[[1]][[1]];x
[1] "aa"
> x <- k[[1]][[2]];x
[1] "bb"
> x <- k[1][1][1][1][1];x
[[1]]
[1] "aa" "bb" "cc"

> x <- k[1][2];x
[[1]]
NULL

> x <- k[[2]][1];x
$a
[1] "cat"
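Chained [[ drills down one level at a time. A vector passed to [[ does the same in one step (recursive indexing), as long as every level except the last is a list. Sketch using the same i, j and k as above:

```r
i <- c("aa", "bb", "cc")
j <- list(a = "cat", b = 5, c = FALSE)
k <- list(i, j)

# Chained indexing, one level per [[ or $
stopifnot(identical(k[[2]][["a"]], "cat"))
stopifnot(identical(k[[2]]$b, 5))

# Recursive indexing: k[[c(2, 1)]] is the same as k[[2]][[1]]
stopifnot(identical(k[[c(2, 1)]], "cat"))
stopifnot(identical(k[[c(1, 2)]], "bb"))
```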

List manipulation
1 Arithmetic operators cannot be applied to lists (as content types can vary)
2 Use the apply family (lapply(), sapply(), ...) to apply a function to each element of a list:
> x <- list(a=1, b=month.abb, c=letters)
> lapply(x, FUN=length)
$a
[1] 1

$b
[1] 12

$c
[1] 26

> sapply(x, FUN=length)
 a  b  c 
 1 12 26 
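> y <- list(a=1, b=3, c=3, d=4)
> sapply(y, FUN=function(x, p) x^p, p=2)
> sapply(y, FUN=function(x, p) x^p, p=2:3)
Extra arguments (here p) are passed on to FUN; with p=2:3 each result has length 2, so sapply() returns a matrix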
3 Use unlist() to convert a list to a vector
> unlist(x)
    a    b1    b2    b3    b4    b5    b6    b7    b8    b9   b10   b11   b12    c1    c2    c3    c4    c5    c6    c7 
  "1" "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"   "a"   "b"   "c"   "d"   "e"   "f"   "g" 
   c8    c9   c10   c11   c12   c13   c14   c15   c16   c17   c18   c19   c20   c21   c22   c23   c24   c25   c26 
  "h"   "i"   "j"   "k"   "l"   "m"   "n"   "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z" 

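Note: unlist() coerces all values to one common type (above, the number 1 became "1"). It also won't flatten non-vector elements such as functions or calls; a list containing them stays a list.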
4 Remove NULL objects from a list
> z <- list(a=1:9, b=letters, c=NULL)
> zNoNull <- Filter(Negate(is.null), z)
> zNoNull
$a
[1] 1 2 3 4 5 6 7 8 9

$b
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"

5 Use named lists to return multiple values

6 A factor used as an index is treated as its integer codes, not its labels
To index by the labels, use v[as.character(f)]
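A sketch of items 5 and 6. The function summarise() is a hypothetical helper, not a base R function; v and f are illustrative data.

```r
# Item 5: return several values at once as a named list
# (summarise is a hypothetical helper written for this example)
summarise <- function(x) {
  list(n = length(x), mean = mean(x), range = range(x))
}
s <- summarise(c(2, 4, 9))
stopifnot(s$n == 3, s$mean == 5, identical(s$range, c(2, 9)))

# Item 6: a factor index acts as its integer codes
v <- c(low = 1, high = 100)
f <- factor("high", levels = c("high", "low"))  # "high" has code 1
stopifnot(v[f] == 1)                            # position 1, i.e. "low"!
stopifnot(v[as.character(f)] == 100)            # look up by label instead
```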

Tuesday, November 3, 2015

R Basics 3 - Atomic Vectors

Atomic vectors:
- An object with contiguous, indexed values
- Indexed from 1 to length(vector)
- All values of the same basic atomic type
- Vectors do not have a dimension attribute
- Has a fixed length once created

Six basic atomic types:
- logical
- integer
- numeric
- complex
- character
- raw

No scalars
In R, the basic types always come as vectors. A scalar is just a vector of length 1.
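A quick check that a "scalar" really is a length-1 vector:

```r
x <- 42
stopifnot(is.vector(x), is.atomic(x), length(x) == 1)

# Indexing it gives the same length-1 vector back
stopifnot(identical(x[1], x), identical(x[[1]], x))

# Vectorised arithmetic treats it like any other vector (it is recycled)
stopifnot(identical(x + 1:3, c(43, 44, 45)))
```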

Creation (length determined at creation)
#Default value vectors of length=4
> u <- vector(mode='logical', length=4)
> print(u)
[1] FALSE FALSE FALSE FALSE
> v <- vector(mode= 'integer', length=4)
> print(v)
[1] 0 0 0 0

#Using the sequence operator
> i <- 1:5; i
[1] 1 2 3 4 5
> j <- 1.4:6.4; j
[1] 1.4 2.4 3.4 4.4 5.4 6.4
> k <- seq(from=0, to =1, by=0.1);k
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

#Using the c() function
> l <- c(TRUE,FALSE); l
[1]  TRUE FALSE
> n <- c(1.3, 7, 7/20); n
[1] 1.30 7.00 0.35
> z <- c(1+2i, 2, -3+4i); z
[1]  1+2i  2+0i -3+4i

#Other things
> v1 <- c(a=1,b=2,c=3); v1
a b c
1 2 3
> v2 <- rep(NA, 3); v2
[1] NA NA NA
> v3 <- c(v1, v2); v3
 a  b  c        
 1  2  3 NA NA NA
> v4 <- append(1:5, 2:10, after=5); v4
 [1]  1  2  3  4  5  2  3  4  5  6  7  8  9 10

Conversion
> as.vector(v4)
 [1]  1  2  3  4  5  2  3  4  5  6  7  8  9 10
> as.logical(v4)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> as.integer(v4)
 [1]  1  2  3  4  5  2  3  4  5  6  7  8  9 10
> as.numeric(v4)
 [1]  1  2  3  4  5  2  3  4  5  6  7  8  9 10
> as.character(v4)
 [1] "1"  "2"  "3"  "4"  "5"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
> unlist(l)
[1]  TRUE FALSE

Basic information about atomic vectors (the examples below use v <- v4, created above)
> dim(v)
NULL
> is.atomic(v)
[1] TRUE
> is.vector(v)
[1] TRUE
> is.list(v)
[1] FALSE
> is.factor(v)
[1] FALSE
> is.recursive(v)
[1] FALSE
> length(v)
[1] 14
> names(v)
NULL
> mode(v)
[1] "numeric"
> class(v)
[1] "integer"
> typeof(v)
[1] "integer"
> attributes(v)
NULL
> is.numeric(v);
[1] TRUE
> is.character(v);
[1] FALSE
Trap: lists are vectors (but not atomic)
Trap: arrays/matrices are atomic, but is.vector() returns FALSE for them
Tip: use is.vector(v) && is.atomic(v) to test for a plain atomic vector
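The two traps and the tip, written as checks. The function name is_atomic_vector is a hypothetical helper, not part of base R.

```r
lst <- list(1, "a")
m <- matrix(1:4, nrow = 2)
v <- 1:3

# Trap 1: a list is a vector, but not atomic
stopifnot(is.vector(lst), !is.atomic(lst))

# Trap 2: a matrix is atomic, but its dim attribute makes is.vector() FALSE
stopifnot(is.atomic(m), !is.vector(m))

# The tip: combining both tests accepts only plain atomic vectors
is_atomic_vector <- function(x) is.vector(x) && is.atomic(x)  # hypothetical helper
stopifnot(is_atomic_vector(v), !is_atomic_vector(lst), !is_atomic_vector(m))
```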

The content of a vector
> cat(v)
1 2 3 4 5 2 3 4 5 6 7 8 9 10
> print(v)
 [1]  1  2  3  4  5  2  3  4  5  6  7  8  9 10
> str(v)
 int [1:14] 1 2 3 4 5 2 3 4 5 6 ...
> dput(v)
c(1L, 2L, 3L, 4L, 5L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L)
> head(v)
[1] 1 2 3 4 5 2
> tail(v)
[1]  5  6  7  8  9 10

Indexing: [ and [[ but not $
- [x] selects a vector for the cell/range x
- [[x]] selects a length-1 vector for the single cell index x, dropping any name (rarely used)
- $ operator invalid for atomic vectors
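A sketch of these three rules on a small named vector; the errors are caught with tryCatch() so the checks can run.

```r
v <- c(a = 1, b = 2, c = 3)

stopifnot(identical(v[2], c(b = 2)))          # [ keeps the name
stopifnot(identical(v[[2]], 2))               # [[ drops it
stopifnot(identical(v[2:3], c(b = 2, c = 3)))

# [[ accepts only a single index on an atomic vector
r1 <- tryCatch(v[[2:3]], error = function(e) "error")
stopifnot(identical(r1, "error"))

# $ is an error on atomic vectors
r2 <- tryCatch(v$a, error = function(e) "error")
stopifnot(identical(r2, "error"))
```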

Index by positive numbers
> v[c(1,2,3)]
[1] 1 2 3
> v[1:2]
[1] 1 2
> v[[7]]
[1] 3
> v[which(v == 'M')]
integer(0)

Index by negative numbers
> v[-1] #get all but the first element
 [1]  2  3  4  5  2  3  4  5  6  7  8  9 10
> v[-length(v)]
 [1] 1 2 3 4 5 2 3 4 5 6 7 8 9
> v[-c(1,3,5,7,9)]
[1]  2  4  2  4  6  7  8  9 10

Index by name (only with named vectors)
> names(v)[1:3] = c('alpha', 'beta','z')
> v[['alpha']]
[1] 1
> v[['beta']]
[1] 2
> v[c('alpha','beta')]
alpha  beta
    1     2
> v[!(names(v) %in% c('c', 'b'))]
alpha  beta     z  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
    1     2     3     4     5     2     3     4     5     6     7     8     9    10
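The last example above already uses a logical index. A short sketch of logical indexing on its own, with illustrative data:

```r
v <- c(5, 12, 7, 20)
big <- v > 10                                # logical vector, same length as v
stopifnot(identical(v[big], c(12, 20)))
stopifnot(identical(which(big), c(2L, 4L)))  # positions of the TRUE values

# A shorter logical index is recycled: c(TRUE, FALSE) keeps every other item
stopifnot(identical(v[c(TRUE, FALSE)], c(5, 7)))

# Logical indexing also works on the left of an assignment
v[v > 10] <- 0
stopifnot(identical(v, c(5, 0, 7, 0)))
```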

Sorting
> upsorted = sort(v); upsorted;
alpha  beta  <NA>     z  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
    1     2     2     3     3     4     4     5     5     6     7     8     9    10
> v[order(v)]
alpha  beta  <NA>     z  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
    1     2     2     3     3     4     4     5     5     6     7     8     9    10
> d = sort(v, decreasing = TRUE);d
 <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>     z  <NA>  beta  <NA> alpha
   10     9     8     7     6     5     5     4     4     3     3     2     2     1
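sort() returns the sorted values, while order() returns the positions that would sort them; v[order(v)] and sort(v) therefore agree. A sketch with illustrative data:

```r
v <- c(b = 30, a = 10, c = 20)
o <- order(v)                                   # positions, not values
stopifnot(identical(o, c(2L, 3L, 1L)))
stopifnot(identical(v[o], sort(v)))             # same result, names kept
stopifnot(identical(names(sort(v, decreasing = TRUE)), c("b", "c", "a")))

# order() can sort one vector by another, e.g. the names by their values
stopifnot(identical(names(v)[order(v)], c("a", "c", "b")))
```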

Raw vectors
> s <- charToRaw('raw');
> r <- as.raw(c(114,97,119))
> print(r)
[1] 72 61 77
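The two raw vectors above hold the same bytes, since 114, 97 and 119 are the character codes of "r", "a" and "w" (printed in hex as 72 61 77). A round-trip check:

```r
s <- charToRaw("raw")
r <- as.raw(c(114, 97, 119))

stopifnot(identical(s, r))                  # same bytes: 0x72 0x61 0x77
stopifnot(identical(rawToChar(r), "raw"))   # back to a string
stopifnot(identical(as.integer(r), c(114L, 97L, 119L)))
stopifnot(identical(typeof(r), "raw"))
```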