Nothing! R's (for-while-repeat) loops are intuitive and easy to code and maintain. Some tasks are best managed within loops.
So why discourage the use of for-loops?
1) Side effects and detritus from inline code. When a function call replaces a loop, what happens in the function stays in the function.
2) Increased speed in some cases (especially with nested loops and poorly coded loops).
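A small sketch of point 1, using made-up values rather than the play data. The loop leaves its index and scratch variable behind in the workspace, while the function version keeps its scratch variable local. The names squares.loop, squares.fn, tmp and scratch are only illustrative.

```r
# Loop version: i and tmp remain in the calling environment afterwards
x <- c(2, 4, 6)
squares.loop <- numeric(length(x))
for (i in seq_along(x)) {
  tmp <- x[i] * x[i]
  squares.loop[i] <- tmp
}
leftovers <- c(exists("i", inherits = FALSE), exists("tmp", inherits = FALSE))

# Function version: scratch exists only while the anonymous function runs
squares.fn <- sapply(x, function(a) {
  scratch <- a * a
  scratch
})
scratch.leaked <- exists("scratch", inherits = FALSE)
```

After this runs, leftovers holds TRUE for both loop variables, and scratch.leaked is FALSE provided nothing else defined scratch beforehand.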
How to make the paradigm shift?
1) Use R's vectorization features.
2) See if object indexing and subset assignment can replace the for-loop.
3) If not, find an "apply" function that slices your object the way you need.
4) Find (or write) a function to do what you would have done in the body of the for-loop. Anonymous functions can be very useful for this task.
5) If all else fails, move as much code as possible outside of the loop body.
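Step 5 might look something like the sketch below, which standardises a vector three ways. The data and the names out.slow, out.fast and out.vec are invented for illustration. The first loop recomputes invariant values and grows its result on every pass, the second hoists the invariants and preallocates, and the last line shows that this particular task vectorizes completely (step 1).

```r
set.seed(1)
x <- runif(1000)

# Slower pattern: recompute mean/sd and grow the result on every pass
out.slow <- c()
for (i in seq_along(x)) {
  out.slow <- c(out.slow, (x[i] - mean(x)) / sd(x))
}

# Better: compute the invariants once and preallocate the result vector
x.mean <- mean(x)
x.sd <- sd(x)
out.fast <- numeric(length(x))
for (i in seq_along(x)) {
  out.fast[i] <- (x[i] - x.mean) / x.sd
}

# Here the loop can go altogether
out.vec <- (x - x.mean) / x.sd
```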
Play data (for the examples following)
require('zoo')
require('plyr')
n <- 100
u <- 1:n
v <- rnorm(n, 10, 10) + 1:n
w <- round(runif(n, 0.6, 9.4))
df <- data.frame(month=u, x=u, y=v, z=w)
l <- list(x = u, y = v, z = w, yz = v*w, xyz = u*v*w)
trivial.add <- function(a, b) { a + b }
Use R's vectorization features
# Vectorized: one call over the whole vector
tot <- sum(log(u))
# Equivalent for-loop, for comparison
tot <- 0
for(i in seq_along(u)){
  tot <- tot + log(u[i])
}
Clever indexing and subset assignment
df[df$z == 5, 'y'] <- -1
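For comparison, here is a sketch of the loop that this one-liner replaces. It builds its own small data frame with the same y and z column names as the play data, so it runs on its own. The copy df.loop is only there for the side-by-side check.

```r
set.seed(42)
df <- data.frame(y = rnorm(20), z = round(runif(20, 0.6, 9.4)))
df.loop <- df

# Loop version: test every row in turn
for (i in seq_len(nrow(df.loop))) {
  if (df.loop$z[i] == 5) df.loop$y[i] <- -1
}

# Indexed version: a single logical subset assignment
df[df$z == 5, 'y'] <- -1
```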
The base apply family of functions
# Naming letters used by plyr functions (first letter: input type, second letter: output type)
# l stands for list
# a stands for array
# d stands for data.frame
# m is a special input type, which means that we provide multiple arguments in a tabular format for the function
# r input type expects an integer, which specifies the number of times the function is replicated
# _ is a special output type that does not return anything for the function
apply(X, MARGIN, FUN, ...)
lapply(X, FUN, ...)
sapply(X, FUN, ...)
vapply(X, FUN, FUN.VALUE, ...)
tapply(X, INDEX, FUN = NULL, ...)
mapply(FUN, ..., MoreArgs = NULL)
eapply(env, FUN, ...)
replicate(n, expr, simplify = "array")
by(data, INDICES, FUN, ...)
aggregate(x, by, FUN, ...)
rapply()
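Three functions from this list get no example of their own further down, so here is a rough sketch of vapply, mapply and replicate. The inputs are made up and the result names sq, sums and means are only illustrative.

```r
# vapply: like sapply, but the type and length of each result are declared up front
sq <- vapply(1:3, function(a) a^2, FUN.VALUE = numeric(1))

# mapply: walk several vectors in parallel, one element from each per call
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

# replicate: evaluate an expression n times and collect the results
set.seed(1)
means <- replicate(4, mean(rnorm(10)))
```

vapply stops with an error if any result does not match FUN.VALUE, which makes it a safer choice than sapply inside functions.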
apply (by row/column on a two+ dim object)
# Object: m, t, df, a (has 2+ dimensions)
# Returns: v, l, m (depends on input & fn)
column.mean <- apply(df, 2, mean)
row.product <- apply(df, 1, prod)
lapply (on vector or list, returns list)
lapply(l, mean)
unlist(lapply(u, trivial.add, 5))
sapply (a simplified lapply on v or l)
# Object: v, l
# Returns: usually a vector
sapply(l, mean)
sapply(u, function(a) a*a)
sapply(u, trivial.add, -1)
sapply and lapply work in a similar way: each traverses a set of data, such as a list or vector, and calls the specified function on each item.
Sometimes we need to traverse our data in a less than linear way. Say we wanted to compare the current observation with the value 5 periods before it. You can probably use rollapply for this (from the zoo package, which quantmod loads), but a quick and dirty way is to run sapply or lapply over a set of index values.
Here we will use sapply, which works on a list or vector of data.
sapply(1:3, function(x) x^2)
#[1] 1 4 9
lapply is very similar, however it will return a list rather than a vector:
lapply(1:3, function(x) x^2)
#[[1]]
#[1] 1
#
#[[2]]
#[1] 4
#
#[[3]]
#[1] 9
Passing simplify=FALSE to sapply will also give you a list:
sapply(1:3, function(x) x^2, simplify=FALSE)
#[[1]]
#[1] 1
#
#[[2]]
#[1] 4
#
#[[3]]
#[1] 9
And you can use unlist with lapply to get a vector.
unlist(lapply(1:3, function(x) x^2))
#[1] 1 4 9
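The index-passing idea mentioned earlier, comparing each observation with the value 5 periods before it, could be sketched as follows. The vector is generated the same way as v in the play data but shorter, and the names periods, idx, change and change.vec are just for illustration.

```r
set.seed(1)
v <- rnorm(20, 10, 10) + 1:20
periods <- 5  # how far back to look

# Iterate over index values rather than over the data itself;
# start at periods + 1 so that v[i - periods] always exists
idx <- (periods + 1):length(v)
change <- sapply(idx, function(i) v[i] - v[i - periods])

# The same comparison, fully vectorized
change.vec <- v[idx] - v[idx - periods]
```

For a plain difference like this the vectorized form is simpler. The sapply form earns its keep when the function body is too involved to vectorize.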
tapply (group v/l by factor & apply fn)
count.table <- tapply(v, w, length)
min.1 <- with(df, tapply(y, z, min))
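To make the grouping explicit, this sketch sets a per-group loop beside the tapply call that replaces it. The data are made up (three groups rather than the nine in the play data), and min.loop and min.tapply are illustrative names.

```r
set.seed(1)
y <- rnorm(30)
z <- round(runif(30, 0.6, 3.4))   # group labels 1 to 3

# Loop version: one pass per group
groups <- sort(unique(z))
min.loop <- numeric(length(groups))
for (g in seq_along(groups)) {
  min.loop[g] <- min(y[z == groups[g]])
}

# tapply version: split y by z, apply min to each piece
min.tapply <- tapply(y, z, min)
```

tapply also labels each result with its group, which the loop version would have to add by hand.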
by (on df or v, returns "by" objects)
min.2 <- by(df$y, df$z, min)
min.3 <- by(df[, c('x', 'y')], df$z, min)
# last one: finds min from two columns
aggregate
ag <- aggregate(df, by=list(df$z), mean)
aggregate(df, by=list(w, 1+(u%%12)), mean)
# Trap: variables must be in a list
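A quick sketch of the trap on a small made-up data frame. The by argument has to be a list even for a single grouping variable, and a bare vector raises an error. The formula interface shown last is an alternative way to get the same grouping, not something the examples above use.

```r
set.seed(1)
df <- data.frame(y = rnorm(12), z = rep(1:3, 4))

# by= must be a list; naming the element names the group column in the result
ag <- aggregate(df["y"], by = list(z = df$z), FUN = mean)

# Passing a bare vector fails with an error
bare <- try(aggregate(df["y"], by = df$z, FUN = mean), silent = TRUE)

# Formula interface: the same grouping without building the list
ag.formula <- aggregate(y ~ z, data = df, FUN = mean)
```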
rollapply - from the zoo package
# A 5-term, centred, rolling average
v.ma5 <- rollapply(v, 5, mean, fill = NA)
# Sum 3 months data for a quarterly total
v.qtrly <- rollapply(v, 3, sum, fill=NA, align='right')
# Note: zoo has rollmean(), rollmax() and rollmedian() functions
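For readers without zoo to hand, the centred 5-term average can be approximated in base R with the index-passing sapply trick. This is only a sketch of what the rolling window computes, not zoo's implementation. The names width, half and roll.mean are made up, and the result should line up with rollapply(v, 5, mean, fill = NA) under its default centred alignment.

```r
set.seed(1)
v <- rnorm(20, 10, 10) + 1:20
width <- 5
half <- (width - 1) / 2

# Centred window around each position; NA where the window runs off either end
roll.mean <- sapply(seq_along(v), function(i) {
  if (i - half < 1 || i + half > length(v)) return(NA_real_)
  mean(v[(i - half):(i + half)])
})
```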
Inside a data.frame
# Use transform() or within() to apply a function to a column in a data.frame
df <- within(df, v.qtrly <- rollapply(v, 3, sum, fill=NA, align='right'))
# Use with() to simplify column access
The plyr package
Plyr is a fantastic family of apply-like functions with a common naming system for the input-to and output-from split-apply-combine procedures. I use ddply() the most.
ddply(df, .(x), summarise, min=min(y), max=max(y))
ddply(df, .(x), transform, span = x - y)
Other packages worth looking at
# foreach - a set of apply-like fns
# snow - parallelised apply-like fns
# snowfall - a usability wrapper for snow
Abbreviations
v=vector
l=list
m=matrix
df=data.frame
a=array
t=table
f=factor
d=dates