通過位元組編譯加速對迴圈進行難以向量化

Created: November-22, 2018

按照本文件條目中的 Rcpp 示例，考慮以下難以向量化的函式，該函式建立一個長度為 len 的向量，其中指定第一個元素（first）並且每個元素 x_i 等於 cos(x_{i-1} + 1)：

repeatedCosPlusOne <- function(first, len) {
  x <- numeric(len)
  x[1] <- first
  for (i in 2:len) {
    x[i] <- cos(x[i-1] + 1)
  }
  return(x)
}

在不重寫單行程式碼的情況下加速這種函式的一種簡單方法是使用 R 編譯包對位元組進行位元組編譯：

library(compiler)
repeatedCosPlusOneCompiled <- cmpfun(repeatedCosPlusOne)

結果函式通常會明顯更快，同時仍返回相同的結果：

all.equal(repeatedCosPlusOne(1, 1e6), repeatedCosPlusOneCompiled(1, 1e6))
# [1] TRUE
system.time(repeatedCosPlusOne(1, 1e6))
#    user  system elapsed 
#   1.175   0.014   1.201 
system.time(repeatedCosPlusOneCompiled(1, 1e6))
#    user  system elapsed 
#   0.339   0.002   0.341

在這種情況下，位元組編譯加速了長度為 1 百萬的向量上的難以向量化的操作，從 1.20 秒到 0.34 秒。

備註

repeatedCosPlusOne 的本質，作為單個函式的累積應用，可以用 Reduce 更透明地表達：

iterFunc <- function(init, n, func) {
  funcs <- replicate(n, func)
  Reduce(function(., f) f(.), funcs, init = init, accumulate = TRUE)
}
repeatedCosPlusOne_vec <- function(first, len) {
  iterFunc(first, len - 1, function(.) cos(. + 1))
}

repeatedCosPlusOne_vec 可以被視為 repeatedCosPlusOne 的向量化。但是，可以預期它會慢 2 倍：

library(microbenchmark)
microbenchmark(
  repeatedCosPlusOne(1, 1e4),
  repeatedCosPlusOne_vec(1, 1e4)
)
#> Unit: milliseconds
#>                              expr       min        lq     mean   median       uq      max neval cld
#>      repeatedCosPlusOne(1, 10000)  8.349261  9.216724 10.22715 10.23095 11.10817 14.33763   100  a 
#>  repeatedCosPlusOne_vec(1, 10000) 14.406291 16.236153 17.55571 17.22295 18.59085 24.37059   100   b