Adding and modifying columns

Created: November-22, 2018

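The DT[where, select|update|do, by] syntax is used to work with the columns of a data.table.

The "where" part is the i argument
The "select | update | do" part is the j argument

These two arguments are usually passed by position rather than by name.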
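As a minimal sketch of what passing by position looks like, the calls below use a small made-up table (`dt` and its columns `g` and `v` are illustrative and not part of the example data). The named form is shown only for comparison.

```r
library(data.table)

# Hypothetical toy table, for illustration only
dt <- data.table(g = c("a", "a", "b", "b"), v = 1:4)

# Positional: i ("where") comes first, j ("select | update | do") second
by_position <- dt[g == "a", sum(v)]

# The same call with both arguments named explicitly
by_name <- dt[i = g == "a", j = sum(v)]

# by groups the computation in j: one result row per value of g
per_group <- dt[, .(total = sum(v)), by = g]
```

Leaving i empty, as in the last call, applies j to every row.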

Our example data is as follows:

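library(data.table)

# Convert the built-in mtcars data.frame, keeping row names as a column named "rn"
mtcars = data.table(mtcars, keep.rownames = TRUE)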

Editing entire columns

Use the := operator inside j to assign new columns:

mtcars[, mpg_sq := mpg^2]

Remove columns by setting them to NULL:

mtcars[, mpg_sq := NULL]
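Because := works in place, the result does not need to be assigned back to the table. The sketch below illustrates this on a made-up table (`dt`, `x`, `y` and `xy` are hypothetical names) and also drops several columns in one call by passing their names as a character vector.

```r
library(data.table)

dt <- data.table(x = 1:3, y = c(2, 4, 6))  # hypothetical toy table

# := adds the column in place; nothing is assigned back to dt
dt[, xy := x * y]
has_xy <- "xy" %in% names(dt)

# Several columns can be dropped at once with a character vector on the left
dt[, c("y", "xy") := NULL]
```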

Add multiple columns by using the multivariate format of the := operator:

mtcars[, `:=`(mpg_sq = mpg^2, wt_sqrt = sqrt(wt))]
# or 
mtcars[, c("mpg_sq", "wt_sqrt") := .(mpg^2, sqrt(wt))]
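One way to check that the two forms are interchangeable is to apply each to its own copy of the same data and compare the results. This is only a sketch on a made-up two-row table; `a` and `b` are hypothetical names.

```r
library(data.table)

# Two copies of a hypothetical toy table so each form starts from the same data
a <- data.table(mpg = c(21, 22.8), wt = c(2.62, 2.32))
b <- copy(a)

# Functional form: name = value pairs
a[, `:=`(mpg_sq = mpg^2, wt_sqrt = sqrt(wt))]

# Character-vector form: names on the left, a list of values on the right
b[, c("mpg_sq", "wt_sqrt") := .(mpg^2, sqrt(wt))]

same <- isTRUE(all.equal(a, b))
```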

If the columns are dependent and must be defined in sequence, one way is:

mtcars[, c("mpg_sq", "mpg2_hp") := .(temp1 <- mpg^2, temp1/hp)]

The .() syntax is used when the right-hand side of LHS := RHS is a list of columns.
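Another way to handle dependent columns, sketched here on a made-up table, is to chain two `[` calls so that the second can refer to the column the first one created. The sketch also uses `list()` in place of `.()`, which is an alias for it inside a data.table call. The column names `a` and `b` are hypothetical.

```r
library(data.table)

dt <- data.table(mpg = c(21, 22.8), hp = c(110, 93))  # hypothetical toy table

# Chained calls: mpg_sq already exists when the second [ is evaluated
dt[, mpg_sq := mpg^2][, mpg2_hp := mpg_sq / hp]

# list() is the long form of .()
dt[, c("a", "b") := list(mpg + 1, hp + 1)]
```

Chaining avoids the temporary variable at the cost of a second call, which may read more clearly when there are several dependent steps.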

For dynamically determined column names, wrap the variable in parentheses:

vn = "mpg_sq"
mtcars[, (vn) := mpg^2]
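The parentheses matter. As a sketch on a made-up table, and assuming a current data.table version, the first call below creates a column literally named "vn", while the second uses the value stored in `vn`. The vector `new_cols` is a hypothetical name showing that the same pattern covers several columns.

```r
library(data.table)

dt <- data.table(mpg = c(21, 22.8))  # hypothetical toy table
vn <- "mpg_sq"

# Without parentheses the new column is literally named "vn"
dt[, vn := mpg^2]

# With parentheses the value of vn is used as the column name
dt[, (vn) := mpg^2]

# A character vector in parentheses names several columns at once
new_cols <- c("mpg_half", "mpg_double")
dt[, (new_cols) := .(mpg / 2, mpg * 2)]
```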

Columns can also be modified with set, though this is rarely necessary:

set(mtcars, j = "hp_over_wt", value = mtcars$hp/mtcars$wt)
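One situation where set tends to be used is a loop over many columns, since it avoids the overhead of repeated `[` calls. The sketch below replaces missing values with 0 in every column of a made-up table; `dt` and its columns are hypothetical.

```r
library(data.table)

# Hypothetical toy table with missing values in numeric columns
dt <- data.table(a = c(1, NA, 3), b = c(NA, 5, 6))

# Replace NA with 0 in every column, one set() call per column
for (col in names(dt)) {
  na_rows <- which(is.na(dt[[col]]))
  set(dt, i = na_rows, j = col, value = 0)
}
```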

Editing subsets of columns

Use the i argument to subset to the rows "where" edits should be made:

mtcars[1:3, newvar := "Hello"]
# or
set(mtcars, i = 1:3, j = "newvar", value = "Hello")

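As with a data.frame, we can subset using row numbers or logical tests. It is also possible to use a "join" in i, but that more complicated task is covered in another example.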
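A minimal sketch of the logical-test case, on a made-up table (`dt` and `label` are hypothetical names): only the rows selected in i are assigned, and when the column is new the remaining rows are filled with NA.

```r
library(data.table)

dt <- data.table(cyl = c(4, 6, 8, 6), mpg = c(30, 21, 15, 19))  # hypothetical toy table

# Logical test in i: only rows with cyl == 6 are assigned
dt[cyl == 6, label := "six"]

# Rows that were not selected hold NA in the new column
untouched <- dt[cyl != 6, label]

# An existing column can be edited the same way
dt[mpg > 20, mpg := 20]
```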

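Editing column attributes

Functions that edit attributes, such as levels<- or names<-, actually replace an object with a modified copy. Even when used on only one column of a data.table, the entire object is copied and replaced.

To modify an object without copying it, use setnames to change the column names of a data.table or data.frame, and setattr to change an attribute of any object.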

# Print a message to the console whenever the data.table is copied
tracemem(mtcars)
mtcars[, cyl2 := factor(cyl)]

# Neither of these statements copy the data.table
setnames(mtcars, old = "cyl2", new = "cyl_fac")
setattr(mtcars$cyl_fac, "levels", c("four", "six", "eight"))

# Each of these statements copies the data.table
names(mtcars)[names(mtcars) == "cyl_fac"] <- "cf"
levels(mtcars$cf) <- c("IV", "VI", "VIII")

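Be aware that changes made with setnames and setattr are made by reference, so they are global. Changing an object within one environment affects it in all environments.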

# This function also changes the levels in the global environment
edit_levels <- function(x) setattr(x, "levels", c("low", "med", "high"))
edit_levels(mtcars$cf)
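The same applies to := inside a function. The sketch below, using a made-up table and two hypothetical helper functions, contrasts a function that changes the caller's table with one that works on a copy made with copy().

```r
library(data.table)

dt <- data.table(x = 1:3)  # hypothetical toy table

# Adds a column by reference: the caller's table changes too
add_flag <- function(d) d[, flag := TRUE]

# Works on a copy: the caller's table is left alone
add_flag_safely <- function(d) {
  d <- copy(d)
  d[, flag2 := TRUE]
  d
}

add_flag(dt)
result <- add_flag_safely(dt)
```

After these calls `dt` has gained `flag` but not `flag2`, while `result` holds both.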