Subsetting rows and columns from a data frame

Created: November-22, 2018

Syntax for accessing rows and columns: `[`, `[[`, and `$`

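This topic covers the most common syntax for accessing specific rows and columns of a data frame. These are:

Like a matrix, with single brackets data[rows, columns]:
- using row and column numbers
- using column (and row) names
Like a list:
- with single brackets data[columns] to get a data frame
- with double brackets data[[one_column]] to get a vector
With $ for a single column: data$column_name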

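We will use the built-in mtcars data frame to illustrate.

Like a matrix: `data[rows, columns]`

With numeric indexes

Using the built-in data frame mtcars, we can extract rows and columns with [] brackets that contain a comma. Indices before the comma select rows: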

# get the first row
mtcars[1, ]
# get the first five rows
mtcars[1:5, ]

Similarly, indices after the comma select columns:

# get the first column
mtcars[, 1]
# get the first, third and fifth columns:
mtcars[, c(1, 3, 5)]

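As shown above, if either the row or the column index is left blank, all rows or all columns are selected. For example, mtcars[1, ] is the first row with all of its columns.

With column (and row) names

So far, this is identical to how rows and columns of matrices are accessed. With data.frames, it is usually preferable to refer to a column by name rather than by position. This is done by passing a character vector of column names instead of a numeric vector of column numbers: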

# get the mpg column
mtcars[, "mpg"]
# get the mpg, cyl, and disp columns
mtcars[, c("mpg", "cyl", "disp")]

Though less common, row names can also be used:

mtcars["Mazda RX4", ]

Rows and columns together

The row and column arguments can be used together:

# first four rows of the mpg column
mtcars[1:4, "mpg"]

# 2nd and 5th row of the mpg, cyl, and disp columns
mtcars[c(2, 5), c("mpg", "cyl", "disp")]

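A warning about dimensions:

When using these methods, extracting multiple columns returns a data frame. However, extracting a single column returns a vector, not a data frame, under the default options.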

## multiple columns returns a data frame
class(mtcars[, c("mpg", "cyl")])
# [1] "data.frame"
## single column returns a vector
class(mtcars[, "mpg"])
# [1] "numeric"

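There are two ways around this. One is to treat the data frame as a list (see below); the other is to add a drop = FALSE argument. This tells R not to "drop the unused dimensions":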

class(mtcars[, "mpg", drop = FALSE])
# [1] "data.frame"

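Note that matrices work the same way: by default a single column or row is returned as a vector, but if you specify drop = FALSE you can keep it as a one-column or one-row matrix.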
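
As a small illustration of the note about matrices, the sketch below uses a made-up 2-by-3 matrix named m (an assumption for this example, not part of the original text) to compare the default behaviour with drop = FALSE:

```r
# m is a hypothetical example matrix, introduced only for illustration
m <- matrix(1:6, nrow = 2)

# default: a single column is simplified to a plain vector
col_vector <- m[, 1]
is.matrix(col_vector)

# drop = FALSE keeps the result as a one-column matrix
col_matrix <- m[, 1, drop = FALSE]
dim(col_matrix)

# the same applies to a single row
row_matrix <- m[1, , drop = FALSE]
dim(row_matrix)
```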

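Like a list

Data frames are essentially lists, i.e., they are lists of column vectors (all of which must have the same length). Lists can be subset with single brackets [ to get a sub-list, or with double brackets [[ to get a single element.

With single brackets `data[columns]`

When you use single brackets without a comma, you get columns back, because a data frame is a list of columns.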
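
One way to check this list-like nature is sketched below. It relies only on base R functions applied to mtcars; the variable names sub_list and one_element are chosen here for illustration.

```r
# a data frame is a list under the hood
is.list(mtcars)

# its list elements are the columns, so length() counts columns
length(mtcars) == ncol(mtcars)

# single brackets return a sub-list, which here is still a data frame
sub_list <- mtcars[1:2]
class(sub_list)

# double brackets return the single element itself, a numeric vector
one_element <- mtcars[[1]]
class(one_element)
```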

mtcars["mpg"]
mtcars[c("mpg", "cyl", "disp")]
my_columns <- c("mpg", "cyl", "hp")
mtcars[my_columns]

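Single brackets like a list vs. single brackets like a matrix

The difference between data[columns] and data[, columns] is that when the data.frame is treated as a list (no comma in the brackets), the returned object is always a data.frame. If you use a comma to treat the data.frame like a matrix, selecting a single column returns a vector, while selecting multiple columns returns a data.frame.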

## When selecting a single column
## like a list will return a data frame
class(mtcars["mpg"])
# [1] "data.frame"
## like a matrix will return a vector
class(mtcars[, "mpg"])
# [1] "numeric"

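With double brackets `data[[one_column]]`

To extract a single column as a vector while treating your data.frame as a list, use double brackets [[. This works for only one column at a time.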

# extract a single column by name as a vector 
mtcars[["mpg"]]

# extract a single column by name as a data frame (as above)
mtcars["mpg"]

Using `$` to access columns

A single column can be extracted with the shortcut $, without quoting the column name:

# get the column "mpg"
mtcars$mpg

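Columns accessed with $ are always vectors, not data frames.

Drawbacks of `$` for accessing columns

The $ operator can be a convenient shortcut, especially if you work in an environment (such as RStudio) that auto-completes the column name. However, $ has drawbacks as well: it uses non-standard evaluation to avoid the need for quotes, which means it will not work if the column name is stored in a variable.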

my_column <- "mpg"
# the below will not work: it looks for a column literally named "my_column" and returns NULL
mtcars$my_column
# but these will work
mtcars[, my_column]  # vector
mtcars[my_column]    # one-column data frame
mtcars[[my_column]]  # vector

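Because of these concerns, $ is best used in interactive R sessions where the column names are constant. For programmatic use, for example when writing a general-purpose function that will be applied to different data sets with different column names, $ should be avoided.

Also note that, by default, partial matching of names is used only when extracting from recursive objects (except environments) with $: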
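
A minimal sketch of such a general-purpose function follows. The helper column_mean and its arguments are hypothetical names invented for this example; the point is only that [[ accepts a column name held in a variable, whereas $ would not.

```r
# column_mean is a hypothetical helper, not a base R function
column_mean <- function(data, column_name) {
  # [[ looks up the column whose name is stored in column_name
  mean(data[[column_name]])
}

# the same function works on different data sets and columns
column_mean(mtcars, "mpg")
column_mean(iris, "Sepal.Length")
```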

# gives you the values of the "mpg" column,
# as "mtcars" has only one column whose name starts with "m"
mtcars$m
# gives you NULL,
# as "mtcars" has more than one column whose name starts with "d"
mtcars$d
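
For comparison, the sketch below shows how double brackets treat the same abbreviation. It assumes the default exact = TRUE argument of [[; the variable name partial is illustrative.

```r
# [[ matches names exactly by default, so an abbreviation finds nothing
is.null(mtcars[["m"]])

# partial matching has to be requested explicitly with exact = FALSE
partial <- mtcars[["m", exact = FALSE]]
identical(partial, mtcars$mpg)
```

Exact matching by default is one more reason to prefer [[ over $ in scripts and functions.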

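Advanced indexing: negative and logical indices

Whenever we can use numbers as an index, we can also use negative numbers to omit certain indices, or a boolean (logical) vector to indicate exactly which items to keep.

Negative indices omit elements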

mtcars[1, ]   # first row
mtcars[ -1, ] # everything but the first row
mtcars[-(1:10), ] # everything except the first 10 rows
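
The same idea carries over to the column position. The following sketch, with illustrative variable names, omits columns by negative position and keeps columns with a logical vector:

```r
# drop the first column (mpg); the other 10 columns remain
no_first_col <- mtcars[, -1]
names(no_first_col)

# drop several columns at once
fewer_cols <- mtcars[, -c(1, 3, 5)]
ncol(fewer_cols)

# a logical vector with one entry per column keeps the TRUE positions
keep <- names(mtcars) %in% c("mpg", "hp")
mtcars[1:3, keep]
```

Negative indexing applies to numeric positions only; a character vector of column names cannot be negated this way.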

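Logical vectors indicate specific elements to keep

We can use a condition such as < to generate a logical vector and extract only the rows that meet the condition: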

# logical vector indicating TRUE when a row has mpg less than 15
# FALSE when a row has mpg >= 15
test <- mtcars$mpg < 15 

# extract these rows from the data frame 
mtcars[test, ]

We can also skip the step of saving the intermediate variable:

# extract all columns for rows where the value of cyl is 4.
mtcars[mtcars$cyl == 4, ]
# extract the cyl, mpg, and hp columns where the value of cyl is 4
mtcars[mtcars$cyl == 4, c("cyl", "mpg", "hp")]
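
Conditions can also be combined before indexing. The sketch below is an illustrative extension rather than part of the original examples; the threshold of 30 and the names keep_rows and economical are assumptions made for the example.

```r
# combine two conditions with & (element-wise AND)
keep_rows <- mtcars$cyl == 4 & mtcars$mpg > 30
economical <- mtcars[keep_rows, c("mpg", "cyl", "hp")]

# the result has one row per TRUE in the logical vector
nrow(economical) == sum(keep_rows)
```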
