Helper Functions

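Helper functions are used together with select to identify the variables to return. Unless otherwise noted, these functions expect a single string as their first argument, match. In the dplyr version used here, passing a vector or any other object generates an error.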

library(dplyr)
library(nycflights13)

starts_with

starts_with lets us identify variables whose names begin with a given string.

Return all variables whose names start with the letter e:

planes %>% select(starts_with("e"))
## # A tibble: 3,322 × 2
##    engines    engine
##      <int>     <chr>
## 1        2 Turbo-fan
## 2        2 Turbo-fan
## 3        2 Turbo-fan
## 4        2 Turbo-fan
## 5        2 Turbo-fan
## 6        2 Turbo-fan
## 7        2 Turbo-fan
## 8        2 Turbo-fan
## 9        2 Turbo-fan
## 10       2 Turbo-fan
## # ... with 3,312 more rows

Matching is case-insensitive by default. For strict case matching, set the ignore.case argument to FALSE.

planes %>% select(starts_with("E", ignore.case = FALSE))
## # A tibble: 3,322 × 0
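To make the matching rule concrete, here is a rough base R sketch of what starts_with does to the column names. The function starts_with_sketch is a hypothetical illustration, not part of dplyr, and the name vector is copied by hand from the planes output shown in this document.

```r
# Column names of planes, copied from the printed output in this document.
nms <- c("tailnum", "year", "type", "manufacturer", "model",
         "engines", "seats", "speed", "engine")

# Hypothetical helper: approximate starts_with() on a plain character vector.
starts_with_sketch <- function(nms, match, ignore.case = TRUE) {
  if (ignore.case) {
    # Compare lower-cased copies so that "E" and "e" are treated alike.
    keep <- startsWith(tolower(nms), tolower(match))
  } else {
    # Strict comparison: the case must match exactly.
    keep <- startsWith(nms, match)
  }
  nms[keep]
}

loose  <- starts_with_sketch(nms, "E")                       # engines, engine
strict <- starts_with_sketch(nms, "E", ignore.case = FALSE)  # no matches
```

The strict call returns an empty vector, which mirrors the zero-column tibble above.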

ends_with

Return all variables whose names end with the letter e:

planes %>% select(ends_with("e"))
## # A tibble: 3,322 × 2
##                       type    engine
##                      <chr>     <chr>
## 1  Fixed wing multi engine Turbo-fan
## 2  Fixed wing multi engine Turbo-fan
## 3  Fixed wing multi engine Turbo-fan
## 4  Fixed wing multi engine Turbo-fan
## 5  Fixed wing multi engine Turbo-fan
## 6  Fixed wing multi engine Turbo-fan
## 7  Fixed wing multi engine Turbo-fan
## 8  Fixed wing multi engine Turbo-fan
## 9  Fixed wing multi engine Turbo-fan
## 10 Fixed wing multi engine Turbo-fan
## # ... with 3,312 more rows

For strict case matching, set the ignore.case argument to FALSE.

planes %>% select(ends_with("E", ignore.case = FALSE))
## # A tibble: 3,322 × 0

contains

contains lets you find any variable whose name contains a given string.

planes %>% select(contains("ea"))
## # A tibble: 3,322 × 2
##     year seats
##    <int> <int>
## 1   2004    55
## 2   1998   182
## 3   1999   182
## 4   1999   182
## 5   2002    55
## 6   1999   182
## 7   1999   182
## 8   1999   182
## 9   1999   182
## 10  1999   182
## # ... with 3,312 more rows

For strict case matching, set the ignore.case argument to FALSE.

planes %>% select(contains("EA", ignore.case = FALSE))
## # A tibble: 3,322 × 0

matches

matches is the only helper function that accepts regular expressions.

Return all variables whose names contain a run of at least six alphabetic characters:

planes %>% select(matches("[[:alpha:]]{6,}"))
## # A tibble: 3,322 × 4
##    tailnum     manufacturer engines    engine
##      <chr>            <chr>   <int>     <chr>
## 1   N10156          EMBRAER       2 Turbo-fan
## 2   N102UW AIRBUS INDUSTRIE       2 Turbo-fan
## 3   N103US AIRBUS INDUSTRIE       2 Turbo-fan
## 4   N104UW AIRBUS INDUSTRIE       2 Turbo-fan
## 5   N10575          EMBRAER       2 Turbo-fan
## 6   N105UW AIRBUS INDUSTRIE       2 Turbo-fan
## 7   N107US AIRBUS INDUSTRIE       2 Turbo-fan
## 8   N108UW AIRBUS INDUSTRIE       2 Turbo-fan
## 9   N109UW AIRBUS INDUSTRIE       2 Turbo-fan
## 10  N110UW AIRBUS INDUSTRIE       2 Turbo-fan
## # ... with 3,312 more rows

For strict case matching, set the ignore.case argument to FALSE.
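As a minimal sketch of the difference, the example below runs the same regular expression with and without ignore.case. The data frame toy and its mixed-case column names are made up for illustration and are not part of nycflights13.

```r
library(dplyr)

# Hypothetical toy data frame with mixed-case column names.
toy <- data.frame(ID = 1:3,
                  id_code = c("a", "b", "c"),
                  Index = 4:6,
                  value = 7:9)

# Default behaviour: the pattern is matched case-insensitively.
loose <- toy %>% select(matches("^id"))

# Strict behaviour: only names that literally begin with lower-case "id".
strict <- toy %>% select(matches("^id", ignore.case = FALSE))

names(loose)   # ID and id_code
names(strict)  # id_code only
```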

num_range

For this example, I will generate a dummy data frame with random values and sequentially numbered variable names.

set.seed(1)
df <- data.frame(x1 = runif(10), 
                 x2 = runif(10), 
                 x3 = runif(10), 
                 x4 = runif(10), 
                 x5 = runif(10))

num_range can be used to select a range of variables that share a consistent prefix.

Select variables 2:4 from df:

df %>% select(num_range('x', range = 2:4))
##           x2         x3        x4
## 1  0.2059746 0.93470523 0.4820801
## 2  0.1765568 0.21214252 0.5995658
## 3  0.6870228 0.65167377 0.4935413
## 4  0.3841037 0.12555510 0.1862176
## 5  0.7698414 0.26722067 0.8273733
## 6  0.4976992 0.38611409 0.6684667
## 7  0.7176185 0.01339033 0.7942399
## 8  0.9919061 0.38238796 0.1079436
## 9  0.3800352 0.86969085 0.7237109
## 10 0.7774452 0.34034900 0.4112744
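When the numeric suffixes are zero-padded, the width argument of num_range can help. The sketch below assumes a made-up data frame, padded, whose column names use two-digit suffixes; it is only an illustration.

```r
library(dplyr)

# Hypothetical data frame whose column suffixes are zero-padded to two digits.
padded <- data.frame(x01 = 1:2, x02 = 3:4, x03 = 5:6, x10 = 7:8)

# width = 2 pads each number in the range, so 2:3 is matched as "02" and "03".
picked <- padded %>% select(num_range("x", 2:3, width = 2))

names(picked)  # x02 and x03
```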

one_of

one_of accepts a character vector as its match argument and returns every variable named in it.

planes %>% select(one_of(c("tailnum", "model")))
## # A tibble: 3,322 × 2
##    tailnum     model
##      <chr>     <chr>
## 1   N10156 EMB-145XR
## 2   N102UW  A320-214
## 3   N103US  A320-214
## 4   N104UW  A320-214
## 5   N10575 EMB-145LR
## 6   N105UW  A320-214
## 7   N107US  A320-214
## 8   N108UW  A320-214
## 9   N109UW  A320-214
## 10  N110UW  A320-214
## # ... with 3,312 more rows
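Because one_of takes a character vector, the names can come from a variable built elsewhere in a script. The sketch below uses a small made-up data frame, toy, and a hypothetical vector wanted. As far as I know, names in the vector that do not exist in the data produce a warning rather than an error, so the second call wraps the selection in suppressWarnings; treat that behaviour as an assumption to verify against your installed version.

```r
library(dplyr)

# Hypothetical toy data frame standing in for planes.
toy <- data.frame(tailnum = c("N10156", "N102UW"),
                  model = c("EMB-145XR", "A320-214"),
                  seats = c(55L, 182L))

# Column names held in an ordinary character vector.
wanted <- c("tailnum", "model")
picked <- toy %>% select(one_of(wanted))

# A name that is absent from the data is skipped (with a warning).
partial <- suppressWarnings(
  toy %>% select(one_of(c("tailnum", "not_a_column")))
)

names(picked)   # tailnum and model
names(partial)  # tailnum only
```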

everything

everything can be used to reposition variables in a data frame.

Make manufacturer the first variable, followed by all remaining variables:

planes %>% select(manufacturer, everything())
## # A tibble: 3,322 × 9
##        manufacturer tailnum  year                    type     model
##               <chr>   <chr> <int>                   <chr>     <chr>
## 1           EMBRAER  N10156  2004 Fixed wing multi engine EMB-145XR
## 2  AIRBUS INDUSTRIE  N102UW  1998 Fixed wing multi engine  A320-214
## 3  AIRBUS INDUSTRIE  N103US  1999 Fixed wing multi engine  A320-214
## 4  AIRBUS INDUSTRIE  N104UW  1999 Fixed wing multi engine  A320-214
## 5           EMBRAER  N10575  2002 Fixed wing multi engine EMB-145LR
## 6  AIRBUS INDUSTRIE  N105UW  1999 Fixed wing multi engine  A320-214
## 7  AIRBUS INDUSTRIE  N107US  1999 Fixed wing multi engine  A320-214
## 8  AIRBUS INDUSTRIE  N108UW  1999 Fixed wing multi engine  A320-214
## 9  AIRBUS INDUSTRIE  N109UW  1999 Fixed wing multi engine  A320-214
## 10 AIRBUS INDUSTRIE  N110UW  1999 Fixed wing multi engine  A320-214
## # ... with 3,312 more rows, and 4 more variables: engines <int>,
## #   seats <int>, speed <int>, engine <chr>
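everything also works after another helper, which moves a whole group of columns to the front. A minimal sketch, using a made-up one-row data frame, toy, in place of planes:

```r
library(dplyr)

# Hypothetical toy data frame with a subset of the planes columns.
toy <- data.frame(tailnum = "N10156", year = 2004L, engines = 2L,
                  seats = 55L, engine = "Turbo-fan")

# Columns starting with "e" come first; everything() appends the rest
# in their original order without duplicating the columns already chosen.
reordered <- toy %>% select(starts_with("e"), everything())

names(reordered)  # engines, engine, tailnum, year, seats
```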

Other helpers

Although the : and - operators are not part of the dplyr package, we can still use them to identify the variables to return.

The : operator defines an inclusive range of variables to return.

Return every variable from year through manufacturer:

planes %>% select(year:manufacturer)
## # A tibble: 3,322 × 3
##     year                    type     manufacturer
##    <int>                   <chr>            <chr>
## 1   2004 Fixed wing multi engine          EMBRAER
## 2   1998 Fixed wing multi engine AIRBUS INDUSTRIE
## 3   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 4   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 5   2002 Fixed wing multi engine          EMBRAER
## 6   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 7   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 8   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 9   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 10  1999 Fixed wing multi engine AIRBUS INDUSTRIE
## # ... with 3,312 more rows

Return multiple ranges of variables:

planes %>% select(c(year:manufacturer, seats:engine))
## # A tibble: 3,322 × 6
##     year                    type     manufacturer seats speed    engine
##    <int>                   <chr>            <chr> <int> <int>     <chr>
## 1   2004 Fixed wing multi engine          EMBRAER    55    NA Turbo-fan
## 2   1998 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 3   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 4   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 5   2002 Fixed wing multi engine          EMBRAER    55    NA Turbo-fan
## 6   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 7   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 8   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 9   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 10  1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## # ... with 3,312 more rows

-

The - operator removes variables from the result set.

Return all variables except type:

planes %>% select(-type)
## # A tibble: 3,322 × 8
##    tailnum  year     manufacturer     model engines seats speed    engine
##      <chr> <int>            <chr>     <chr>   <int> <int> <int>     <chr>
## 1   N10156  2004          EMBRAER EMB-145XR       2    55    NA Turbo-fan
## 2   N102UW  1998 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 3   N103US  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 4   N104UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 5   N10575  2002          EMBRAER EMB-145LR       2    55    NA Turbo-fan
## 6   N105UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 7   N107US  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 8   N108UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 9   N109UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 10  N110UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## # ... with 3,312 more rows

You can also pass a vector of variable names to exclude from the result set:

planes %>% select(-c(type, engines:engine))
## # A tibble: 3,322 × 4
##    tailnum  year     manufacturer     model
##      <chr> <int>            <chr>     <chr>
## 1   N10156  2004          EMBRAER EMB-145XR
## 2   N102UW  1998 AIRBUS INDUSTRIE  A320-214
## 3   N103US  1999 AIRBUS INDUSTRIE  A320-214
## 4   N104UW  1999 AIRBUS INDUSTRIE  A320-214
## 5   N10575  2002          EMBRAER EMB-145LR
## 6   N105UW  1999 AIRBUS INDUSTRIE  A320-214
## 7   N107US  1999 AIRBUS INDUSTRIE  A320-214
## 8   N108UW  1999 AIRBUS INDUSTRIE  A320-214
## 9   N109UW  1999 AIRBUS INDUSTRIE  A320-214
## 10  N110UW  1999 AIRBUS INDUSTRIE  A320-214
## # ... with 3,312 more rows
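The - operator can also be placed in front of a helper function to drop every variable that the helper would match. A minimal sketch, using a made-up one-row data frame, toy, in place of planes:

```r
library(dplyr)

# Hypothetical toy data frame with a subset of the planes columns.
toy <- data.frame(tailnum = "N10156", year = 2004L, engines = 2L,
                  seats = 55L, engine = "Turbo-fan")

# Drop every column whose name starts with "e".
no_e <- toy %>% select(-starts_with("e"))

names(no_e)  # tailnum, year, seats
```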

Combining helper functions

Select all variables from type through speed (inclusive) and exclude manufacturer:

planes %>% select(type:speed, -manufacturer)
## # A tibble: 3,322 × 5
##                       type     model engines seats speed
##                      <chr>     <chr>   <int> <int> <int>
## 1  Fixed wing multi engine EMB-145XR       2    55    NA
## 2  Fixed wing multi engine  A320-214       2   182    NA
## 3  Fixed wing multi engine  A320-214       2   182    NA
## 4  Fixed wing multi engine  A320-214       2   182    NA
## 5  Fixed wing multi engine EMB-145LR       2    55    NA
## 6  Fixed wing multi engine  A320-214       2   182    NA
## 7  Fixed wing multi engine  A320-214       2   182    NA
## 8  Fixed wing multi engine  A320-214       2   182    NA
## 9  Fixed wing multi engine  A320-214       2   182    NA
## 10 Fixed wing multi engine  A320-214       2   182    NA
## # ... with 3,312 more rows

Modify the previous statement to exclude both manufacturer and model:

planes %>% select(type:speed, -c(manufacturer, model))
## # A tibble: 3,322 × 4
##                       type engines seats speed
##                      <chr>   <int> <int> <int>
## 1  Fixed wing multi engine       2    55    NA
## 2  Fixed wing multi engine       2   182    NA
## 3  Fixed wing multi engine       2   182    NA
## 4  Fixed wing multi engine       2   182    NA
## 5  Fixed wing multi engine       2    55    NA
## 6  Fixed wing multi engine       2   182    NA
## 7  Fixed wing multi engine       2   182    NA
## 8  Fixed wing multi engine       2   182    NA
## 9  Fixed wing multi engine       2   182    NA
## 10 Fixed wing multi engine       2   182    NA
## # ... with 3,312 more rows

You can use the same helper function more than once:

planes %>% select(starts_with("m"), starts_with("s"))
## # A tibble: 3,322 × 4
##        manufacturer     model seats speed
##               <chr>     <chr> <int> <int>
## 1           EMBRAER EMB-145XR    55    NA
## 2  AIRBUS INDUSTRIE  A320-214   182    NA
## 3  AIRBUS INDUSTRIE  A320-214   182    NA
## 4  AIRBUS INDUSTRIE  A320-214   182    NA
## 5           EMBRAER EMB-145LR    55    NA
## 6  AIRBUS INDUSTRIE  A320-214   182    NA
## 7  AIRBUS INDUSTRIE  A320-214   182    NA
## 8  AIRBUS INDUSTRIE  A320-214   182    NA
## 9  AIRBUS INDUSTRIE  A320-214   182    NA
## 10 AIRBUS INDUSTRIE  A320-214   182    NA
## # ... with 3,312 more rows

You can use several different helper functions together:

planes %>% select(starts_with("m"), ends_with("l"))
## # A tibble: 3,322 × 2
##        manufacturer     model
##               <chr>     <chr>
## 1           EMBRAER EMB-145XR
## 2  AIRBUS INDUSTRIE  A320-214
## 3  AIRBUS INDUSTRIE  A320-214
## 4  AIRBUS INDUSTRIE  A320-214
## 5           EMBRAER EMB-145LR
## 6  AIRBUS INDUSTRIE  A320-214
## 7  AIRBUS INDUSTRIE  A320-214
## 8  AIRBUS INDUSTRIE  A320-214
## 9  AIRBUS INDUSTRIE  A320-214
## 10 AIRBUS INDUSTRIE  A320-214
## # ... with 3,312 more rows