Removing closely correlated features

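Closely correlated features may add variance to your model, and removing one feature from each correlated pair might help reduce it. There are many ways to detect correlation. Here is one: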

library(purrr) # provides keep() and re-exports the %>% pipe

# select the columns that can be correlated (numeric only)
toCorrelate <- mtcars %>% keep(is.numeric)

# calculate the correlation matrix
correlationMatrix <- cor(toCorrelate)

# zero the upper triangle so each correlated pair is counted only once
correlationMatrix[upper.tri(correlationMatrix)] <- 0

# zero the diagonal so a feature's perfect correlation with itself is ignored
diag(correlationMatrix) <- 0

# flag features correlated with another feature at the +/- 0.85 level
apply(correlationMatrix, 2, function(x) any(abs(x) >= 0.85))

  mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb 
 TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 

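I would want to look at what mpg is so strongly correlated with, and decide what to keep and what to toss. The same goes for cyl and disp. Alternatively, I might need to combine some of the strongly correlated features.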
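
One possible way to see which features mpg, cyl and disp are paired with is to list every pair at or above the threshold. This is a minimal sketch that assumes the same 0.85 cutoff. The names cm, threshold, idx and strongPairs are illustrative and do not appear in the original code.

```r
# Correlation matrix of mtcars (every column in mtcars is numeric)
cm <- cor(mtcars)

# Blank out the upper triangle and the diagonal so each pair appears once
# and self-correlations are excluded
cm[upper.tri(cm, diag = TRUE)] <- NA

# Assumed cutoff, matching the +/- 0.85 level used above
threshold <- 0.85

# Row/column positions of the strongly correlated pairs (NA entries are skipped)
idx <- which(abs(cm) >= threshold, arr.ind = TRUE)

# One row per pair: the flagged column, its partner, and the correlation
strongPairs <- data.frame(
  feature1 = colnames(cm)[idx[, "col"]],
  feature2 = rownames(cm)[idx[, "row"]],
  correlation = cm[idx],
  stringsAsFactors = FALSE
)

strongPairs
```

Each row names a pair from which one feature could be dropped, or which could be combined into a single feature. The sign of the correlation is kept so that negative relationships remain visible.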