Assigning Columns as Factors in Data.Table
I love working with data.table, but a few operations always trip me up. (Well, more than a few.) The best way to avoid these pitfalls is to write functions that handle the tasks for you. You can set up an R script that you source whenever you need the functions. Selecting specific columns in a data.table and converting them to factor or numeric is a good example. I need that operation often enough, yet I stumble over the syntax nearly every time, and then my googling expertise comes in. Eventually, you get tired of spending time looking up the same thing over and over.
Solution
- Create a character vector of the columns (cols) you need to change, e.g., cols = c("a", "b", "e")
- Use dt[, (cols) := lapply(.SD, as.factor), .SDcols = cols] to apply the conversion
- Write an R script with functions that perform the task
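The first two steps can be sketched in a few lines. This is a minimal example that assumes data.table is installed; the table `dt_demo` and its values are made up purely for illustration.

```r
library(data.table)

# Hypothetical example table; the name dt_demo and its values are illustrative only
dt_demo <- data.table(a = 1:3, b = 4:6, c = 7:9)

# Step 1: the columns to convert
cols <- c("a", "b")

# Step 2: wrapping cols in parentheses on the left of := tells data.table
# to assign to the columns named in cols (not to a column called "cols");
# .SDcols restricts .SD to those same columns
dt_demo[, (cols) := lapply(.SD, as.factor), .SDcols = cols]

sapply(dt_demo, class)
```

Columns a and b should now report factor while c stays integer. Because := modifies the table by reference, there is no need to assign the result back to dt_demo.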
Let’s Try It Out!
First, let’s create the function and data.table to work with.
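library(data.table)
library(pander)

did_recode_columns <- function(dt, cols, type = c("as.numeric", "as.factor", "as.character", "as.integer", "as.double")) {
  # Convert the data.table columns named in cols using the chosen
  # conversion function. := modifies dt by reference.
  type <- match.arg(type)  # validate the choice; defaults to "as.numeric"
  dt[, (cols) := lapply(.SD, match.fun(type)), .SDcols = cols]
  invisible(dt)
}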
dt <- data.table(a = sample(5), b = sample(5), c = sample(5), d = sample(5), e = sample(5))
pander(sapply(dt, class))
a | b | c | d | e |
---|---|---|---|---|
integer | integer | integer | integer | integer |
Now let’s set the columns and use the function to change them.
cols = c("a", "b", "e")
did_recode_columns(dt, cols, type = "as.factor")
pander(sapply(dt, class))
a | b | c | d | e |
---|---|---|---|---|
factor | factor | integer | integer | factor |
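One caveat when going the other way, from factor back to numeric: calling as.numeric() directly on a factor returns the internal level codes, not the original values. With sample(5) the codes happen to match the values, so the problem stays hidden. The sketch below uses a made-up table, `dt_back`, where they differ, and converts through character first.

```r
library(data.table)

# Hypothetical table whose factor labels differ from their internal codes
dt_back <- data.table(a = factor(c(10, 30, 20)), b = factor(c(5, 5, 7)))
cols <- c("a", "b")

# as.numeric() alone would return the level codes (1, 3, 2 for column a),
# so go through as.character() to recover the labels as numbers
dt_back[, (cols) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = cols]

sapply(dt_back, class)
```

Both columns should come back as numeric with their original values intact.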
Until next time…