R - converting a row of a data.frame to column names using data.table
I have a large data frame of 5 million rows and 3 columns. I want to transform it into a matrix that has user_id as rows, id as columns, and the value cnt in the cells. I have done this with melt and cast, or with

    xtabs(cnt ~ user_id + id, data = foo)

However, the object created is too large, and I get the following error:

    'dim' specifies too large an array

Sample data:

        user_id id cnt
    1 1.813e+14 21   1
    2 1.559e+14 28   1
    6 1.592e+14 71   2

I'm trying to use data.table, since it seems to handle large data better than data.frame, but I can't figure out how to use data.table to create the contingency table I want.

Does anyone have an idea how to get this working? I'm thinking of creating an empty matrix of the appropriate dimensions and filling it at the appropriate indexes.
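The "fill it at the appropriate indexes" idea can be sketched without ever allocating the full dense matrix. This is a minimal sketch that swaps in a sparse matrix from the Matrix package (a recommended package shipped with R), assuming that most user_id/id combinations are empty. The three-row `foo` below is a hypothetical stand-in built from the sample rows above; `user_id` and `id` are mapped to row and column indexes via factors.

```r
library(Matrix)

# Hypothetical toy version of the question's data, copied from the sample rows
foo <- data.frame(
  user_id = c(1.813e+14, 1.559e+14, 1.592e+14),
  id      = c(21, 28, 71),
  cnt     = c(1, 1, 2)
)

# Map every distinct user_id to a row index and every distinct id to a column index
u <- factor(foo$user_id)
v <- factor(foo$id)

# Only the non-zero cells are stored; repeated (row, column) pairs are summed
m <- sparseMatrix(
  i        = as.integer(u),
  j        = as.integer(v),
  x        = foo$cnt,
  dims     = c(nlevels(u), nlevels(v)),
  dimnames = list(levels(u), levels(v))
)
m
```

Memory then grows with the number of non-zero cells rather than with rows times columns. If Matrix is installed, `xtabs(cnt ~ user_id + id, data = foo, sparse = TRUE)` should also return a sparse result for this two-factor case.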
Try this using the built-in data frame CO2:

    > xtabs(uptake ~ Treatment + Type, CO2)
                Type
    Treatment    Quebec Mississippi
      nonchilled  742.0       545.0
      chilled     666.8       332.1

or using tapply:

    > with(CO2, tapply(uptake, list(Treatment, Type), sum))
               Quebec Mississippi
    nonchilled  742.0       545.0
    chilled     666.8       332.1

and compare with data.table:

    > library(data.table)
    >
    > dt <- data.table(CO2)
    > dt[, as.list(tapply(uptake, Type, sum)), by = Treatment]
        Treatment Quebec Mississippi
    1: nonchilled  742.0       545.0
    2:    chilled  666.8       332.1

Cautionary note: if the same levels of Type do not appear in every Treatment group, this is not sufficient. In that case it is necessary to convert Type to a factor in the data table (as it already is in CO2).
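The cautionary note can be checked on a tiny example. This sketch uses hypothetical data (`DT2`, `g`, `type`, `x` are made-up names) in which group "b" has no rows of type "B". Because `type` is a factor, `tapply()` returns one cell per level in every group, so each group yields the same columns, with `NA` marking the empty cell. If `type` were a plain character vector, the two groups would return lists of different lengths and the columns would not line up.

```r
library(data.table)

# Hypothetical toy data: group "b" never has type "B"
DT2 <- data.table(
  g    = c("a", "a", "b"),
  type = factor(c("A", "B", "A")),
  x    = c(1, 2, 3)
)

# tapply() over a factor covers all levels, so every group returns columns A and B
res <- DT2[, as.list(tapply(x, type, sum)), by = g]
res
```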
Added:

It's possible to get rid of tapply and have a pure data.table approach like this:

    > dt[, setnames(as.list(.SD[, list(uptake = sum(uptake)), by = Type][, uptake]),
    +   levels(Type)), by = Treatment]
        Treatment Quebec Mississippi
    1: nonchilled  742.0       545.0
    2:    chilled  666.8       332.1

The cautionary note above applies here too.
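Newer versions of data.table also provide `dcast()`, which reshapes long data to wide data directly. This is a swapped-in alternative rather than the method used above, sketched on the same CO2 data with `fun.aggregate = sum` so that repeated Treatment/Type combinations are added up. For the question's data the equivalent would be a `user_id ~ id` formula, but the result would still be a dense wide table with one column per id, so it may hit the same memory limits.

```r
library(data.table)

# Long-to-wide reshape: one row per Treatment, one column per Type level,
# with uptake summed within each cell
res <- dcast(
  as.data.table(CO2),
  Treatment ~ Type,
  value.var     = "uptake",
  fun.aggregate = sum
)
res
```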