R - converting a row of data.frame to column names using data.table
I have a large data frame of 5 million rows and 3 columns. I want to transform it into a matrix with user_id as the rows, id as the columns, and the value cnt in the cells. I have done this with melt and cast, or with
xtabs(cnt ~ user_id + id, data = foo)
However, the object created is too large, and I get the following error: 'dim' specifies too large an array
A sample of the data:
    user_id id cnt
1 1.813e+14 21   1
2 1.559e+14 28   1
6 1.592e+14 71   2
I'm trying to use data.table, since it seems to handle large data better than data.frame, but I can't figure out how to use data.table to create the contingency table I want.
Does anyone have an idea how to get this working? I'm thinking of creating an empty matrix of the appropriate dimensions and filling in the appropriate indexes.
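The empty-matrix idea can be sketched in base R roughly like this. It is a minimal sketch, not code from the thread: the small `foo` is made-up stand-in data using the question's column names, and repeated (user_id, id) pairs are assumed to be summed, as xtabs() does. The matrix is still dense, so with the real data it needs just as many cells as the call that raised the error; it only helps when the dimensions fit within R's limits.

```r
# Made-up stand-in for `foo`; the real table has about 5 million rows.
foo <- data.frame(
  user_id = c(1.813e14, 1.559e14, 1.592e14, 1.559e14),
  id      = c(21, 28, 71, 21),
  cnt     = c(1, 1, 2, 3)
)

# Sum cnt over repeated (user_id, id) pairs first, as xtabs() would.
agg <- aggregate(cnt ~ user_id + id, data = foo, FUN = sum)

# Row/column position of each record = its index among the sorted unique values.
users <- sort(unique(agg$user_id))
ids   <- sort(unique(agg$id))
i <- match(agg$user_id, users)
j <- match(agg$id, ids)

# Pre-allocate an all-zero matrix, then fill it in one vectorised
# assignment using a two-column (row, col) index matrix.
m <- matrix(0, nrow = length(users), ncol = length(ids),
            dimnames = list(users, ids))
m[cbind(i, j)] <- agg$cnt
print(m)
```

The same `i`, `j` and `agg$cnt` vectors can be passed to `Matrix::sparseMatrix(i, j, x = agg$cnt)`, which stores only the non-zero cells instead of allocating the full matrix.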
Try this with the built-in data frame CO2:
> xtabs(uptake ~ Treatment + Type, CO2)
            Type
Treatment    Quebec Mississippi
  nonchilled  742.0       545.0
  chilled     666.8       332.1
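For the 5-million-row case in the question, xtabs() also accepts `sparse = TRUE`, which returns a sparse matrix from the Matrix package instead of a dense array, so only the non-zero user_id/id cells should be stored. A minimal sketch, assuming the Matrix package (a recommended package shipped with R) is installed; `foo` is made-up stand-in data with the question's column names.

```r
library(Matrix)  # provides the sparse matrix classes that xtabs(sparse = TRUE) returns

# Made-up stand-in for the question's `foo`.
foo <- data.frame(
  user_id = c(1.813e14, 1.559e14, 1.592e14, 1.559e14),
  id      = c(21, 28, 71, 21),
  cnt     = c(1, 1, 2, 3)
)

# Same formula as in the question, but asking for a sparse result.
tab <- xtabs(cnt ~ user_id + id, data = foo, sparse = TRUE)
print(tab)
```

Converting back with `as.matrix(tab)` recreates the dense array, so that step only makes sense for small examples like this one.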
Or using tapply:
> with(CO2, tapply(uptake, list(Treatment, Type), sum))
           Quebec Mississippi
nonchilled  742.0       545.0
chilled     666.8       332.1
And compare with data.table:
> library(data.table)
>
> dt <- data.table(CO2)
> dt[, as.list(tapply(uptake, Type, sum)), by = Treatment]
    Treatment Quebec Mississippi
1: nonchilled  742.0       545.0
2:    chilled  666.8       332.1
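Cautionary note: if the same levels of Type do not appear in every Treatment group, this is not sufficient, because tapply would then return different columns for different groups. In that case it is necessary to convert Type to a factor in the data table (as it already is in CO2).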
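The difference the note describes can be seen with tapply() alone. A minimal sketch with a made-up data frame (`grp`, `type` and `val` are hypothetical names) in which group "b" never contains type "q":

```r
# Made-up data: group "b" never contains type "q".
df <- data.frame(
  grp  = c("a", "a", "b"),
  type = c("q", "m", "m"),
  val  = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Character grouping: only the types present in group "b" come back,
# so group "b" would produce fewer columns than group "a".
present_only <- with(df[df$grp == "b", ], tapply(val, type, sum))
print(present_only)

# Factor grouping: every level is kept and absent levels are NA,
# so each group produces the same columns in the same order.
df$type <- factor(df$type, levels = c("q", "m"))
all_levels <- with(df[df$grp == "b", ], tapply(val, type, sum))
print(all_levels)
```

With the factor version, `as.list()` of each group's result has the same names, which is what the `by = Treatment` call above relies on.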
Added:
It's possible to get rid of tapply and have a pure data.table approach like this:
> dt[, setNames(as.list(.SD[, list(uptake = sum(uptake)), by = Type][, uptake]),
+     levels(Type)), by = Treatment]
    Treatment Quebec Mississippi
1: nonchilled  742.0       545.0
2:    chilled  666.8       332.1
The cautionary note above applies here too.
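Later data.table releases also provide dcast() for data.tables, which performs this long-to-wide reshape in one call. A sketch, assuming a reasonably recent data.table where dcast() works directly on a data.table; `fill = 0` makes any Treatment/Type combination that is missing come out as 0 rather than changing the set of columns.

```r
library(data.table)

dt <- as.data.table(CO2)

# Rows = Treatment, one column per level of Type, cells = sum of uptake.
wide <- dcast(dt, Treatment ~ Type, value.var = "uptake",
              fun.aggregate = sum, fill = 0)
print(wide)
```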