R basket analysis using the arules package with unique order numbers but duplicate order combinations
I'm just learning R. I'm trying to do a basket analysis using the arules package (but I'm totally open to other package suggestions!) to compare all possible combinations of 6 different item types being purchased.
My original data set looked like this:
orderno, itemtype, itemcount
111, health, 1
111, leisure, 2
111, sports, 1
222, health, 3
333, food, 7
333, clothing, 1
444, clothing, 2
444, health, 1
444, accessories, 2
. . .
The list goes on and has 3,000 observations.
I collapsed the data into a matrix that contains 1 row for each unique order, holding the counts of each specific itemtype:
orderno, accessories, clothing, food, health, leisure, sports
111, 0, 0, 0, 1, 2, 1
222, 0, 0, 0, 3, 0, 0
333, 0, 1, 7, 0, 0, 0
444, 2, 2, 0, 1, 0, 0
. . .
Every time I try to read in the transactions using the following command (and a million attempted variations of it):
tr <- read.transactions("dataset.csv", rm.duplicates=FALSE, format="basket", sep=",")
I get the error message: Error in asMethod(object) : can not coerce list with transactions with duplicated items
I'm assuming that's because I have 3,000 observations and inevitably some combinations are going to show up more than once (i.e., more than 1 person purchasing 1 piece of clothing and nothing else: orderno, 0, 1, 0, 0, 0, 0). I know I can collapse the data set on counts of unique combinations, but I'm worried that if I do that, there will be no weights to show the most frequent combinations.
I thought that using format="basket" would account for different orders containing the same item combinations, but apparently that's not the case. I'm lost. The documentation I've read implies that it is possible, but I can't find any examples or advice on how to approach this problem.
Any advice would be appreciated! My head is spinning on this one.
Extra info: As an end result, I'm looking for the top 5 most significant purchase combinations. I don't know if that helps.
You must remove the duplicates. If you are using a .csv file, run Data -> Remove Duplicates in Excel before processing the file. arules throws an error if a duplicate is found, and that is why you are getting this error.
Another way is to use duplicated() on the itemset and remove the duplicates using unique().
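For what it's worth, the error message seems to be about an item repeated inside a single transaction, not about two orders sharing the same combination. With format="basket", a row of the collapsed matrix such as 333, 0, 1, 7, 0, 0, 0 is read as a list of items in which "0" appears several times. A minimal sketch of the unique()/duplicated() idea, starting from the original long layout: the data frame below is hypothetical, modeled on the sample rows in the question, with one invented order (555) added so that a repeated combination exists.

```r
library(arules)

# Hypothetical long-format data modeled on the question's sample rows.
# Order 555 is invented here so that one item combination repeats.
orders <- data.frame(
  orderno  = c(111, 111, 111, 222, 333, 333, 444, 444, 444, 555),
  itemtype = c("health", "leisure", "sports", "health", "food",
               "clothing", "clothing", "health", "accessories", "health"),
  stringsAsFactors = FALSE
)

# One character vector of item types per order.
# unique() drops any item type listed more than once within the same order.
baskets <- lapply(split(orders$itemtype, orders$orderno), unique)

# Coerce the named list to a transactions object (one transaction per order).
tr <- as(baskets, "transactions")

# duplicated() flags orders whose item combination already occurred earlier.
# Such orders are kept: they are what gives a combination its support.
dup <- duplicated(tr)

inspect(tr)
```

Note that arules transactions only record whether an item type is present in an order, so the itemcount column is not used here. Orders with identical combinations do not need to be removed for the coercion to succeed.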
Or a simpler approach can be found in this post:
Association analysis with duplicate transactions using arules package in R
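Separately from whatever the linked post recommends, one route that may be worth trying is to skip the collapsed matrix and read the original long file directly with format="single", which expects one order/item pair per line. The sketch below is only an illustration: it writes a hypothetical temporary file from the sample rows in the question, assumes the order number and item type are columns 1 and 2, and uses support as a stand-in for "most significant" (lift or another measure may suit better).

```r
library(arules)

# Hypothetical file in the question's original long layout
# (one line per order/item pair), written to a temp path for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "orderno,itemtype,itemcount",
  "111,health,1",
  "111,leisure,2",
  "111,sports,1",
  "222,health,3",
  "333,food,7",
  "333,clothing,1",
  "444,clothing,2",
  "444,health,1",
  "444,accessories,2"
), path)

# format = "single": column 1 is the transaction id, column 2 the item.
# skip = 1 skips the header line; rm.duplicates = TRUE drops an item
# that is listed more than once within the same order.
tr <- read.transactions(path, format = "single", sep = ",",
                        cols = c(1, 2), skip = 1,
                        rm.duplicates = TRUE)

# Mine frequent itemsets with at least 2 items. The support threshold is
# an arbitrary choice for this tiny sample and would need tuning on real data.
itemsets <- apriori(tr,
                    parameter = list(target = "frequent itemsets",
                                     support = 0.25, minlen = 2),
                    control = list(verbose = FALSE))

# Keep the 5 itemsets with the highest support.
top5 <- head(sort(itemsets, by = "support"), 5)
inspect(top5)
```

Because every order stays in the transactions object, combinations bought by many customers keep their full weight in the support values.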