algorithm - Random exhaustive (non-repeating) selection from a large pool of entries -
suppose have large (300-500k) collection of text documents stored in relational database. each document can belong 1 or more (up six) categories. need users able randomly select documents in specific category single entity never repeated, how stumbleupon works.
i don't see way implement using slow not in queries large amount of users , documents, figured might need implement custom data structure purpose. perhaps there paper describing algorithm might adapted needs?
currently i'm considering following approach:
read entries database create linked list based index each category ids of documents belonging category. shuffle create bloom filter containing of entries viewed particular user traverse index using iterator, randomly select items using bloom filter pick not viewed items.
i recommend hash table implementation. ensure constant time ups. can implement technique known linear probing. linked list awful implementation, because eat o(n)
on search time.
if constraint has relational database, can use things such memcache (fb uses this), keep friend of friend problem.
Comments
Post a Comment