algorithm - Random exhaustive (non-repeating) selection from a large pool of entries -

April 15, 2011

suppose have large (300-500k) collection of text documents stored in relational database. each document can belong 1 or more (up six) categories. need users able randomly select documents in specific category single entity never repeated, how stumbleupon works.

i don't see way implement using slow not in queries large amount of users , documents, figured might need implement custom data structure purpose. perhaps there paper describing algorithm might adapted needs?

currently i'm considering following approach:

read entries database create linked list based index each category ids of documents belonging category. shuffle create bloom filter containing of entries viewed particular user traverse index using iterator, randomly select items using bloom filter pick not viewed items.

i recommend hash table implementation. ensure constant time ups. can implement technique known linear probing. linked list awful implementation, because eat o(n) on search time.

if constraint has relational database, can use things such memcache (fb uses this), keep friend of friend problem.

Search This Blog

DIs

algorithm - Random exhaustive (non-repeating) selection from a large pool of entries -

Comments

Post a Comment

Popular posts from this blog

css - Text drops down with smaller window -

php - cannot display multiple markers in google maps v3 from traceroute result -

php - Boolean search on database with 5 million rows, very slow -