performance - Could I use a faster data structure than a tree for this?


I have a binary decision tree. It takes an array of floats as input, and each branch node splits on an input index and value, eventually taking me to a leaf.
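
In rough terms, the tree looks something like this (a simplified sketch rather than my actual code; the names are made up):

    // Simplified sketch of the tree described above; names are hypothetical.
    public sealed class Node
    {
        public int InputIndex;    // which element of the input array to test
        public float SplitValue;  // go left if input <= SplitValue, else right
        public Node Left;
        public Node Right;
        public bool IsLeaf;
        public int Decision;      // only meaningful when IsLeaf is true
    }

    public static class TreeLookup
    {
        public static int Lookup(Node root, float[] inputs)
        {
            Node current = root;
            while (!current.IsLeaf)
            {
                current = inputs[current.InputIndex] <= current.SplitValue
                    ? current.Left
                    : current.Right;
            }
            return current.Decision;
        }
    }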

I'm performing a massive number of lookups on this tree (about 17% of execution time according to performance analysis (edit: having optimised other areas, it's now at 40%)), and I'm wondering if I could/should be using a different data structure to improve lookup speed.

Some kind of hash table can't be used, as the inputs do not map directly to a leaf node, but I was wondering if anyone had suggestions for methods and data structures I could use in place of the tree (or alongside it?) to improve lookup speeds.

Memory is a concern, but less of a concern than speed.

The code is written in C#, in case that affects the method applied.

Edit: There's a bit too much code to post, so I'll give some more detail about the tree instead.

The tree is generated using information gain calculations, so it's not always a 50/50 split, and the split value can be any float value. A single input can be split multiple times, increasing the resolution on that input.

I posted a question about the performance of the iterator here:

Micro optimisations iterating through a tree in C#

But I think I might need to look at the data structure itself to improve performance further.

I'm aiming for as much performance as possible here. I'm working on a new method of machine learning, and the tree grows using a feedback loop. For the process I'm working on, I estimate it'll be running for several months, so a few % saved here and there would be massive. The ultimate goal is speed without using too much memory.

If I understand correctly, you have floating point ranges that have to be mapped to a decision. Something like this:

          x <= 0.0  : decision a
    0.0 < x <= 0.5  : decision b
    0.5 < x <= 0.6  : decision c
    0.6 < x         : decision d

A binary tree is a pretty good way to handle that. As long as the tree is balanced and the input values are evenly distributed across the ranges, you can expect O(log2 n) comparisons, where n is the number of possible decisions.

If the tree is not balanced, you could be doing far more comparisons than necessary, O(n) in the worst case. So I would look at the trees and see how deep they are. If the same tree is used again and again, the cost of rebalancing it once may be amortized over many lookups.
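
A quick way to check that, sketched against the hypothetical Node type from the question above:

    // Rough balance check: compare the depth this returns against the ideal
    // depth, roughly log2(number of leaves).
    public static int Depth(Node node)
    {
        if (node == null || node.IsLeaf)
            return 0;
        int left = Depth(node.Left);
        int right = Depth(node.Right);
        return 1 + (left > right ? left : right);
    }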

If the input values are not evenly distributed (and you know that ahead of time), you might want to special-case the order of the comparisons so that the most common cases are detected early. You can do this by manipulating the tree or by adding special cases in the code before checking the tree.
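
For example (the numbers, range, and input index here are purely hypothetical), if profiling showed one leaf absorbing most of the traffic, you could test its range before walking the tree:

    public static class FastPathLookup
    {
        // Hypothetical: the decision returned by the most frequently hit leaf.
        private const int CommonDecision = 1;

        public static int Lookup(Node root, float[] inputs)
        {
            float x = inputs[0];            // assume the hot split is on input 0
            if (x > 0.0f && x <= 0.5f)      // the range that maps to the common leaf
                return CommonDecision;
            return TreeLookup.Lookup(root, inputs);
        }
    }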

If you've exhausted the algorithmic improvements and still need to optimize, you might look for a data structure with better locality than a general binary tree. For example, you could put the partition boundaries into a contiguous array and perform a binary search on it. (And, if the array isn't too long, you might even try a linear search over the array, as it may be friendlier to the cache and branch prediction.)
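
Here's a sketch of that idea for a single input dimension (it doesn't directly handle a tree that splits on several different inputs):

    using System;

    // Sorted partition boundaries with a parallel array of decisions.
    // For the ranges above: boundaries = { 0.0f, 0.5f, 0.6f } and
    // decisions = { a, b, c, d } (one more decision than boundaries).
    public sealed class FlatPartition
    {
        private readonly float[] _boundaries; // sorted ascending
        private readonly int[] _decisions;    // boundaries.Length + 1 entries

        public FlatPartition(float[] boundaries, int[] decisions)
        {
            _boundaries = boundaries;
            _decisions = decisions;
        }

        public int Lookup(float x)
        {
            // An exact match means x <= boundary, so it stays in that bucket;
            // otherwise BinarySearch returns the bitwise complement of the
            // index of the first larger boundary.
            int i = Array.BinarySearch(_boundaries, x);
            if (i < 0) i = ~i;
            return _decisions[i];
        }
    }

The same layout makes the linear-search variant trivial: scan _boundaries until you hit the first entry greater than or equal to x.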

Lastly, I'd consider building a coarse index that gives you a head start into the tree (or array). For example, use a few of the most significant bits of the input value as an index and see if you can cut off the first few layers of the tree. This may help more than you might imagine, as the skipped comparisons have a low chance of getting correct branch predictions.
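
A sketch of that last idea, assuming (hypothetically) that the top of the tree splits on one input whose values fall in [0, 1), and using a simple value-to-bucket scaling rather than raw bit extraction. Each bucket caches the deepest node that every value in its sub-range would reach, so a lookup can start there instead of at the root:

    public sealed class CoarseIndex
    {
        private const int Buckets = 256;
        private readonly Node[] _start = new Node[Buckets];
        private readonly int _inputIndex;

        public CoarseIndex(Node root, int inputIndex)
        {
            _inputIndex = inputIndex;
            for (int b = 0; b < Buckets; b++)
            {
                float lo = (float)b / Buckets;
                float hi = (float)(b + 1) / Buckets;
                Node node = root;
                // Descend only while the whole bucket range [lo, hi) falls on
                // one side of the split, so the cached start node is always safe.
                while (!node.IsLeaf && node.InputIndex == inputIndex)
                {
                    if (hi <= node.SplitValue) node = node.Left;
                    else if (lo > node.SplitValue) node = node.Right;
                    else break;
                }
                _start[b] = node;
            }
        }

        public int Lookup(float[] inputs)
        {
            float x = inputs[_inputIndex];
            int b = (int)(x * Buckets);
            if (b < 0) b = 0;
            if (b >= Buckets) b = Buckets - 1;
            // Continue the normal walk from the cached node instead of the root.
            return TreeLookup.Lookup(_start[b], inputs);
        }
    }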

