performance - Could I use a faster data structure than a tree for this? -
i have binary decision tree. takes inputs array of floats, , each branch node splits on input index , value taking me leaf.
i'm performing massive number of lookups on tree (about 17% of execution time according performance analysis (edit: having optimised other areas it's @ 40%)), , wondering if could/should using different data structure improve lookup speed.
some kind of hash table can't used, inputs not map directly leaf node, wondering had suggesting methods , data-structures use in place of tree (or as?) improve lookup speeds.
memory concern, less of concern speed.
code written in c#, method applied.
edit: there's bit code post, i'll give more detail tree.
the tree generated using information gain calculations, it's not 50/50 split, split value float value. single input split multiple times increasing resolution on input.
i posted question performance of iterator here:
micro optimisations iterating through tree in c#
but think might need @ data structure improve performance further.
i'm aiming performance possible here. i'm working on new method of machine learning, , tree grows using feedback loop. process i'm working on, estimate it'll running several months, few % saving here , there massive. ultimate goal speed without using memory.
if understand correctly, have floating point ranges have mapped decision. this:
x <= 0.0 : decision 0.0 < x <= 0.5 : decision b 0.5 < x <= 0.6 : decision c 0.6 < x : decision d
a binary tree pretty way handle that. long tree balanced , input values evenly distributed across ranges, can expect o(log2 n) comparisons, n number of possible decisions.
if tree not balanced, doing far more comparisons necessary. in worst case: o(n). @ trees , see how deep are. if same tree used again , again, cost spent rebalancing once may amortized on many lookups.
if input values not evenly distributed (and know ahead of time), might want special-case order of comparisons common cases detected early. can manipulating tree or adding special cases in code before checking tree.
if you've exhausted algorithmic improvements , still need optimize, might data structure better locality general binary tree. example, put partition boundaries contiguous array , perform binary search on it. (and, if array isn't long, might try linear search on array may friendlier cache , branch prediction.)
lastly, i'd consider building coarse index gives headstart tree (or array). example, use few of significant bits of input value index , see if can cut off first few layers of tree. may more might imagine, skipped comparisons have low chance of getting correct branch predictions.
Comments
Post a Comment