c - Map the 32bit integers into 32 bins, with 1,2,4..2^31 consecutive integers per bin -
for 32 bit integer, divide 32 bins of consecutive integers such there twice many integers in each successive bin. first bin contains 0, second 0..1, etc 0..2^31-1.
the fastest algorithm come with, given 32 bit integer i, 5 cycles on i7 (bit scan 3 cycles):
// bin number of leading zeroes, , clear msb item bin_index = bsr(i) item = ^ (1 << bin_index)
or equivalently (well stores items 0..2^(32-1) in bin 0 , 0 in bin 31, doesn't matter):
// bin number of trailing zeroes, , shift down many bits + 1 bin_index = bsf(i) item = >> (bin_index + 1)
in each case bin index encoded number of leading/trailing 0 bits, 1 separate them item number. same leading or trailing ones , 0 separate them. neither works i=0, that's not important.
the mapping between integers , bins/items can arbitrary, long twice many consecutive integers end in each successive bin , total number of integers in bins sums 2^32-1. can think of more efficient algorithm bin 32 integers on i7? keep in mind i7 superscalar operations don't depend on each other can execute in parallel, throughput each instruction type.
you can improve algorithm trying sort data first before counting zeros.
for example , compare 2^31 first , if greater put in bin, otherwise go on , count trailing zeros. have half data set put bin in 2 instructions...probably 2 cycles. other half take bit longer net result improvement. can optimize further following line if thought.
i guess dependent on efficiency ofbranch prediction.
Comments
Post a Comment