distribution - Entropy estimation using a histogram of normal data vs. the direct formula (MATLAB)
Let's assume I have drawn n = 10000 samples from a standard normal distribution.
Now I want to calculate the entropy, using a histogram to estimate the probabilities.
1) Calculate the probabilities (for example, using MATLAB):
[p,x] = hist(samples,binnumbers); area = (x(2)-x(1))*sum(p); p = p/area;
(binnumbers is determined by a binning rule)
2) Estimate the entropy:
h = -sum(p.*log2(p))
which gives 58.6488.
Now, when I use the direct formula to calculate the entropy of normal data, I get
h = 0.5*log2(2*pi*exp(1)) = 2.0471
What am I doing wrong when using the histogram plus the entropy formula? Thanks for the help!
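You are missing the dp term (the bin width) in the sum. Once p is normalized to a density, each bin carries probability p*dp, so that is the weight that belongs in front of the log:
dp = (x(2)-x(1)); area = sum(p)*dp; p = p/area; h = -sum( (p*dp) .* log2(p) );
This should bring you close enough to the direct value.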
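To see the correction end to end, here is a minimal sketch that puts the question's setup and the dp-weighted sum together. The bin count of 50 is an assumption (the original binning rule is not stated), and the names counts, keep, h_hist and h_direct are introduced here for illustration only.

```matlab
% Sketch: histogram-based entropy estimate for standard normal samples.
n = 10000;
samples = randn(n, 1);
binnumbers = 50;                          % assumption: original binning rule is unknown

[counts, x] = hist(samples, binnumbers);  % counts per bin and bin centers
dp = x(2) - x(1);                         % bin width (bins are equally spaced)
p = counts / (sum(counts) * dp);          % density estimate, sum(p)*dp equals 1

keep = p > 0;                             % skip empty bins, where 0*log2(0) gives NaN
h_hist = -sum((p(keep) * dp) .* log2(p(keep)));

h_direct = 0.5 * log2(2 * pi * exp(1));   % closed form for unit variance, in bits

fprintf('histogram: %.4f bits, direct: %.4f bits\n', h_hist, h_direct);
```

With this many samples the two numbers should land close to each other, though the histogram value varies a little from run to run and with the bin count. Dropping the dp weight, as in the question, sums density values rather than probabilities, which is why that result grows with the number of bins instead of approaching 2.0471.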
PS: be careful when you take log2(p), because you might have empty bins. You might find nansum useful.
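The empty-bin problem can be reproduced on a tiny example. The sketch below uses made-up density values and a made-up bin width (both hypothetical, chosen so that the result is easy to check by hand), and shows a logical mask as a toolbox-free alternative to nansum.

```matlab
% Sketch: an empty bin turns the whole sum into NaN.
p  = [0.5 0 0.5];    % hypothetical density values; the middle bin is empty
dp = 1;              % hypothetical bin width, so sum(p)*dp equals 1

terms = (p * dp) .* log2(p);   % middle term is 0 * (-Inf), which is NaN
h_naive = -sum(terms);         % NaN, because one term is NaN

% Option 1: drop empty bins before taking the log.
keep = p > 0;
h_mask = -sum((p(keep) * dp) .* log2(p(keep)));   % 1 bit for these values

% Option 2 (needs the Statistics Toolbox): ignore the NaN terms.
% h_nan = -nansum(terms);
```

Both options treat an empty bin as contributing zero, which matches the usual convention that 0*log(0) is taken to be 0.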