> I'm trying to use random_zipfian() for benchmarking of skewed data sets, > and I ran head-first into an issue with rather excessive CPU costs. > [...] This happens because generalizedHarmonicNumber() does this: > > for (i = n; i > 1; i--) > ans += pow(i, -s); > > where n happens to be 1000000000 (range passed to random_zipfian), so > the loop takes quite a bit of time.
If you find a better formula for the harmonic number, you are welcome and probably get your name on it:-)
There are pretty good approximations for s > 1.0 using Riemann zeta function and Euler derived a formula for the s = 1 case.
I also noticed that i is int in this function, but n is int64. That seems like an oversight.