Jan Urbański <j.urbanski@students.mimuw.edu.pl> writes:
> Tom Lane wrote:
>> I came across this bit in ts_typanalyze.c:
>>
>> /* We want statistic_target * 100 lexemes in the MCELEM array */
>> num_mcelem = stats->attr->attstattarget * 100;
>>
>> I wonder whether the multiplier here should be changed?
> The origin of that bit is this post:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00556.php
> and the following few downthread ones.
> If we bump the default statistics target 10 times, then changing the
> multiplier to 10 seems the right thing to do.
OK, will do.
> Only thing that needs
> caution is the frequency of pruning we do in the Lossy Counting
> algorithm, that IIRC is correlated with the desired target length of the
> MCELEM array.
Right below that we have
    /*
     * We set bucket width equal to the target number of result lexemes.
     * This is probably about right but perhaps might need to be scaled
     * up or down a bit?
     */
    bucket_width = num_mcelem;
so it should track automatically. AFAICS the argument in the above
thread that this is an appropriate pruning distance holds good
regardless of just how we obtain the target mcelem count.
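
To make the "tracks automatically" point concrete, here is a minimal,
self-contained sketch of Lossy Counting in which the pruning distance is
taken directly from the target MCELEM count. It is not the code in
ts_typanalyze.c: the names LossyCounter, lc_init, lc_add, lc_prune and
lc_freq, the fixed-size array, and the linear search are all assumptions
made to keep the illustration short.

```c
#include <assert.h>
#include <string.h>

#define MAX_TRACKED 1024        /* arbitrary cap, for this sketch only */

/* One tracked lexeme: observed count plus maximum possible undercount. */
typedef struct
{
    const char *lexeme;
    int         freq;
    int         delta;
} TrackItem;

/* Hypothetical container; a real implementation would use a hash table. */
typedef struct
{
    TrackItem   items[MAX_TRACKED];
    int         nitems;
    int         bucket_width;   /* pruning distance */
    int         nseen;          /* lexemes processed so far */
} LossyCounter;

/* The bucket width is derived from the target, so it follows the multiplier. */
static void
lc_init(LossyCounter *lc, int stats_target, int multiplier)
{
    int         num_mcelem = stats_target * multiplier;

    assert(num_mcelem > 0);
    lc->nitems = 0;
    lc->nseen = 0;
    lc->bucket_width = num_mcelem;
}

/* Drop every entry whose count cannot exceed the current bucket number. */
static void
lc_prune(LossyCounter *lc, int b_current)
{
    int         i;
    int         kept = 0;

    for (i = 0; i < lc->nitems; i++)
    {
        if (lc->items[i].freq + lc->items[i].delta > b_current)
            lc->items[kept++] = lc->items[i];
    }
    lc->nitems = kept;
}

/* Count one lexeme; prune whenever a bucket boundary is reached. */
static void
lc_add(LossyCounter *lc, const char *lexeme)
{
    int         i;
    int         b_current;

    lc->nseen++;
    /* current bucket number = ceil(nseen / bucket_width) */
    b_current = (lc->nseen + lc->bucket_width - 1) / lc->bucket_width;

    for (i = 0; i < lc->nitems; i++)
    {
        if (strcmp(lc->items[i].lexeme, lexeme) == 0)
        {
            lc->items[i].freq++;
            break;
        }
    }
    if (i == lc->nitems)
    {
        /* new entry: it may have been missed in all earlier buckets */
        assert(lc->nitems < MAX_TRACKED);
        lc->items[lc->nitems].lexeme = lexeme;
        lc->items[lc->nitems].freq = 1;
        lc->items[lc->nitems].delta = b_current - 1;
        lc->nitems++;
    }

    if (lc->nseen % lc->bucket_width == 0)
        lc_prune(lc, b_current);
}

/* Tracked count for a lexeme, or 0 if it is absent or was pruned. */
static int
lc_freq(const LossyCounter *lc, const char *lexeme)
{
    int         i;

    for (i = 0; i < lc->nitems; i++)
    {
        if (strcmp(lc->items[i].lexeme, lexeme) == 0)
            return lc->items[i].freq;
    }
    return 0;
}
```

In this sketch, lc_init(&lc, 10, 100) and lc_init(&lc, 100, 10) both give
a bucket width of 1000, so pruning happens at the same distance under
either combination of target and multiplier.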
> BTW: I've been occupied with other things and might have missed some
> discussions, but at some point it has been considered to use Lossy
> Counting to gather statistics from regular columns, not only tsvectors.
> Wouldn't this help the performance hit ANALYZE takes from upping
> default_stats_target?
Perhaps, but it's not likely to get done for 8.4 ...
regards, tom lane