Re: Force a merge join?

Поиск
Список
Период
Сортировка
От Doug Fields
Тема Re: Force a merge join?
Дата
Msg-id 5.1.0.14.2.20020518193752.01f3dc00@mail.pexicom.com
обсуждение исходный текст
Ответ на Re: Force a merge join?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Force a merge join?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
> >     308
>
>As opposed to the pg_stats estimate of 189 ... not too bad, really.
>Is the most-common-values distribution shown in the pg_stats output
>reasonably correct?  (Specifically, 38 has nearly 40% of the entries,
>192 another 12%, and everything else 1.5% or less)

The output is actually pretty in terms of which should be in the top 10,
but not which exactly:

pexitest=# select list_id, count(*) from test_list_entries group by list_id
order by count(*) DESC;
  list_id | count
---------+--------
      192 | 173290
      330 |  16174
      501 |  14054
      369 |  12659
      229 |  12654
      332 |  11429
      342 |  10982
      454 |  10404
      493 |   9835
      424 |   9778
      460 |   9707
       38 |   9454
      331 |   9355
      323 |   9232
      319 |   9164

So that correlates pretty well with it's guesses as to the top 10:
{38,192,369,330,332,501,229,493,319,424}

But not so well as to their relative distributions:
{0.389667,0.123667,0.0156667,0.013,0.00933333,0.00933333,0.009,0.00866667,0.00833333,0.00833333}

Some other stats:

select avg(count) from (select list_id, count(*) from test_list_entries
group by list_id order by count(*) DESC) as a;
        avg
-----------------
  2599.2727272727
(1 row)

pexitest=# select stddev(count) from (select list_id, count(*) from
test_list_entries group by list_id order by count(*) DESC) as a;
       stddev
------------------
  10160.4314402693
(1 row)

>Another question is exactly what version you are running.  I tried
>plugging the stats values you gave into pg_statistic by hand, and
>got this plan from current sources:

I'm running 7.2.1 as packaged and distributed in the Debian/woody 7.2.1-2
distribution.

>ie, it's estimating about 1800 matches per list_id value, which seems
>pretty reasonable given that it knows none of these values are in the
>most_common list.  Now I don't see anything in the CVS logs to suggest
>that the estimation of this query would've changed since late in 7.2
>beta cycle, so I'm confused why you don't get similar results.

It would be nice if I could get it to use a query such as the ones I gave
above to put exact values into the analysis. Or, if I could tell it to do a
more detailed sampling during ANALYZE. I could also tell it to keep more
than the top 10 in the statistics table (SET STATISTICS), but I'm not sure
what it would buy me, other than forcing a larger sample (but not knowing
how much larger).

How much would I slow the ANALYZE statement, and more importantly, the
query optimizer, if I told it to keep statistics on the top 200 instead of
the default 10 values?

In the mean time, I've surrounded my code with SET ENABLE_NESTLOOP=OFF and
ON blocks, which should force merge joins.

I appreciate your insight on this matter and others Tom. Thanks!

Cheers,

Doug


В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Force a merge join?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Force a merge join?