Re: Idea for the statistics collector

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: Idea for the statistics collector
Дата
Msg-id 200504200244.j3K2iva26680@candle.pha.pa.us
обсуждение исходный текст
Ответ на Re: Idea for the statistics collector  (Doug Fields <dfields@pexicom.com>)
Список pgsql-general
Added to TODO list:

    * Log queries where the optimizer row estimates were dramatically
      different from the number of rows actually found (?)

---------------------------------------------------------------------------

Doug Fields wrote:
>
> >Tom Lane wrote:
> > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > > Martijn van Oosterhout wrote:
> > > >> Firstly, I was only thinking of going for the basic nodes (Index
> > Scan, Seq
> > > >> Scan, Distinct). Other types have far more variables. Secondly, even
> > if you
> > > >> only count, it's useful. For example, if it tells you that the
> > planner is
> > > >> off by a factor of 10 more than 75% of the time, that's useful
> > information
> > > >> independant of what the actual variables are.
> > >
> > > And reduce the usefulness even more ;-).  As a pure stats-gathering
> > > exercise it might be worth doing, but not if you only log the failure
> > > cases.  How will you know how well you are doing if you take a
> > > biased-by-design sample?
>
> Personally, given that it seems like at least once or twice a day someone
> asks about performance or "why isn't my index being used" and other stuff -
> I think doing this would be a great idea.
>
> Perhaps not necessarily in the full-fledged way, but creating a sort of
> "ANALYZE log," wherein it logs the optimizer's estimate of a query and the
> actual results of a query, for every query. This, of course, could be
> enableable/disableable on a per-connection basis, per-table basis (like
> OIDs), or whatever other basis makes life easiest to the developers.
>
> Then, when the next ANALYZE is run, it could do it's usual analysis, and
> apply some additional heuristics based upon what it learns from the
> "ANALYZE log," possibly to do several things:
>
> 1) Automatically increase/decrease the SET STATISTICS information included
> in the analyze, for example, increasing it as a table grows larger and the
> "randomness" grows less than linearly with size (e.g., if you have 50 or 60
> groups in a 1,000,000 row table, that certainly needs a higher SET
> STATISTICS and I do it on my tables).
> 2) Have an additional value on the statistics table called the
> "index_heuristic" or "random_page_adjustment_heuristic" which when 1 does
> nothing, but otherwise modifies the cost of using an index/seq scan by that
> factor - and don't ever change this more than a few percent each ANALYZE
> 3) Flags in a second log (maybe the regular log) really bad query estimates
> - let it do an analysis of the queries and flag anything two or three std
> deviations outside.
>
> Now, I suggest all this stuff in the name of usability and
> self-maintainability. Unfortunately, I don't have the wherewithal to
> actually assist in development.
>
> Another possibility is to put "use_seq_scan" default to OFF, or whatever
> the parameter is (I did my optimizing a while ago so it's fading), so that
> if there's an index, it will use it, regardless - as this seems to be what
> the great majority of people expect to happen. And/or add this to a FAQ,
> and let us all reply "see http://.../indexfaq.html." :)
>
> Cheers,
>
> Doug
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

В списке pgsql-general по дате отправления:

Предыдущее
От: Lorenzo Thurman
Дата:
Сообщение: Re: pg_dump fails
Следующее
От: "vinita bansal"
Дата:
Сообщение: Re: reorder table data