Обсуждение: Re: [HACKERS] What about LIMIT in SELECT ?

Поиск
Список
Период
Сортировка

Re: [HACKERS] What about LIMIT in SELECT ?

От
Hannu Krosing
Дата:
Jan Wieck wrote:

>    And I got the time to hack around about this.
>
   <...
    a lovely explanation of a patch to achiev 60X speedup
    for a typical web query
   ...>

>    The speedup for the cursor/fetch scenario  is  so  impressive
>    that  I'll  create  a  post 6.4 patch. I don't want it in 6.4
>    because there is absolutely no query in the whole  regression
>    test,  where  it  suppresses  the  sort  node.

Good, then it works as expected ;)

More seriously, it is not within powers of current regression test
framework to test speed improvements (only the case where
performance-wise bad implementation will actually crash the backend,
as in the cnfify problem, but AFAIK we dont test for those now)

>   So  we  have absolutely no check that it doesn't break anything.

If it did pass the regression, then IMHO it did not break anything.

I would vote for putting it in (maybe with a
'set fix_optimiser_stupidity on' safeguard to enable it). I see no
reason to postpone it to 6.4.1 and force almost everybody to first
patch their copy and then upgrade very soon.

I would even go far enough to call it a bugfix, as it does not really
introduce any new functionality only fixes some existing functionality
so that much bigger databases can be actually used.

I would  compare it in this sense to finding the places where
username/password get truncated below their actual values in pg_passwd
;)

---------------
 Hannu Krosing

Re: [HACKERS] What about LIMIT in SELECT ?

От
jwieck@debis.com (Jan Wieck)
Дата:
Hannu Krosing wrote:

> Jan Wieck wrote:
> >    The speedup for the cursor/fetch scenario  is  so  impressive
> >    that  I'll  create  a  post 6.4 patch. I don't want it in 6.4
> >    because there is absolutely no query in the whole  regression
> >    test,  where  it  suppresses  the  sort  node.
>
> Good, then it works as expected ;)
>
> More seriously, it is not within powers of current regression test
> framework to test speed improvements (only the case where
> performance-wise bad implementation will actually crash the backend,
> as in the cnfify problem, but AFAIK we dont test for those now)
>
> >   So  we  have absolutely no check that it doesn't break anything.
>
> If it did pass the regression, then IMHO it did not break anything.

    Thats  the  point.  The  check  if  the sort node is required
    returns TRUE for  ALL  queries  of  the  regression.  So  the
    behaviour when it returns FALSE is absolutely not tested.

>
> I would vote for putting it in (maybe with a
> 'set fix_optimiser_stupidity on' safeguard to enable it). I see no
> reason to postpone it to 6.4.1 and force almost everybody to first
> patch their copy and then upgrade very soon.
>
> I would even go far enough to call it a bugfix, as it does not really
> introduce any new functionality only fixes some existing functionality
> so that much bigger databases can be actually used.

    I can't call it a bugfix because it is only a performance win
    in some situations. And I feel the risk is too  high  to  put
    untested  code  into  the  backend  at BETA2 time. The max we
    should do is to take this one  and  the  LIMIT  thing  (maybe
    implemented  as  I  suggested  lately),  and  put  out a Web-
    Performance-Release at the same time we release 6.4.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] What about LIMIT in SELECT ?

От
Bruce Momjian
Дата:
> I would even go far enough to call it a bugfix, as it does not really
> introduce any new functionality only fixes some existing functionality
> so that much bigger databases can be actually used.
>
> I would  compare it in this sense to finding the places where
> username/password get truncated below their actual values in pg_passwd
> ;)

We just can't test is on the wide variation of people's queries, though
passing the regression test is a good indication it is OK.

However, we are very close to release.  Yes, I know it is a pain to
wait, but we are not even done discussion all the options yet, and I
still have the cnfify fix to look at.

I am sure we will have post 6.4 releases, just like we have everyone
runing 6.3.2 rather than 6.3.  There will be other now-undiscovered
fixes in post 6.4 cleanup releases, and we will hopefully have a _full_
solution to the problem at that point.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [HACKERS] What about LIMIT in SELECT ?

От
Hannu Krosing
Дата:
Jan Wieck wrote:
>
> Hannu Krosing wrote:
>
> > Jan Wieck wrote:
> > >    The speedup for the cursor/fetch scenario  is  so  impressive
> > >    that  I'll  create  a  post 6.4 patch. I don't want it in 6.4
> > >    because there is absolutely no query in the whole  regression
> > >    test,  where  it  suppresses  the  sort  node.
> >
> > Good, then it works as expected ;)
> >
> > More seriously, it is not within powers of current regression test
> > framework to test speed improvements (only the case where
> > performance-wise bad implementation will actually crash the backend,
> > as in the cnfify problem, but AFAIK we dont test for those now)
> >
> > >   So  we  have absolutely no check that it doesn't break anything.
> >
> > If it did pass the regression, then IMHO it did not break anything.
>
>     Thats  the  point.  The  check  if  the sort node is required
>     returns TRUE for  ALL  queries  of  the  regression.  So  the
>     behaviour when it returns FALSE is absolutely not tested.

The only way to find out is to make a new test, maybe by comparing

select * from t where key > 1 order by key;

where sort node can be dropped

and

select * from t where (key+1) > 2 order by key;

where it probably can't (I guess the optimiser is currently not smart
enough)

>     I can't call it a bugfix because it is only a performance win
>     in some situations.

In the extreme case the situation can be exhaustion of swap and disk
space resulting in a frozen computer, just trying to get 10 first rows
from a table. Its not exactly a bug, but it's not the expected
behaviour either.

>     And I feel the risk is too  high  to  put
>     untested  code  into  the  backend  at BETA2 time. The max we
>     should do is to take this one  and  the  LIMIT  thing  (maybe
>     implemented  as  I  suggested  lately),  and  put  out a Web-
>     Performance-Release at the same time we release 6.4.

Or perhaps have the patches in /contrib in 6.4 distribution
(preferrably with an option to configure to apply them ;)

-----------------
Hannu

Re: [HACKERS] What about LIMIT in SELECT ?

От
"Thomas G. Lockhart"
Дата:
> More seriously, it is not within powers of current regression test
> framework to test speed improvements (only the case where
> performance-wise bad implementation will actually crash the backend,
> as in the cnfify problem, but AFAIK we dont test for those now)

Actually, I keep informal track of run-times for the entire regression
test, which gives me an indication of how things are going. For much of
the v6.4 development, I was seeing runtimes of around 2:30 on my system
(a bit less, a bit more depending on the phase of the moon).

The runtime is currently up at 3:48 (3:40 with egcs-1.1b and
-mpentiumpro rather than the usual gcc-2.7.1 and -m486). I am hoping
that most of that recent increase in runtime is from the recent
additions of Jan's rules and embedded programming language tests.

Although the times aren't directly comparable with older releases (e.g.
we used to have char2,4,8,16 tests and we now have Jan's new tests)
there has been a distinct downward trend in runtimes.

But you're correct in that these timing tests are fairly insensitive to
major improvements in only one query scenerio, since that makes a
relatively small change in the total runtime.

                         - Tom