Обсуждение: expensive function in select list vs limit clause

Поиск
Список
Период
Сортировка

expensive function in select list vs limit clause

От
Chris Mair
Дата:
Hi,

I've found a (simple) situation where the planner does something I don't understand.

Below is a complete test case followed by output.

 From the timings it appears that in the second explain analyze query a function
call in the select list (expensive()) is evaluated in the sequential scan node
*for each* row in big, despite the use of limit.

I would have expected expensive() to be evaluated only for the ten rows
in the result set. Hence the second explain analyze query shouldn't be more
expensive than the first one.

My trust in Postgres' planner goes so far as I feel the planner is right and there
must be a reason for this :)

Could someone help me understand this behaviour?

Thanks & Bye,
Chris.



-- ***************************************************************************

select version();

-- setup: create a time wasting function and a table with 1M rows

create function expensive() returns double precision as
$$
     begin
         for i in 1 .. 15000 loop
         end loop;
         return random();
     end;
$$ language 'plpgsql';

create table big as select random() as r from generate_series(1, 1000000);
analyze big;

\timing on

-- benchmark expensive(): one call to expensive takes about 0.3 ms => OK

do $$ begin for i in 1 .. 1000 loop perform expensive(); end loop; end; $$;

-- find the ten smallest values in big: takes ~ 0.18s => OK

explain analyze select r from big order by r offset 0 limit 10;

-- now do the same, but add an expensive() column to the result:
-- takes ~ 29s => WHY?

explain analyze select r, expensive() from big order by r offset 0 limit 10;

-- clean up :)

\timing off
drop function expensive();
drop table big;

-- ***************************************************************************

                                                    version
--------------------------------------------------------------------------------------------------------------
  PostgreSQL 9.5.4 on x86_64-apple-darwin14.5.0, compiled by Apple LLVM version 7.0.2 (clang-700.1.81), 64-bit
(1 row)

Time: 0.386 ms
CREATE FUNCTION
Time: 0.456 ms
SELECT 1000000
Time: 466.814 ms
ANALYZE
Time: 73.770 ms
Timing is on.
DO
Time: 33.922 ms
                                                         QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=36034.64..36034.67 rows=10 width=8) (actual time=182.361..182.363 rows=10 loops=1)
    ->  Sort  (cost=36034.64..38534.64 rows=1000000 width=8) (actual time=182.360..182.361 rows=10 loops=1)
          Sort Key: r
          Sort Method: top-N heapsort  Memory: 25kB
          ->  Seq Scan on big  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.022..99.777 rows=1000000
loops=1)
  Planning time: 0.070 ms
  Execution time: 182.377 ms
(7 rows)

Time: 182.689 ms
                                                           QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=286034.64..286034.67 rows=10 width=8) (actual time=28932.311..28932.314 rows=10 loops=1)
    ->  Sort  (cost=286034.64..288534.64 rows=1000000 width=8) (actual time=28932.309..28932.310 rows=10 loops=1)
          Sort Key: r
          Sort Method: top-N heapsort  Memory: 25kB
          ->  Seq Scan on big  (cost=0.00..264425.00 rows=1000000 width=8) (actual time=0.062..28822.520 rows=1000000
loops=1)
  Planning time: 0.038 ms
  Execution time: 28932.339 ms
(7 rows)

Time: 28932.908 ms
Timing is off.
DROP FUNCTION
DROP TABLE

-- ***************************************************************************



Re: expensive function in select list vs limit clause

От
Tom Lane
Дата:
Chris Mair <chris@1006.org> writes:
>  From the timings it appears that in the second explain analyze query a function
> call in the select list (expensive()) is evaluated in the sequential scan node
> *for each* row in big, despite the use of limit.

According to the SQL standard, the SELECT list is evaluated before ORDER
BY, so if you need an explicit sort step the function is going to get
calculated first.  This is obviously necessary if the function is used
as the sort key, but otherwise it's possible to be smarter.  We were not
smarter before 9.6 though.  You might find this commit message informative:

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=9118d03a8

            regards, tom lane


Re: expensive function in select list vs limit clause

От
Chris Mair
Дата:
> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=9118d03a8

Hi,

thanks!

I've just tested with 9.6 and the test runs fast with or without expensive().

So the above patch does indeed improve this case a lot!

Bye,
Chris.