In PG12, query with float calculations is slower than PG11

Поиск
Список
Период
Сортировка
От keisuke kuroda
Тема In PG12, query with float calculations is slower than PG11
Дата
Msg-id CANDwggLe1Gc1OrRqvPfGE=kM9K0FSfia0hbeFCEmwabhLz95AA@mail.gmail.com
обсуждение исходный текст
Ответы Re: In PG12, query with float calculations is slower than PG11  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi,

I am testing performance both PG12 and PG11.
I found the case of performance degradation in PG12.

Amit Langote help me to analyze and to create patch.
Thanks!

* environment

CentOS Linux release 7.6.1810 (Core)
postgresql 12.1
postgresql 11.6

* postgresql.conf

shared_buffers = 2048MB
max_parallel_workers_per_gather = 0
work_mem = '64MB'
jit = off

* test case

CREATE TABLE realtest(a real, b real, c real, d real, e real);
INSERT INTO realtest SELECT i,i,i,i,i FROM generate_series(0,10000000) AS i;

EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
 select (2 * a) , (2 * b) , (2 * c), (2 * d),  (2 * e)
 from realtest;

* result

 PG12.1 5878.389 ms
 PG11.6 4533.554 ms

** PostgreSQL 12.1

pgbench=#  EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
  select (2 * a) , (2 * b) , (2 * c), (2 * d),  (2 * e)
  from realtest;
                                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.realtest  (cost=0.00..288697.59 rows=10000115 width=40) (actual time=0.040..5195.328 rows=10000001 loops=1)
   Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
   Buffers: shared hit=63695
 Planning Time: 0.051 ms
 Execution Time: 5878.389 ms
(5 行)

Samples: 6K of event 'cpu-clock', Event count (approx.): 1577750000
Overhead  Command   Shared Object      Symbol
  25.48%  postgres  postgres           [.] ExecInterpExpr
★18.65%  postgres  libc-2.17.so       [.] __isinf
  14.36%  postgres  postgres           [.] float84mul
   8.54%  postgres  [vdso]             [.] __vdso_clock_gettime
   4.02%  postgres  postgres           [.] ExecScan
   3.69%  postgres  postgres           [.] tts_buffer_heap_getsomeattrs
   2.63%  postgres  libc-2.17.so       [.] __clock_gettime
   2.55%  postgres  postgres           [.] HeapTupleSatisfiesVisibility
   2.00%  postgres  postgres           [.] heapgettup_pagemode

** PostgreSQL 11.6

pgbench=#  EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
  select (2 * a) , (2 * b) , (2 * c), (2 * d),  (2 * e)
  from realtest;
                                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.realtest  (cost=0.00..288697.59 rows=10000115 width=40) (actual time=0.012..3845.480 rows=10000001 loops=1)
   Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
   Buffers: shared hit=63695
 Planning Time: 0.033 ms
 Execution Time: 4533.554 ms
(5 行)

Samples: 4K of event 'cpu-clock', Event count (approx.): 1192000000
Overhead  Command   Shared Object      Symbol
  32.30%  postgres  postgres           [.] ExecInterpExpr
  14.95%  postgres  postgres           [.] float84mul
  10.57%  postgres  [vdso]             [.] __vdso_clock_gettime
★ 6.84%  postgres  libc-2.17.so       [.] __isinf
   3.96%  postgres  postgres           [.] ExecScan
   3.50%  postgres  libc-2.17.so       [.] __clock_gettime
   3.31%  postgres  postgres           [.] heap_getnext
   3.08%  postgres  postgres           [.] HeapTupleSatisfiesMVCC
   2.77%  postgres  postgres           [.] slot_deform_tuple
   2.37%  postgres  postgres           [.] ExecProcNodeInstr
   2.08%  postgres  postgres           [.] standard_ExecutorRun

* cause

Obviously, even in common cases where no overflow occurs,
you can tell that PG 12 is performing isinf() 3 times on every call of
float8_mul() once for each of val1, val2, result where as PG 11
is performing only once for result.

That's because check_float8_val() (in PG 12) is a function
whose arguments must be evaluated before
it is called (it is inline, but that's irrelevant),
whereas CHECKFLOATVAL() (in PG11) is a macro
whose arguments are only substituted into its body.

By the way, this change of float8mul() implementation is
mostly due to the following commit in PG 12 development cycle:
commit 6bf0bc842bd75877e31727eb559c6a69e237f831

Especially, the following diff:

@@ -894,13 +746,8 @@ float8mul(PG_FUNCTION_ARGS)  {
    float8      arg1 = PG_GETARG_FLOAT8(0);
    float8      arg2 = PG_GETARG_FLOAT8(1);
-   float8      result;
-
-   result = arg1 * arg2;

-   CHECKFLOATVAL(result, isinf(arg1) || isinf(arg2),
-                 arg1 == 0 || arg2 == 0);
-   PG_RETURN_FLOAT8(result);
+   PG_RETURN_FLOAT8(float8_mul(arg1, arg2));
 }

* patch

This patch uses MACRO which was used by PG11.
I tried attached patch, which can be applied to PG 12 source and performed a benchmark:

 PG12.1 5878.389 ms
 PG11.6 4533.554 ms

 PG12.1 + Patch 4679.162 ms

** PostgreSQL 12.1 + Patch

postgres=# EXPLAIN (ANALYZE on, VERBOSE on, BUFFERS on)
 select (2 * a) , (2 * b) , (2 * c), (2 * d),  (2 * e)
 from realtest;
                                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.realtest  (cost=0.00..307328.38 rows=10828150 width=40) (actual time=0.012..4009.012 rows=10000001 loops=1)
   Output: ('2'::double precision * a), ('2'::double precision * b), ('2'::double precision * c), ('2'::double precision * d), ('2'::double precision * e)
   Buffers: shared hit=63695
 Planning Time: 0.038 ms
 Execution Time: 4679.162 ms
(5 rows)

Samples: 5K of event 'cpu-clock', Event count (approx.): 1376750000
Overhead  Command   Shared Object      Symbol
  31.43%  postgres  postgres           [.] ExecInterpExpr
  14.24%  postgres  postgres           [.] float84mul
  10.40%  postgres  [vdso]             [.] __vdso_clock_gettime
★ 5.41%  postgres  libc-2.17.so       [.] __isinf
   4.63%  postgres  postgres           [.] tts_buffer_heap_getsomeattrs
   4.03%  postgres  postgres           [.] ExecScan
   3.54%  postgres  libc-2.17.so       [.] __clock_gettime
   3.12%  postgres  postgres           [.] HeapTupleSatisfiesVisibility
   2.36%  postgres  postgres           [.] heap_getnextslot
   2.16%  postgres  postgres           [.] heapgettup_pagemode
   2.09%  postgres  postgres           [.] standard_ExecutorRun
   2.07%  postgres  postgres           [.] SeqNext
   2.03%  postgres  postgres           [.] ExecProcNodeInstr
   2.03%  postgres  postgres           [.] tts_virtual_clear

PG 12 is still slower compared to PG 11, but the __isinf() situation is better with the patch.

Best Regards,
Keisuke Kuroda
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Make ringbuffer threshold and ringbuffer sizes configurable?
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager