Обсуждение: BUG #17896: xmlagg exponentially slower than string_agg equivalent

Поиск
Список
Период
Сортировка

BUG #17896: xmlagg exponentially slower than string_agg equivalent

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17896
Logged by:          Paul Wehr
Email address:      sf_postgresql@informationsoftworks.com
PostgreSQL version: 14.6
Operating system:   Debian GNU/Linux 11 (bullseye)
Description:

Not a bug in terms of functionality, but an interesting difference in
performance.  If nothing else hopefully documenting this work-around might
help anyone else that may be aggregating xmlelements on this scale.

Using xmlagg(xml) on 60,000 rows of xml takes 16.3s on my hardware.  If I
replace "xmlagg(xml)" with "string_agg(xml::text)::xml", I can get the same
result in 0.051s.  (see script below). I have confirmed this behavior on
other hardware, and in versions 11.19, 13.7 and 14.6.

Also, while string_agg() seems to scale linearly (O(n)), xmlagg() seems to
scale somewhat steeper (O(n^2)?).  Running the same script on various second
parameter values in the generate_series() function produced these results:

       |                |   string_agg()
 Rows  |  xmlagg(), ms  | work-around, ms
-------+----------------+------------------
 10000 |      314.44    |      14.207
 20000 |    1111.104    |      23.754
 30000 |    2558.706    |      37.565
 40000 |    5306.631    |      51.021
 50000 |   10178.496    |      61.928
 60000 |   16306.651    |      75.146
 70000 |   23853.331    |      93.529
 80000 |   32243.908    |     105.318


-------- psql test script output ----------------

select version();
                                                           version
                                                

-----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled
by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

create temporary table xmlsample as
select xmlelement(name test, 'test') as xml
from generate_series(1,60000) as i (i)
;
SELECT 60000
explain analyze select xmlagg(xml) as xml from xmlsample;
                                                     QUERY PLAN
                                        
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1034.10..1034.11 rows=1 width=32) (actual
time=16304.009..16304.010 rows=1 loops=1)
   ->  Seq Scan on xmlsample  (cost=0.00..903.88 rows=52088 width=32)
(actual time=0.021..12.754 rows=60000 loops=1)
 Planning Time: 0.351 ms
 Execution Time: 16306.651 ms
(4 rows)

explain analyze select string_agg(xml::text, '')::xml as xml from
xmlsample;
                                                     QUERY PLAN
                                       
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1034.10..1034.12 rows=1 width=32) (actual
time=63.076..63.078 rows=1 loops=1)
   ->  Seq Scan on xmlsample  (cost=0.00..903.88 rows=52088 width=32)
(actual time=0.010..5.591 rows=60000 loops=1)
 Planning Time: 0.098 ms
 Execution Time: 75.146 ms
(4 rows)


Re: BUG #17896: xmlagg exponentially slower than string_agg equivalent

От
Tom Lane
Дата:
PG Bug reporting form <noreply@postgresql.org> writes:
> Also, while string_agg() seems to scale linearly (O(n)), xmlagg() seems to
> scale somewhat steeper (O(n^2)?).

Yeah, string_agg was optimized long ago to avoid repeat data copying,
but xmlagg hasn't been.

It looks like xmlagg doesn't make any direct use of libxml, which suggests
that this wouldn't be terribly hard to fix if somebody were motivated
to try.  The xml type is a bit of a development backwater though ...

            regards, tom lane