Discussion: to_jsonb performance on array aggregated correlated subqueries

to_jsonb performance on array aggregated correlated subqueries

From:
Nico Heller
Date:
Good day,

consider the following query:

WITH aggregation AS (
     SELECT
            a.*,
           (SELECT array_agg(b.*) FROM b WHERE b.a_id = a.id) as "bs",
           (SELECT array_agg(c.*) FROM c WHERE c.a_id = a.id) as "cs",
           (SELECT array_agg(d.*) FROM d WHERE d.a_id = a.id) as "ds",
           (SELECT array_agg(e.*) FROM e WHERE e.a_id = a.id) as "es"
     FROM a WHERE a.id IN (<some big list, ranging from 20-180 entries>)
)
SELECT to_jsonb(aggregation.*) as "value" FROM aggregation;

Imagine that for each "a" there exist between 5 and 100 "b", "c", "d" and 
"e" rows, which makes the result pretty big (worst case: around 300 kB 
when saved to a text file).
I noticed that adding the "to_jsonb" increases the query time by 100%, 
from 9-10 ms to 17-23 ms on average.
This may not seem slow at all, but this query has another issue: on an 
AWS Aurora Serverless V2 instance we are running into RAM usage of 
around 30-50 GB under high load (> 1000 queries/sec), compared to 
< 10 GB when using a simple LEFT JOINed query. Furthermore, the CPU 
usage is quite high.

Is there anything I could improve? I am open to other solutions, but I 
am wondering if I ran into an edge case of "to_jsonb" for "anonymous 
records" (these are just rows without a defined UDT) - this is just a 
wild guess though.
I am mostly looking to decrease the load (CPU and memory) on Postgres 
itself. Furthermore, I would like to know why the memory usage is so 
significant. Any tips on how to analyze this issue are appreciated as 
well - my knowledge is limited to being average at interpreting EXPLAIN 
ANALYZE results.

Here's a succinct list of the whys, what I have found out so far, and 
the solutions I already tried or don't want to consider:

- LEFT JOINing potentially creates a huge result set because of the 
cartesian product, so that's a no-go
- not using "to_jsonb" is sadly also not possible, as Postgres' array + 
record syntax is very unfriendly and hard to parse in my application 
(it's barely documented if at all, the quoting rules are cumbersome, 
and the arrays lack column names, which would make the parsing 
sensitive to future table changes and thus cumbersome to maintain) - a 
small illustration follows this list
- I know I could solve this with a separate query for a, b, c, d and e 
while "joining" the results in my application, but I am looking for 
another way to do this (bear with me, treat this as an academic question :))
- I am using "to_jsonb" to simply map the result to my data model via a 
json mapper
- EXPLAIN ANALYZE is not showing anything special when using "to_jsonb" 
vs. not using it; the outermost (hash) join just takes more time - is 
there a more granular EXPLAIN that shows me the runtime of functions 
like "to_jsonb"?
- I tried an approach where b, c, d and e were array columns of UDTs: UDTs 
are not well supported by my application stack (JDBC) and are generally 
undesirable for me (because of a lack of migration possibilities)
- I don't want to duplicate my data into another table (e.g. that has 
jsonb columns)
- MATERIALIZED VIEWS are also undesirable because they have to be 
refreshed manually and non-incrementally, which would make a refresh on 
a big data set take a long time
- splitting the query into chunks to reduce the IN() list size makes no 
measurable difference
- I don't want to use JSONB columns for b, c, d and e because future 
changes to their structure (e.g. new fields, changing a datatype) are 
harder to achieve with JSONB, and it lacks constraint checks on insert 
(e.g. NOT NULL on column b.xy)
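
As a small illustration of the quoting rules the second point refers to 
(the VALUES list and the column names id/label here are made up; any 
two-column table behaves the same way):

SELECT array_agg(b.*)
FROM (VALUES (1, 'foo, bar'), (2, 'baz')) AS b(id, label);
-- returns something like: {"(1,\"foo, bar\")","(2,baz)"}
-- anonymous records without column names, wrapped in the array quoting rules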

Kind regards and thank you for your time,
Nico Heller

P.S.: Sorry for the long list of "I don't want to do this"; some of the 
options are not possible because of other requirements.





Re: to_jsonb performance on array aggregated correlated subqueries

From:
Justin Pryzby
Date:
What version of postgres ?

I wonder if you're hitting the known memory leak involving jit.
Try with jit=off or jit_inline_above_cost=-1.
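
Both settings are standard Postgres GUCs and can be tried per session 
for a quick test (whether Aurora Serverless exposes them is a separate 
question):

SET jit = off;
-- or keep JIT enabled and only disable inlining, which the second suggestion targets:
SET jit_inline_above_cost = -1;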

-- 
Justin



Re: to_jsonb performance on array aggregated correlated subqueries

From:
Nico Heller
Date:

I knew I forgot something: We are currently on 13.6. When was this issue fixed?

On 12.08.2022 at 20:56, Justin Pryzby wrote:
What version of postgres ?

I wonder if you're hitting the known memory leak involving jit.
Try with jit=off or jit_inline_above_cost=-1.





Re: to_jsonb performance on array aggregated correlated subqueries

From:
Nico Heller
Date:

On 12.08.2022 at 21:02, Rick Otten wrote:



On Fri, Aug 12, 2022 at 2:50 PM Nico Heller <nico.heller@posteo.de> wrote:

- You do have an index on `b.a_id` and `c.a_id`, etc... ?  You didn't say...
Yes, there are indices on all columns referenced by the subselects (they are all primary keys anyway)
- Are you sure it is the `to_jsonb` that is making this query slow?
Yes, EXPLAIN ANALYZE shows a doubling of execution time - I don't have numbers on the memory usage difference, though

- Since you are serializing this for easy machine readable consumption outside of the database, does it make a difference if you use `to_json` instead?

Using to_json vs. to_jsonb makes no difference with regard to runtime; I will check whether the memory consumption is different on Monday - thank you for the idea!

Re: to_jsonb performance on array aggregated correlated subqueries

From:
Justin Pryzby
Date:
On Fri, Aug 12, 2022 at 07:02:36PM +0000, Nico Heller wrote:
> I knew I forgot something: We are currently on 13.6. When was this issue
> fixed?

There's a WIP/proposed fix, but the fix is not released.
I asked about your version because jit was disabled by default in v11.
But it's enabled by default in v12.

https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items#Older_bugs_affecting_stable_branches

-- 
Justin



Re: to_jsonb performance on array aggregated correlated subqueries

From:
Andres Freund
Date:
Hi,

On 2022-08-12 18:49:58 +0000, Nico Heller wrote:
> WITH aggregation AS (
>     SELECT
>            a.*,
>           (SELECT array_agg(b.*) FROM b WHERE b.a_id = a.id) as "bs",
>           (SELECT array_agg(c.*) FROM c WHERE c.a_id = a.id) as "cs",
>           (SELECT array_agg(d.*) FROM d WHERE d.a_id = a.id) as "ds",
>           (SELECT array_agg(e.*) FROM e WHERE e.a_id = a.id) as "es"
>     FROM a WHERE a.id IN (<some big list, ranging from 20-180 entries>)
> )
> SELECT to_jsonb(aggregation.*) as "value" FROM aggregation;

> Imagine that for each "a" there exists between 5-100 "b", "c", "d" and "e"
> which makes the result of this pretty big (worst case: around 300kb when
> saved to a text file).
> I noticed that adding the "to_jsonb" increases the query time by 100%, from
> 9-10ms to 17-23ms on average.

Could we see the explain?

Have you tried using json[b]_agg()?
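
A minimal sketch of that variant against the quoted query above, using 
the same hypothetical a/b/c/d/e tables - jsonb_agg builds the JSON 
arrays directly instead of aggregating anonymous records first:

WITH aggregation AS (
     SELECT
            a.*,
           (SELECT jsonb_agg(b) FROM b WHERE b.a_id = a.id) as "bs",
           (SELECT jsonb_agg(c) FROM c WHERE c.a_id = a.id) as "cs",
           (SELECT jsonb_agg(d) FROM d WHERE d.a_id = a.id) as "ds",
           (SELECT jsonb_agg(e) FROM e WHERE e.a_id = a.id) as "es"
     FROM a WHERE a.id IN (<some big list, ranging from 20-180 entries>)
)
SELECT to_jsonb(aggregation.*) as "value" FROM aggregation;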


> This may not seem slow at all but this query has another issue: on an AWS
> Aurora Serverless V2 instance we are running into a RAM usage of around
> 30-50 GB compared to < 10 GB when using a simple LEFT JOINed query when
> under high load (> 1000 queries / sec). Furthermore the CPU usage is quite
> high.

We can't say much about aurora. It's a heavily modified fork of postgres.  Did
you reproduce this with vanilla postgres? And if so, do you have it in a form
that somebody could try out?

Greetings,

Andres Freund



Re: to_jsonb performance on array aggregated correlated subqueries

From:
Nico Heller
Date:


On 12.08.2022 at 21:15, Rick Otten wrote:




One other thought.  Does it help if you convert the arrays to json first before you convert the whole row?  I.e., add some to_json()'s around the bs, cs, ds, es columns in the CTE.  I'm wondering if breaking the json conversions up into smaller pieces will let the outer to_json() have less work to do and overall run faster.  You could even separately serialize the elements inside the arrays too.  I wouldn't think it would make a huge difference - you'd be making a bunch of extra to_json calls - but maybe it avoids some large memory structure that would otherwise have to be constructed to serialize all of those objects in all of the arrays at the same time.
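
A sketch of that idea against the same hypothetical a/b/c/d/e tables - 
each correlated array is converted to jsonb on its own before the outer 
to_jsonb sees the row:

WITH aggregation AS (
     SELECT
            a.*,
           to_jsonb((SELECT array_agg(b.*) FROM b WHERE b.a_id = a.id)) as "bs",
           to_jsonb((SELECT array_agg(c.*) FROM c WHERE c.a_id = a.id)) as "cs",
           to_jsonb((SELECT array_agg(d.*) FROM d WHERE d.a_id = a.id)) as "ds",
           to_jsonb((SELECT array_agg(e.*) FROM e WHERE e.a_id = a.id)) as "es"
     FROM a WHERE a.id IN (<some big list, ranging from 20-180 entries>)
)
SELECT to_jsonb(aggregation.*) as "value" FROM aggregation;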

Using jsonb_agg and another to_jsonb at the end (it's still needed to create a single value and to include the columns from "a.*") worsens the query performance by 100%. I can't speak for the memory usage because I would have to push these changes to preproduction - will try this on Monday, thanks.

Re: to_jsonb performance on array aggregated correlated subqueries

From:
Nico Heller
Date:

Here are the query plans (I hope my anonymization didn't break them). I ran every query a couple of times before copying the plan, to avoid timing issues caused by disk access.
Ignore the sequential scan on one of the tables: it's very small (this will change in the future), so Postgres opts for a faster sequential scan. The other sequential scan is on the IN() list, which uses a VALUES list in the actual query (using a non-VALUES list makes no difference).
Overall the plan is quite optimal for me and performs really well considering the number of rows it extracts and converts to JSON.

Notice how removing to_jsonb improves the query performance significantly (see the last query plan) and how the cost is attributed to the hash join.
Using to_json instead of to_jsonb, or json_agg instead of jsonb_agg, makes no difference in query plan or execution time.

I used random IDs, so I don't know how big the result got, but it shouldn't matter for the query plan:


array_agg, then to_jsonb (my initially posted query)
 
Hash Semi Join  (cost=5.00..15947.39 rows=200 width=32) (actual time=0.266..18.128 rows=200 loops=1)
  Hash Cond: (a.id = "*VALUES*".column1)
  ->  Seq Scan on a  (cost=0.00..41.02 rows=502 width=422) (actual time=0.013..0.268 rows=502 loops=1)
  ->  Hash  (cost=2.50..2.50 rows=200 width=32) (actual time=0.091..0.092 rows=200 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 21kB
        ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=32) (actual time=0.001..0.040 rows=200 loops=1)
  SubPlan 1
    ->  Aggregate  (cost=42.20..42.21 rows=1 width=32) (actual time=0.020..0.020 rows=1 loops=200)
          ->  Bitmap Heap Scan on b  (cost=4.38..42.17 rows=12 width=156) (actual time=0.012..0.017 rows=12 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=382
                ->  Bitmap Index Scan on fk_b_idx  (cost=0.00..4.37 rows=12 width=0) (actual time=0.008..0.008 rows=14 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 2
    ->  Aggregate  (cost=27.68..27.69 rows=1 width=32) (actual time=0.012..0.012 rows=1 loops=200)
          ->  Bitmap Heap Scan on c  (cost=4.35..27.66 rows=9 width=98) (actual time=0.009..0.010 rows=5 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=169
                ->  Bitmap Index Scan on fk_c_idx  (cost=0.00..4.35 rows=9 width=0) (actual time=0.007..0.007 rows=5 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 3
    ->  Aggregate  (cost=8.30..8.31 rows=1 width=32) (actual time=0.009..0.010 rows=1 loops=200)
          ->  Index Scan using fk_d_idx on d  (cost=0.28..8.29 rows=1 width=81) (actual time=0.008..0.008 rows=1 loops=200)
                Index Cond: (a_id = a.id)
  SubPlan 4
    ->  Aggregate  (cost=1.27..1.28 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=200)
          ->  Seq Scan on e  (cost=0.00..1.26 rows=1 width=76) (actual time=0.004..0.004 rows=0 loops=200)
                Filter: (a_id = a.id)
                Rows Removed by Filter: 21
Planning Time: 0.520 ms
Execution Time: 18.650 ms
 
jsonb_agg instead of array_agg, then to_jsonb
 
Hash Semi Join  (cost=5.00..15947.39 rows=200 width=32) (actual time=0.338..23.921 rows=200 loops=1)
  Hash Cond: (a.id = "*VALUES*".column1)
  ->  Seq Scan on a  (cost=0.00..41.02 rows=502 width=422) (actual time=0.012..0.244 rows=502 loops=1)
  ->  Hash  (cost=2.50..2.50 rows=200 width=32) (actual time=0.090..0.091 rows=200 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 21kB
        ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=32) (actual time=0.001..0.040 rows=200 loops=1)
  SubPlan 1
    ->  Aggregate  (cost=42.20..42.21 rows=1 width=32) (actual time=0.050..0.050 rows=1 loops=200)
          ->  Bitmap Heap Scan on b  (cost=4.38..42.17 rows=12 width=156) (actual time=0.012..0.018 rows=12 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=382
                ->  Bitmap Index Scan on fk_b_idx  (cost=0.00..4.37 rows=12 width=0) (actual time=0.008..0.008 rows=14 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 2
    ->  Aggregate  (cost=27.68..27.69 rows=1 width=32) (actual time=0.028..0.028 rows=1 loops=200)
          ->  Bitmap Heap Scan on c  (cost=4.35..27.66 rows=9 width=98) (actual time=0.009..0.011 rows=5 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=169
                ->  Bitmap Index Scan on fk_c_idx  (cost=0.00..4.35 rows=9 width=0) (actual time=0.007..0.007 rows=5 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 3
    ->  Aggregate  (cost=8.30..8.31 rows=1 width=32) (actual time=0.014..0.014 rows=1 loops=200)
          ->  Index Scan using fk_d_idx on d  (cost=0.28..8.29 rows=1 width=81) (actual time=0.008..0.008 rows=1 loops=200)
                Index Cond: (a_id = a.id)
  SubPlan 4
    ->  Aggregate  (cost=1.27..1.28 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=200)
          ->  Seq Scan on e  (cost=0.00..1.26 rows=1 width=76) (actual time=0.004..0.004 rows=0 loops=200)
                Filter: (a_id = a.id)
                Rows Removed by Filter: 21
Planning Time: 0.513 ms
Execution Time: 24.020 ms
 
array_agg without to_jsonb at the end
 
Hash Semi Join  (cost=5.00..15946.89 rows=200 width=550) (actual time=0.209..9.784 rows=200 loops=1)
  Hash Cond: (a.id = "*VALUES*".column1)
  ->  Seq Scan on a  (cost=0.00..41.02 rows=502 width=422) (actual time=0.013..0.190 rows=502 loops=1)
  ->  Hash  (cost=2.50..2.50 rows=200 width=32) (actual time=0.079..0.080 rows=200 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 21kB
        ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=32) (actual time=0.001..0.040 rows=200 loops=1)
  SubPlan 1
    ->  Aggregate  (cost=42.20..42.21 rows=1 width=32) (actual time=0.019..0.019 rows=1 loops=200)
          ->  Bitmap Heap Scan on b  (cost=4.38..42.17 rows=12 width=156) (actual time=0.012..0.017 rows=12 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=382
                ->  Bitmap Index Scan on fk_b_idx  (cost=0.00..4.37 rows=12 width=0) (actual time=0.008..0.008 rows=14 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 2
    ->  Aggregate  (cost=27.68..27.69 rows=1 width=32) (actual time=0.012..0.012 rows=1 loops=200)
          ->  Bitmap Heap Scan on c  (cost=4.35..27.66 rows=9 width=98) (actual time=0.008..0.010 rows=5 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=169
                ->  Bitmap Index Scan on fk_c_idx  (cost=0.00..4.35 rows=9 width=0) (actual time=0.007..0.007 rows=5 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 3
    ->  Aggregate  (cost=8.30..8.31 rows=1 width=32) (actual time=0.009..0.009 rows=1 loops=200)
          ->  Index Scan using fk_d_idx on d  (cost=0.28..8.29 rows=1 width=81) (actual time=0.008..0.008 rows=1 loops=200)
                Index Cond: (a_id = a.id)
  SubPlan 4
    ->  Aggregate  (cost=1.27..1.28 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=200)
          ->  Seq Scan on e  (cost=0.00..1.26 rows=1 width=76) (actual time=0.004..0.004 rows=0 loops=200)
                Filter: (a_id = a.id)
                Rows Removed by Filter: 21
Planning Time: 0.496 ms
Execution Time: 9.892 ms
 



Re: to_jsonb performance on array aggregated correlated subqueries

From:
Rick Otten
Date:


On Fri, Aug 12, 2022 at 3:02 PM Rick Otten <rottenwindfish@gmail.com> wrote:




To follow up here a little.  I ran some quick tests on my database and found that `to_json` is consistently slightly faster than `to_jsonb` when you are just serializing the result set for consumption.  I fed in some arrays of 1,000,000 elements for testing.  While both JSON serializers are slower than just sending back the result set, the difference wasn't significant on my machine with simple object types (3% slower).

Are any of your objects in "b.*", etc., complex data structures, deeper arrays, GIS shapes, or strange data types that might be hard to serialize?  I'm wondering if there is something hidden in those ".*" row sets that is particularly problematic and compute-intensive to process.
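
A quick, self-contained way to run that kind of comparison (the exact 
arrays Rick used are not shown; this just serializes one large generated 
array both ways):

EXPLAIN (ANALYZE)
SELECT to_json(array_agg(g)) FROM generate_series(1, 1000000) AS g;

EXPLAIN (ANALYZE)
SELECT to_jsonb(array_agg(g)) FROM generate_series(1, 1000000) AS g;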