Thread: Can we get rid of repeated queries from pg_dump?

Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
Hi,
I have the following case: a local pg_dump (v15) connecting to a remote
PostgreSQL (v12).

I'm trying to get just the schema (pg_dump -s). It's taking very long, which
is kinda OK given the long distance and latency, but I got curious and
checked the queries that pg_dump was running (select * from
pg_stat_activity where application_name = 'pg_dump').
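
For example, something along these lines is enough to watch the dump session
(just standard pg_stat_activity columns, nothing pg_dump-specific):

SELECT pid, state, now() - query_start AS query_age, query
FROM pg_stat_activity
WHERE application_name = 'pg_dump';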

And I noticed that many of these queries repeat many times.

The ones that I noticed were:
SELECT pg_catalog.format_type('2281'::pg_catalog.oid, NULL)
around the time that
SELECT
    proretset,
    prosrc,
    probin,
    provolatile,
    proisstrict,
    prosecdef,
    lanname,
    proconfig,
    procost,
    prorows,
    pg_catalog.pg_get_function_arguments(p.oid) AS funcargs,
    pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
    pg_catalog.pg_get_function_result(p.oid) AS funcresult,
    proleakproof,
    array_to_string(protrftypes, ' ') AS protrftypes,
    proparallel,
    prokind,
    prosupport,
    NULL AS prosqlbody
FROM
    pg_catalog.pg_proc p,
    pg_catalog.pg_language l
WHERE
    p.oid = '60188'::pg_catalog.oid
    AND l.oid = p.prolang

was called too.

It seems that for every function, pg_dump is getting its data, and then
runs format_type on each parameter/output type? I'm mostly guessing
here, as I didn't read the code.

Wouldn't it be possible to get all type formats at once, and cache them
in pg_dump? Or at the very least reuse already received information?
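
Just to sketch what I have in mind on the SQL side (one round trip instead of
one query per type oid; the column list here is only an example):

SELECT t.oid, pg_catalog.format_type(t.oid, NULL) AS formatted_name
FROM pg_catalog.pg_type t;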

Unfortunately it seems I can't run pg_dump closer to the db server, and
the latency of queries is killing me.

It's been 15 minutes, and pg_dump (called: pg_dump -v -s -f schema.dump,
with env variables configuring the db connection) hasn't written even a single
byte to schema.dump.

depesz



Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
hubert depesz lubaczewski <depesz@depesz.com> writes:
> It seems that for every function, pg_dump is getting it's data, and then
> runs format_type on each parameter/output type? I'm mostly guessing
> here, as I didn't read the code.
> Wouldn't it be possible to get all type formats at once, and cache them
> in pg_dump? Or at the very least reuse already received information?

Send a patch ...

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
On Thu, Aug 26, 2021 at 10:02:07AM -0400, Tom Lane wrote:
> hubert depesz lubaczewski <depesz@depesz.com> writes:
> > It seems that for every function, pg_dump is getting it's data, and then
> > runs format_type on each parameter/output type? I'm mostly guessing
> > here, as I didn't read the code.
> > Wouldn't it be possible to get all type formats at once, and cache them
> > in pg_dump? Or at the very least reuse already received information?
> Send a patch ...

Yeah, that's not going to work, my C skills are next-to-none :(

I guess I'll have to wait until someone else decides it's a problem,
someone with the skills to do something about it.

Best regards,

depesz




Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
hubert depesz lubaczewski <depesz@depesz.com> writes:
> On Thu, Aug 26, 2021 at 10:02:07AM -0400, Tom Lane wrote:
>> hubert depesz lubaczewski <depesz@depesz.com> writes:
>>> Wouldn't it be possible to get all type formats at once, and cache them
>>> in pg_dump? Or at the very least reuse already received information?

>> Send a patch ...

> Yeah, that's not going to work, my C skills are next-to-none :(
> I guess I'll have to wait till someone else will assume it's a problem,
> someone with skills to do something about it.

Well, you could move it forward by doing the legwork to identify which
queries are worth merging.  Is it really sane to do a global "select
format_type() from pg_type" query and save all the results on the client
side?  I wonder whether there are cases where that'd be a net loss.
You could do the experimentation to figure that out without necessarily
having the C skills to make pg_dump actually do it.

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
On Thu, Aug 26, 2021 at 10:20:29AM -0400, Tom Lane wrote:
> hubert depesz lubaczewski <depesz@depesz.com> writes:
> > On Thu, Aug 26, 2021 at 10:02:07AM -0400, Tom Lane wrote:
> >> hubert depesz lubaczewski <depesz@depesz.com> writes:
> >>> Wouldn't it be possible to get all type formats at once, and cache them
> >>> in pg_dump? Or at the very least reuse already received information?
> 
> >> Send a patch ...
> 
> > Yeah, that's not going to work, my C skills are next-to-none :(
> > I guess I'll have to wait till someone else will assume it's a problem,
> > someone with skills to do something about it.
> 
> Well, you could move it forward by doing the legwork to identify which
> queries are worth merging.  Is it really sane to do a global "select

Sure. On it. Will report back when I have more info.

Best regards,

depesz




Re: Can we get rid of repeated queries from pg_dump?

From: Adrian Klaver
On 8/26/21 1:44 AM, hubert depesz lubaczewski wrote:
> Hi,
> I have following case: local pg_dump (v15) connecting to remote
> PostgreSQL (v12).

So you are using a dev version of pg_dump or is that a typo?

> 
> It's been 15 minutes, and pg_dump (called: pg_dump -v -s -f schema.dump,
> with env variables configuring db connection) hasn't written even single
> byte to schema.dump)

What happens if you run without the -v?

> 
> depesz
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
On Thu, Aug 26, 2021 at 07:34:26AM -0700, Adrian Klaver wrote:
> On 8/26/21 1:44 AM, hubert depesz lubaczewski wrote:
> > Hi,
> > I have following case: local pg_dump (v15) connecting to remote
> > PostgreSQL (v12).
> So you are using a dev version of pg_dump or is that a typo?

Yes. I'm running pg_dump from my computer to a (very) remote db server.

> > It's been 15 minutes, and pg_dump (called: pg_dump -v -s -f schema.dump,
> > with env variables configuring db connection) hasn't written even single
> > byte to schema.dump)
> What happens if you run without the -v?

Well, I guess it works, but with no output I can't judge how fast.
Definitely doesn't seem to be going any faster.

Best regards,

depesz




Re: Can we get rid of repeated queries from pg_dump?

From: Adrian Klaver
On 8/26/21 7:35 AM, hubert depesz lubaczewski wrote:
> On Thu, Aug 26, 2021 at 07:34:26AM -0700, Adrian Klaver wrote:
>> On 8/26/21 1:44 AM, hubert depesz lubaczewski wrote:
>>> Hi,
>>> I have following case: local pg_dump (v15) connecting to remote
>>> PostgreSQL (v12).
>> So you are using a dev version of pg_dump or is that a typo?
> 
> Yes. I'm running pg_dump from my computer to (very) remote db server.

SSHing and dumping on the remote is out as a short term solution?

> Well, I guess it works, but with no output I can't judge how fast.
> Definitely doesn't seem to be going any faster.

Slow by an unknown amount, then; that didn't help.

> 
> Best regards,
> 
> depesz
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
On Thu, Aug 26, 2021 at 07:46:46AM -0700, Adrian Klaver wrote:
> On 8/26/21 7:35 AM, hubert depesz lubaczewski wrote:
> > On Thu, Aug 26, 2021 at 07:34:26AM -0700, Adrian Klaver wrote:
> > > On 8/26/21 1:44 AM, hubert depesz lubaczewski wrote:
> > > > Hi,
> > > > I have following case: local pg_dump (v15) connecting to remote
> > > > PostgreSQL (v12).
> > > So you are using a dev version of pg_dump or is that a typo?
> > 
> > Yes. I'm running pg_dump from my computer to (very) remote db server.
> SSHing and dumping on the remote is out as a short term solution?

As I mentioned in the original post, I can't run pg_dump closer to the server.
SSH is not available, at least not for me.

Anyway, I got the dump, so I am good for now, but I think this
could be improved, so I'll work on getting some stats on the queries.

Best regards,

depesz




Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
On Thu, Aug 26, 2021 at 10:20:29AM -0400, Tom Lane wrote:
> Well, you could move it forward by doing the legwork to identify which
> queries are worth merging.  Is it really sane to do a global "select
> format_type() from pg_type" query and save all the results on the client
> side?  I wonder whether there are cases where that'd be a net loss.
> You could do the experimentation to figure that out without necessarily
> having the C skills to make pg_dump actually do it.

So, I got some info.

First, some stats. The DB contains:

- 14 extensions
- 1 aggregate
- 107 functions
- 5 schemas
- 5 sequences
- 188 logged tables
- 1 unlogged table
- 206 "normal" indexes
- 30 unique indexes
- 15 materialized views
- 16 triggers
- 87 types
- 26 views

pg_dump -s of it is ~ 670kB.

Interestingly, while dumping (pg_dump -s -v), we can see progress going on, and then, after:

====
...
pg_dump: reading publications
pg_dump: reading publication membership
pg_dump: reading subscriptions
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on
pg_dump: saving search_path = 
====

It stops (the progress output in the console goes quiet). And then, in the pg logs, I see queries like:

#v+
SELECT
    proretset,
    prosrc,
    probin,
    provolatile,
    proisstrict,
    prosecdef,
    lanname,
    proconfig,
    procost,
    prorows,
    pg_catalog.pg_get_function_arguments(p.oid) AS funcargs,
    pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
    pg_catalog.pg_get_function_result(p.oid) AS funcresult,
    proleakproof,
    array_to_string(protrftypes, ' ') AS protrftypes,
    proparallel,
    prokind,
    prosupport,
    NULL AS prosqlbody
    FROM pg_catalog.pg_proc p, pg_catalog.pg_language l
    WHERE p.oid = '43875'::pg_catalog.oid AND l.oid = p.prolang
#v-

Now for query stats.

To dump it all, pg_dump needed 9173 queries (logged by
log_min_duration_statement = 0 for this user).

I extracted all the queries to separate files and computed some stats. In total there were
only 4257 unique queries.

Then I checked for repeated queries. Top 10 most repeated offenders were:

615 times : SELECT pg_catalog.format_type('25'::pg_catalog.oid, NULL)
599 times : SELECT pg_catalog.format_type('23'::pg_catalog.oid, NULL)
579 times : SELECT pg_catalog.format_type('2281'::pg_catalog.oid, NULL)
578 times : SELECT pg_catalog.format_type('41946'::pg_catalog.oid, NULL)
523 times : SELECT pg_catalog.format_type('701'::pg_catalog.oid, NULL)
459 times : SELECT pg_catalog.format_type('42923'::pg_catalog.oid, NULL)
258 times : SELECT pg_catalog.format_type('16'::pg_catalog.oid, NULL)
176 times : SELECT pg_catalog.format_type('19'::pg_catalog.oid, NULL)
110 times : SELECT pg_catalog.format_type('21'::pg_catalog.oid, NULL)
106 times : SELECT pg_catalog.format_type('42604'::pg_catalog.oid, NULL)

In total, there were 5000 queries:
SELECT pg_catalog.format_type('[0-9]+'::pg_catalog.oid, NULL)

But there were only 83 separate oids that were scanned.

The only other repeated command was:
SELECT pg_catalog.set_config('search_path', '', false);
and it was called only twice.

Based on my reading of the queries in order, it seems to follow this pattern:

One call for:

SELECT proretset, prosrc, probin, provolatile, proisstrict, prosecdef, lanname, proconfig, procost, prorows,
pg_catalog.pg_get_function_arguments(p.oid) AS funcargs, pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
pg_catalog.pg_get_function_result(p.oid) AS funcresult, proleakproof, array_to_string(protrftypes, ' ') AS protrftypes,
proparallel, prokind, prosupport, NULL AS prosqlbody
FROM pg_catalog.pg_proc p, pg_catalog.pg_language l
WHERE p.oid = 'SOME_NUMBER'::pg_catalog.oid AND l.oid = p.prolang

and then one or more:

SELECT pg_catalog.format_type('SOME_NUMBER'::pg_catalog.oid, NULL)

In one case, after the proc query, there were 94 consecutive
pg_catalog.format_type queries.

I hope it helps.

Best regards,

depesz




Re: Can we get rid of repeated queries from pg_dump?

From: Laurenz Albe
On Thu, 2021-08-26 at 18:06 +0200, hubert depesz lubaczewski wrote:
> Now for query stats.
> 
> To dump it all, pg_dump needed 9173 queries (logged by
> log_min_duration_statement = 0 for this user).
> 
> I extracted all queries to separate files, and made stats. In total there were
> only 4257 unique queries.
> 
> Then I checked for repeated queries. Top 10 most repeated offenders were:
> 
> 615 times : SELECT pg_catalog.format_type('25'::pg_catalog.oid, NULL)
> 599 times : SELECT pg_catalog.format_type('23'::pg_catalog.oid, NULL)
> 579 times : SELECT pg_catalog.format_type('2281'::pg_catalog.oid, NULL)
> 578 times : SELECT pg_catalog.format_type('41946'::pg_catalog.oid, NULL)
> 523 times : SELECT pg_catalog.format_type('701'::pg_catalog.oid, NULL)
> 459 times : SELECT pg_catalog.format_type('42923'::pg_catalog.oid, NULL)
> 258 times : SELECT pg_catalog.format_type('16'::pg_catalog.oid, NULL)
> 176 times : SELECT pg_catalog.format_type('19'::pg_catalog.oid, NULL)
> 110 times : SELECT pg_catalog.format_type('21'::pg_catalog.oid, NULL)
> 106 times : SELECT pg_catalog.format_type('42604'::pg_catalog.oid, NULL)
> 
> In total, there were 5000 queries:
> SELECT pg_catalog.format_type('[0-9]+'::pg_catalog.oid, NULL)
> 
> But there were only 83 separate oids that were scanned.

That is a strong argument for using a hash table to cache the types.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Thu, 2021-08-26 at 18:06 +0200, hubert depesz lubaczewski wrote:
>> In total, there were 5000 queries:
>> SELECT pg_catalog.format_type('[0-9]+'::pg_catalog.oid, NULL)
>> But there were only 83 separate oids that were scanned.

> That is a strong argument for using a hash table to cache the types.

Those queries are coming from getFormattedTypeName(), which is used
for function arguments and the like.  I'm not quite sure why Hubert
is seeing 5000 such calls in a database with only ~100 functions;
surely they don't all have an average of 50 arguments?

I experimented with the attached, very quick-n-dirty patch to collect
format_type results during the initial scan of pg_type, instead.  On the
regression database in HEAD, it reduces the number of queries pg_dump
issues from 3260 to 2905; but I'm having a hard time detecting any net
performance change.

(This is not meant for commit as-is; notably, I didn't bother to fix
getTypes' code paths for pre-9.6 servers.  It should be fine for
performance testing though.)

            regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 6adbd20778..efe1be533f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5088,6 +5088,7 @@ getTypes(Archive *fout, int *numTypes)
     int            i_oid;
     int            i_typname;
     int            i_typnamespace;
+    int            i_ftypname;
     int            i_typacl;
     int            i_rtypacl;
     int            i_inittypacl;
@@ -5129,6 +5130,7 @@ getTypes(Archive *fout, int *numTypes)

         appendPQExpBuffer(query, "SELECT t.tableoid, t.oid, t.typname, "
                           "t.typnamespace, "
+                          "pg_catalog.format_type(t.oid, NULL) AS ftypname, "
                           "%s AS typacl, "
                           "%s AS rtypacl, "
                           "%s AS inittypacl, "
@@ -5211,6 +5213,7 @@ getTypes(Archive *fout, int *numTypes)
     i_oid = PQfnumber(res, "oid");
     i_typname = PQfnumber(res, "typname");
     i_typnamespace = PQfnumber(res, "typnamespace");
+    i_ftypname = PQfnumber(res, "ftypname");
     i_typacl = PQfnumber(res, "typacl");
     i_rtypacl = PQfnumber(res, "rtypacl");
     i_inittypacl = PQfnumber(res, "inittypacl");
@@ -5232,6 +5235,7 @@ getTypes(Archive *fout, int *numTypes)
         tyinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_typname));
         tyinfo[i].dobj.namespace =
             findNamespace(atooid(PQgetvalue(res, i, i_typnamespace)));
+        tyinfo[i].ftypname = pg_strdup(PQgetvalue(res, i, i_ftypname));
         tyinfo[i].rolname = pg_strdup(PQgetvalue(res, i, i_rolname));
         tyinfo[i].typacl = pg_strdup(PQgetvalue(res, i, i_typacl));
         tyinfo[i].rtypacl = pg_strdup(PQgetvalue(res, i, i_rtypacl));
@@ -18892,12 +18896,11 @@ findDumpableDependencies(ArchiveHandle *AH, const DumpableObject *dobj,
  *
  * This does not guarantee to schema-qualify the output, so it should not
  * be used to create the target object name for CREATE or ALTER commands.
- *
- * TODO: there might be some value in caching the results.
  */
 static char *
 getFormattedTypeName(Archive *fout, Oid oid, OidOptions opts)
 {
+    TypeInfo   *typeInfo;
     char       *result;
     PQExpBuffer query;
     PGresult   *res;
@@ -18910,6 +18913,16 @@ getFormattedTypeName(Archive *fout, Oid oid, OidOptions opts)
             return pg_strdup("NONE");
     }

+    typeInfo = findTypeByOid(oid);
+    if (typeInfo)
+        return pg_strdup(typeInfo->ftypname);
+
+    /*
+     * Fall back to asking the server.  XXX we could only reach this code if
+     * getTypes() didn't see the type, which should never happen.  If it did
+     * happen, it'd imply that we don't have proper dependencies for the
+     * object we're trying to describe.  Maybe just throw an error instead?
+     */
     query = createPQExpBuffer();
     appendPQExpBuffer(query, "SELECT pg_catalog.format_type('%u'::pg_catalog.oid, NULL)",
                       oid);
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index f5e170e0db..29af845ece 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -166,9 +166,11 @@ typedef struct _typeInfo
     DumpableObject dobj;

     /*
-     * Note: dobj.name is the pg_type.typname entry.  format_type() might
-     * produce something different than typname
+     * Note: dobj.name is the raw pg_type.typname entry.  ftypname is the
+     * result of format_type(), which will be quoted if needed, and might be
+     * schema-qualified too.
      */
+    char       *ftypname;
     char       *rolname;        /* name of owner, or empty string */
     char       *typacl;
     char       *rtypacl;

Re: Can we get rid of repeated queries from pg_dump?

From: Adrian Klaver
On 8/27/21 2:23 PM, Tom Lane wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
>> On Thu, 2021-08-26 at 18:06 +0200, hubert depesz lubaczewski wrote:
>>> In total, there were 5000 queries:
>>> SELECT pg_catalog.format_type('[0-9]+'::pg_catalog.oid, NULL)
>>> But there were only 83 separate oids that were scanned.
> 
>> That is a strong argument for using a hash table to cache the types.
> 
> Those queries are coming from getFormattedTypeName(), which is used
> for function arguments and the like.  I'm not quite sure why Hubert
> is seeing 5000 such calls in a database with only ~100 functions;
> surely they don't all have an average of 50 arguments?

Could be.

 From the stats post:

"Based on my reading of queries in order it seems to follow the pattern of:

One call for:

SELECT proretset, prosrc, probin, provolatile, proisstrict, prosecdef, lanname, proconfig, procost, prorows,
pg_catalog.pg_get_function_arguments(p.oid) AS funcargs, pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
pg_catalog.pg_get_function_result(p.oid) AS funcresult, proleakproof, array_to_string(protrftypes, ' ') AS protrftypes,
proparallel, prokind, prosupport, NULL AS prosqlbody
FROM pg_catalog.pg_proc p, pg_catalog.pg_language l
WHERE p.oid = 'SOME_NUMBER'::pg_catalog.oid AND l.oid = p.prolang

and then one or more:

SELECT pg_catalog.format_type('SOME_NUMBER'::pg_catalog.oid, NULL)


In one case, after proc query, there were 94 concecutive
pg_catalog.format_type queries.
"


> 
> I experimented with the attached, very quick-n-dirty patch to collect
> format_type results during the initial scan of pg_type, instead.  On the
> regression database in HEAD, it reduces the number of queries pg_dump
> issues from 3260 to 2905; but I'm having a hard time detecting any net
> performance change.
> 
> (This is not meant for commit as-is; notably, I didn't bother to fix
> getTypes' code paths for pre-9.6 servers.  It should be fine for
> performance testing though.)
> 
>             regards, tom lane
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Can we get rid of repeated queries from pg_dump?

From: Stephen Frost
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
> > On Thu, 2021-08-26 at 18:06 +0200, hubert depesz lubaczewski wrote:
> >> In total, there were 5000 queries:
> >> SELECT pg_catalog.format_type('[0-9]+'::pg_catalog.oid, NULL)
> >> But there were only 83 separate oids that were scanned.
>
> > That is a strong argument for using a hash table to cache the types.
>
> Those queries are coming from getFormattedTypeName(), which is used
> for function arguments and the like.  I'm not quite sure why Hubert
> is seeing 5000 such calls in a database with only ~100 functions;
> surely they don't all have an average of 50 arguments?
>
> I experimented with the attached, very quick-n-dirty patch to collect
> format_type results during the initial scan of pg_type, instead.  On the
> regression database in HEAD, it reduces the number of queries pg_dump
> issues from 3260 to 2905; but I'm having a hard time detecting any net
> performance change.

Seems like the issue here is mainly just the latency of each query being
rather high compared to most use-cases, so local testing where there's
basically zero latency wouldn't see any change in timing, but throw a
trans-atlantic or worse amount of latency between the system running
pg_dump and the PG server and you'd see notable wall-clock savings in
time.
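
(Back-of-the-envelope, assuming something like 100 ms of round-trip latency
rather than any measured number: the ~9000 queries Hubert reported would then
cost on the order of 9000 x 0.1 s, i.e. about 15 minutes of pure waiting, which
is in the same ballpark as the dump times he mentioned.)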

Only took a quick look but generally +1 on reducing the number of
queries that pg_dump is doing and the changes suggested looked good to
me.

Thanks,

Stephen


Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I experimented with the attached, very quick-n-dirty patch to collect
>> format_type results during the initial scan of pg_type, instead.  On the
>> regression database in HEAD, it reduces the number of queries pg_dump
>> issues from 3260 to 2905; but I'm having a hard time detecting any net
>> performance change.

> Seems like the issue here is mainly just the latency of each query being
> rather high compared to most use-cases, so local testing where there's
> basically zero latency wouldn't see any change in timing, but throw a
> trans-atlantic or worse amount of latency between the system running
> pg_dump and the PG server and you'd see notable wall-clock savings in
> time.

Yeah.  What I was more concerned about was the potential downside
of running format_type() for each pg_type row, even though we might
use only a few of those results.  The fact that I'm *not* seeing
a performance hit with a local server is encouraging from that
standpoint.

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: Stephen Frost
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> I experimented with the attached, very quick-n-dirty patch to collect
> >> format_type results during the initial scan of pg_type, instead.  On the
> >> regression database in HEAD, it reduces the number of queries pg_dump
> >> issues from 3260 to 2905; but I'm having a hard time detecting any net
> >> performance change.
>
> > Seems like the issue here is mainly just the latency of each query being
> > rather high compared to most use-cases, so local testing where there's
> > basically zero latency wouldn't see any change in timing, but throw a
> > trans-atlantic or worse amount of latency between the system running
> > pg_dump and the PG server and you'd see notable wall-clock savings in
> > time.
>
> Yeah.  What I was more concerned about was the potential downside
> of running format_type() for each pg_type row, even though we might
> use only a few of those results.  The fact that I'm *not* seeing
> a performance hit with a local server is encouraging from that
> standpoint.

Ah, yes, agreed.

Thanks!

Stephen


Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 8/27/21 2:23 PM, Tom Lane wrote:
>> Those queries are coming from getFormattedTypeName(), which is used
>> for function arguments and the like.  I'm not quite sure why Hubert
>> is seeing 5000 such calls in a database with only ~100 functions;
>> surely they don't all have an average of 50 arguments?

> Could be.

Maybe.  I'm disturbed by the discrepancy between my result (about
10% of pg_dump's queries are these) and Hubert's (over 50% are).
I'd like to know the reason for that before we push forward.

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
On Fri, Aug 27, 2021 at 05:23:23PM -0400, Tom Lane wrote:
> I experimented with the attached, very quick-n-dirty patch to collect
> format_type results during the initial scan of pg_type, instead.  On the
> regression database in HEAD, it reduces the number of queries pg_dump
> issues from 3260 to 2905; but I'm having a hard time detecting any net
> performance change.
> (This is not meant for commit as-is; notably, I didn't bother to fix
> getTypes' code paths for pre-9.6 servers.  It should be fine for
> performance testing though.)

Hi,
thanks a lot for this. Will test and report back, most likely on Monday,
though.

Best regards,

depesz




Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
On Fri, Aug 27, 2021 at 05:23:23PM -0400, Tom Lane wrote:
> Those queries are coming from getFormattedTypeName(), which is used
> for function arguments and the like.  I'm not quite sure why Hubert
> is seeing 5000 such calls in a database with only ~100 functions;
> surely they don't all have an average of 50 arguments?

Oh, I missed that part.
So I checked. In the meantime I got an -Fc dump, so:

#v+
=$ pg_restore -l schema.dump | \
    grep -P '^\d*; \d+ \d+ FUNCTION ' |
    sed 's/^[^(]*(//; s/)[^)]*$//' |
    awk -F, '{print NF}' |
    sort -n |
    uniq -c
23 0
52 1
21 2
 8 3
 1 4
 2 5
#v-

23 functions with 0 arguments, 52 with 1, and the max is 5 arguments; two
functions have those.

Not sure if it matters, but there are a lot of enums: 83 of them. And they have
up to 250 elements (2 such types).

Best regards,

depesz




Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
hubert depesz lubaczewski <depesz@depesz.com> writes:
> On Fri, Aug 27, 2021 at 05:23:23PM -0400, Tom Lane wrote:
>> Those queries are coming from getFormattedTypeName(), which is used
>> for function arguments and the like.  I'm not quite sure why Hubert
>> is seeing 5000 such calls in a database with only ~100 functions;
>> surely they don't all have an average of 50 arguments?

> 23 functions with 0 arguments, 52 with 1, and the max is 5 arguments - two
> functions have these.
> Not sure if it matters but there is a lot of enums. 83 of them. And they have
> up to 250 elements (2 such types).

Hmm, no, I don't see any getFormattedTypeName calls in dumpEnumType.

There are two of 'em in dumpCast though.  Does this DB by chance
have a ton of user-defined casts?
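
A quick way to check that, assuming the usual OID >= 16384 boundary for
user-created objects:

SELECT count(*) FROM pg_catalog.pg_cast WHERE oid >= 16384;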

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
Here is a second patch, quite independent of the first one, that
gets rid of some other repetitive queries.  On the regression database,
the number of queries needed to do "pg_dump -s regression" drops from
3260 to 2589, and on my machine it takes 1.8 sec instead of 2.1 sec.

What's attacked here is a fairly silly decision in getPolicies()
to query pg_policy once per table, when we could do so just once.
It might have been okay if we skipped the per-table query for
tables that lack policies, but it's not clear to me that we can
know that without looking into pg_policy.  In any case I doubt
this is ever going to be less efficient than the original coding.

            regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 6adbd20778..befe68de1a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3656,7 +3656,7 @@ dumpBlobs(Archive *fout, const void *arg)

 /*
  * getPolicies
- *      get information about policies on a dumpable table.
+ *      get information about all RLS policies on dumpable tables.
  */
 void
 getPolicies(Archive *fout, TableInfo tblinfo[], int numTables)
@@ -3666,6 +3666,7 @@ getPolicies(Archive *fout, TableInfo tblinfo[], int numTables)
     PolicyInfo *polinfo;
     int            i_oid;
     int            i_tableoid;
+    int            i_polrelid;
     int            i_polname;
     int            i_polcmd;
     int            i_polpermissive;
@@ -3681,6 +3682,10 @@ getPolicies(Archive *fout, TableInfo tblinfo[], int numTables)

     query = createPQExpBuffer();

+    /*
+     * First, check which tables have RLS enabled.  We represent RLS being
+     * enabled on a table by creating a PolicyInfo object with null polname.
+     */
     for (i = 0; i < numTables; i++)
     {
         TableInfo  *tbinfo = &tblinfo[i];
@@ -3689,15 +3694,6 @@ getPolicies(Archive *fout, TableInfo tblinfo[], int numTables)
         if (!(tbinfo->dobj.dump & DUMP_COMPONENT_POLICY))
             continue;

-        pg_log_info("reading row security enabled for table \"%s.%s\"",
-                    tbinfo->dobj.namespace->dobj.name,
-                    tbinfo->dobj.name);
-
-        /*
-         * Get row security enabled information for the table. We represent
-         * RLS being enabled on a table by creating a PolicyInfo object with
-         * null polname.
-         */
         if (tbinfo->rowsec)
         {
             /*
@@ -3719,51 +3715,35 @@ getPolicies(Archive *fout, TableInfo tblinfo[], int numTables)
             polinfo->polqual = NULL;
             polinfo->polwithcheck = NULL;
         }
+    }

-        pg_log_info("reading policies for table \"%s.%s\"",
-                    tbinfo->dobj.namespace->dobj.name,
-                    tbinfo->dobj.name);
-
-        resetPQExpBuffer(query);
-
-        /* Get the policies for the table. */
-        if (fout->remoteVersion >= 100000)
-            appendPQExpBuffer(query,
-                              "SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, "
-                              "CASE WHEN pol.polroles = '{0}' THEN NULL ELSE "
-                              "   pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(rolname) from pg_catalog.pg_roles WHERE oid = ANY(pol.polroles)), ', ') END AS polroles, "
-                              "pg_catalog.pg_get_expr(pol.polqual, pol.polrelid) AS polqual, "
-                              "pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid) AS polwithcheck "
-                              "FROM pg_catalog.pg_policy pol "
-                              "WHERE polrelid = '%u'",
-                              tbinfo->dobj.catId.oid);
-        else
-            appendPQExpBuffer(query,
-                              "SELECT oid, tableoid, pol.polname, pol.polcmd, 't' as polpermissive, "
-                              "CASE WHEN pol.polroles = '{0}' THEN NULL ELSE "
-                              "   pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(rolname) from pg_catalog.pg_roles WHERE oid = ANY(pol.polroles)), ', ') END AS polroles, "
-                              "pg_catalog.pg_get_expr(pol.polqual, pol.polrelid) AS polqual, "
-                              "pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid) AS polwithcheck "
-                              "FROM pg_catalog.pg_policy pol "
-                              "WHERE polrelid = '%u'",
-                              tbinfo->dobj.catId.oid);
-        res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+    /*
+     * Now, read all RLS policies, and create PolicyInfo objects for all those
+     * that are of interest.
+     */
+    pg_log_info("reading row-level security policies");

-        ntups = PQntuples(res);
+    printfPQExpBuffer(query,
+                      "SELECT oid, tableoid, pol.polrelid, pol.polname, pol.polcmd, ");
+    if (fout->remoteVersion >= 100000)
+        appendPQExpBuffer(query, "pol.polpermissive, ");
+    else
+        appendPQExpBuffer(query, "'t' as polpermissive, ");
+    appendPQExpBuffer(query,
+                      "CASE WHEN pol.polroles = '{0}' THEN NULL ELSE "
+                      "   pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(rolname) from pg_catalog.pg_roles WHERE oid = ANY(pol.polroles)), ', ') END AS polroles, "
+                      "pg_catalog.pg_get_expr(pol.polqual, pol.polrelid) AS polqual, "
+                      "pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid) AS polwithcheck "
+                      "FROM pg_catalog.pg_policy pol");

-        if (ntups == 0)
-        {
-            /*
-             * No explicit policies to handle (only the default-deny policy,
-             * which is handled as part of the table definition).  Clean up
-             * and return.
-             */
-            PQclear(res);
-            continue;
-        }
+    res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);

+    ntups = PQntuples(res);
+    if (ntups > 0)
+    {
         i_oid = PQfnumber(res, "oid");
         i_tableoid = PQfnumber(res, "tableoid");
+        i_polrelid = PQfnumber(res, "polrelid");
         i_polname = PQfnumber(res, "polname");
         i_polcmd = PQfnumber(res, "polcmd");
         i_polpermissive = PQfnumber(res, "polpermissive");
@@ -3775,6 +3755,16 @@ getPolicies(Archive *fout, TableInfo tblinfo[], int numTables)

         for (j = 0; j < ntups; j++)
         {
+            Oid            polrelid = atooid(PQgetvalue(res, j, i_polrelid));
+            TableInfo  *tbinfo = findTableByOid(polrelid);
+
+            /*
+             * Ignore row security on tables not to be dumped.  (This will
+             * result in some harmless wasted slots in polinfo[].)
+             */
+            if (!(tbinfo->dobj.dump & DUMP_COMPONENT_POLICY))
+                continue;
+
             polinfo[j].dobj.objType = DO_POLICY;
             polinfo[j].dobj.catId.tableoid =
                 atooid(PQgetvalue(res, j, i_tableoid));
@@ -3804,8 +3794,10 @@ getPolicies(Archive *fout, TableInfo tblinfo[], int numTables)
                 polinfo[j].polwithcheck
                     = pg_strdup(PQgetvalue(res, j, i_polwithcheck));
         }
-        PQclear(res);
     }
+
+    PQclear(res);
+
     destroyPQExpBuffer(query);
 }


Re: Can we get rid of repeated queries from pg_dump?

From: Gus Spier
You guys are brilliant!

Regards,

Gus

On Sat, Aug 28, 2021 at 6:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Here is a second patch, quite independent of the first one, that
gets rid of some other repetitive queries.  On the regression database,
the number of queries needed to do "pg_dump -s regression" drops from
3260 to 2589, and on my machine it takes 1.8 sec instead of 2.1 sec.

What's attacked here is a fairly silly decision in getPolicies()
to query pg_policy once per table, when we could do so just once.
It might have been okay if we skipped the per-table query for
tables that lack policies, but it's not clear to me that we can
know that without looking into pg_policy.  In any case I doubt
this is ever going to be less efficient than the original coding.

                        regards, tom lane

Re: Can we get rid of repeated queries from pg_dump?

From: Alvaro Herrera
On 2021-Aug-28, Tom Lane wrote:

> Here is a second patch, quite independent of the first one, that
> gets rid of some other repetitive queries.

Another pointlessly repetitive query is in getTriggers, which we run
once per to-be-dumped table that contains triggers.  We could reduce that
by running it in bulk for many relations at a time.  I suppose it's
normally not hurtful, but as we grow the number of partitions we allow
it's going to become a problem.

No patch from me for now — if someone wants to volunteer one, it looks
simple enough ...
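
For the record, a rough sketch of what the bulk form could look like (not the
exact column list pg_dump needs, just the shape of the query):

SELECT t.tgrelid, t.tgname, t.tgenabled,
       pg_catalog.pg_get_triggerdef(t.oid, false) AS tgdef
FROM pg_catalog.pg_trigger t
WHERE NOT t.tgisinternal
ORDER BY t.tgrelid, t.tgname;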

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/



Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Another pointlessly repetitive query is in getTriggers, which we run
> once per table to be dumped containing triggers.  We could reduce that
> by running it in bulk for many relations at a time.  I suppose it's
> normally not hurtful, but as we grow the number of partitions we allow
> it's going to become a problem.

Perhaps.  In the regression database, only ~10% of the tables have
triggers, so it's likely not going to yield any measurable gain there.
But databases that make heavier use of foreign keys might see a win.

Another thing I've wondered about before is whether it could make sense
to read pg_attribute once rather than once per table.  There might be
a fair amount of wasted work if the dump is selective, and in big DBs
the sheer size of that result could be a problem.  But those reads are
definitely way up there on the number-of-queries scale.
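
For illustration only, a one-pass read might look roughly like this (the real
per-table query selects far more columns):

SELECT a.attrelid, a.attnum, a.attname, a.atttypid, a.attnotnull, a.atthasdef
FROM pg_catalog.pg_attribute a
WHERE a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attrelid, a.attnum;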

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: Stephen Frost
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Another pointlessly repetitive query is in getTriggers, which we run
> > once per table to be dumped containing triggers.  We could reduce that
> > by running it in bulk for many relations at a time.  I suppose it's
> > normally not hurtful, but as we grow the number of partitions we allow
> > it's going to become a problem.
>
> Perhaps.  In the regression database, only ~10% of the tables have
> triggers, so it's likely not going to yield any measurable gain there.
> But databases that make heavier use of foreign keys might see a win.

It sure seems like in just about all cases fewer queries is going to be
better.

> Another thing I've wondered about before is whether it could make sense
> to read pg_attribute once rather than once per table.  There might be
> a fair amount of wasted work if the dump is selective, and in big DBs
> the sheer size of that result could be a problem.  But those reads are
> definitely way up there on the number-of-queries scale.

Yeah, I've thought about this before too.  Would sure be nice if there
was a way that we could query the catalog selectively based on the
options the user has passed in but do so in as few queries as possible.

Thanks,

Stephen


Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
On Fri, Aug 27, 2021 at 05:23:23PM -0400, Tom Lane wrote:
> I experimented with the attached, very quick-n-dirty patch to collect
> format_type results during the initial scan of pg_type, instead.  On the
> regression database in HEAD, it reduces the number of queries pg_dump
> issues from 3260 to 2905; but I'm having a hard time detecting any net
> performance change.

Hi,
So, I applied it to a brand new HEAD from git. Result:

From a total of 9173 queries it went down to 4178.
Originally 5000 type queries, now 19!
This is actually strange given that previously it was querying
about 83 separate type oids. But, as far as I was able to check with
"pg_restore -l" (from the -Fc dump), results are the same.

Dump time down from 17m 22s to 8m 12s.

Then, I applied the patch from
https://www.postgresql.org/message-id/1082810.1630189581%40sss.pgh.pa.us

without removing the first one, as you said they are quite independent.

With both patches applied I got 3884 queries total, and dump from
original db in 7m 35s.

So this clearly helps. A LOT.

But since we're looking at it, and with both patches applied, I looked
at the next most common query. Which is:

#v+
SELECT
    proretset,
    prosrc,
    probin,
    provolatile,
    proisstrict,
    prosecdef,
    lanname,
    proconfig,
    procost,
    prorows,
    pg_catalog.pg_get_function_arguments(p.oid) AS funcargs,
    pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
    pg_catalog.pg_get_function_result(p.oid) AS funcresult,
    proleakproof,
    array_to_string(protrftypes, ' ') AS protrftypes,
    proparallel,
    prokind,
    prosupport,
    NULL AS prosqlbody
    FROM pg_catalog.pg_proc p, pg_catalog.pg_language l
    WHERE p.oid = '25491'::pg_catalog.oid AND l.oid = p.prolang
#v-

Of the 3884 queries in the current pg_dump (with both patches applied), these
queries were called 1804 times. All of these calls were with different oids,
so it's possible that there is nothing to be done about it, but I figured I'd
let you know.

The thing is, even though it was called 1804 times, the dump contains data only
about 107 functions (pg_restore -l schema.dump | grep -c FUNCTION), so it kinda
seems that 94% of these calls are not needed.

Anyway, even if we can't get any help for the function queries, an improvement
of over 50% is great.

Best regards,

depesz




Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
hubert depesz lubaczewski <depesz@depesz.com> writes:
> On Fri, Aug 27, 2021 at 05:23:23PM -0400, Tom Lane wrote:
>> I experimented with the attached, very quick-n-dirty patch to collect
>> format_type results during the initial scan of pg_type, instead.

> So, I applied it to brand new HEAD from git, Result:
> From total of 9173 queries it went down to 4178.
> Originally 5000 type queries, now 19!
> This is actually strange given that previously it was asking querying
> about 83 separate type oids. But, as far as I was able to check with
> "pg_restore -l" (from -Fc dump), results are the same.

Hm.  So we're still no wiser than before about how such a small (in
terms of number of objects) database could have produced so many
getFormattedTypeName calls.  Plus, this result raises a new question:
with the patch, I think you shouldn't have seen *any* queries of that
form.  Where are the 19 survivors coming from?

I don't suppose you could send me a schema-only dump of that
database, off-list?  I'm now quite curious.

> But since we're looking at it, and with both patches applied, I looked
> at the next most common query. Which is:
> [ collection of details about a function ]

> The thing is - even though it was called 1804 times, dump contains data only
> about 107 functions (pg_restore -l schema.dump | grep -c FUNCTION), so it kinda
> seems that 94% of these calls is not needed.

Hm.  It's not doing that for *every* row in pg_proc, at least.
I speculate that it is collecting and then not printing the info
about functions that are in extensions --- can you check on
how many there are of those?

(Actually, if you've got a whole lot of objects inside extensions,
maybe that explains the 5000 calls?)

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
On Mon, Aug 30, 2021 at 10:11:22AM -0400, Tom Lane wrote:
> I don't suppose you could send me a schema-only dump of that
> database, off-list?  I'm now quite curious.

Asked the owners for their permission.

> > The thing is - even though it was called 1804 times, dump contains data only
> > about 107 functions (pg_restore -l schema.dump | grep -c FUNCTION), so it kinda
> > seems that 94% of these calls is not needed.

> Hm.  It's not doing that for *every* row in pg_proc, at least.
> I speculate that it is collecting and then not printing the info
> about functions that are in extensions --- can you check on
> how many there are of those?
> (Actually, if you've got a whole lot of objects inside extensions,
> maybe that explains the 5000 calls?)

Well, not sure if that's a lot, but:
there are 15 extensions, including plpgsql.

SELECT
    count(*)
FROM
    pg_catalog.pg_depend
WHERE
    refclassid = 'pg_catalog.pg_extension'::pg_catalog.regclass
    AND deptype = 'e';


returns 2110 objects:

SELECT
    classid::regclass,
    count(*)
FROM
    pg_catalog.pg_depend
WHERE
    refclassid = 'pg_catalog.pg_extension'::pg_catalog.regclass
    AND deptype = 'e'
GROUP BY
    1
ORDER BY
    1;

         classid         │ count 
─────────────────────────┼───────
 pg_type                 │    31
 pg_proc                 │  1729
 pg_class                │    61
 pg_foreign_data_wrapper │     1
 pg_cast                 │    30
 pg_language             │     1
 pg_opclass              │    73
 pg_operator             │   111
 pg_opfamily             │    73
(9 rows)

Best regards,

depesz




Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
hubert depesz lubaczewski <depesz@depesz.com> writes:
> On Mon, Aug 30, 2021 at 10:11:22AM -0400, Tom Lane wrote:
>> I speculate that it is collecting and then not printing the info
>> about functions that are in extensions --- can you check on
>> how many there are of those?

>          classid         │ count
> ─────────────────────────┼───────
>  pg_type                 │    31
>  pg_proc                 │  1729
>  pg_class                │    61
>  pg_foreign_data_wrapper │     1
>  pg_cast                 │    30
>  pg_language             │     1
>  pg_opclass              │    73
>  pg_operator             │   111
>  pg_opfamily             │    73
> (9 rows)

Ah-hah.  Those 1729 extension-owned functions account nicely
for the extra probes into pg_proc, and I bet they are causing
the unexplained getFormattedTypeName calls too.  So the
*real* problem here seems to be that we're doing too much
work on objects that are not going to be dumped because they
are extension members.  I'll take a look at that later if
nobody beats me to it.

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
I wrote:
> Ah-hah.  Those 1729 extension-owned functions account nicely
> for the extra probes into pg_proc, and I bet they are causing
> the unexplained getFormattedTypeName calls too.  So the
> *real* problem here seems to be that we're doing too much
> work on objects that are not going to be dumped because they
> are extension members.  I'll take a look at that later if
> nobody beats me to it.

I took a quick look at this, and it seems to be mostly the fault
of the DUMP_COMPONENT refactorization that was done awhile ago.
We create DumpableObjects for all these objects, because we need
those to track dependencies.  When we arrive at dumpFunc() for
an extension-owned object, it has the DUMP_COMPONENT_SECLABEL
and DUMP_COMPONENT_POLICY flag bits set, whether or not the
function actually has any such properties.  This causes dumpFunc
to run through its data collection query, even though nothing
at all is going to get output.

I see that the reason those flags become set is that
checkExtensionMembership does this for an extension member:

            dobj->dump = ext->dobj.dump_contains & (DUMP_COMPONENT_ACL |
                                                    DUMP_COMPONENT_SECLABEL |
                                                    DUMP_COMPONENT_POLICY);

There is logic elsewhere that causes the DUMP_COMPONENT_ACL flag
to get cleared if there's no interesting ACL for the object, but
I see no such logic for SECLABEL or POLICY.  That omission is costing
us an awful lot of wasted queries in any database with a lot of
extension-owned objects.

I'm quite allergic to the way that the ACL logic is implemented anyhow,
as there seem to be N copies of essentially identical logic, not to
mention all the inefficient left joins and subqueries that were added
to the fundamental data-gathering queries --- which are only supposed
to find out which objects we want to dump, not expensively collect
scads of detail about every object in the catalogs.  I think this is
less in need of a tweak than "burn it to the ground and start over".
I wonder if we can't get to a place where there's only one query that
actually looks into pg_init_privs, more like the way we do it for
descriptions and seclabels (not that the seclabel code is perfect,
as we've found here).

Anyway, it doesn't look like there's much hope of improving this
aspect without a significant rewrite.  One band-aidy idea is that
we could check --no-security-labels earlier and not allow that
flag bit to become set in the first place, but that only helps if
the user gives that flag, which few would.  (I'm also wondering
more than a little bit why we're allowing DUMP_COMPONENT_POLICY
to become set on objects that aren't tables.)

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
hubert depesz lubaczewski <depesz@depesz.com> writes:
> On Mon, Aug 30, 2021 at 10:11:22AM -0400, Tom Lane wrote:
>> I don't suppose you could send me a schema-only dump of that
>> database, off-list?  I'm now quite curious.

> Asked the owners for their permission.

BTW, I think you can skip that part now --- it seems like the extensions
sufficiently explain the extra queries.

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: Tom Lane
I wrote:
> Anyway, it doesn't look like there's much hope of improving this
> aspect without a significant rewrite.

Just to close out this thread: I've now posted such a rewrite at

https://www.postgresql.org/message-id/2273648.1634764485%40sss.pgh.pa.us

            regards, tom lane



Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski
On Wed, Oct 20, 2021 at 05:46:01PM -0400, Tom Lane wrote:
> I wrote:
> > Anyway, it doesn't look like there's much hope of improving this
> > aspect without a significant rewrite.
> 
> Just to close out this thread: I've now posted such a rewrite at
> https://www.postgresql.org/message-id/2273648.1634764485%40sss.pgh.pa.us

That looks amazing. Thanks a lot.

Best regards,

depesz