Обсуждение: Re: pgsql: autovacuum: handle analyze for partitioned tables

Поиск
Список
Период
Сортировка

Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
On 2021-Apr-08, Tom Lane wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > autovacuum: handle analyze for partitioned tables
> 
> Looks like this has issues under EXEC_BACKEND:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2021-04-08%2005%3A50%3A08

Hmm, I couldn't reproduce this under EXEC_BACKEND or otherwise, but I
think this is unrelated to that, but rather a race condition.

The backtrace saved by buildfarm is:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  relation_needs_vacanalyze (relid=relid@entry=43057, relopts=relopts@entry=0x0,
classForm=classForm@entry=0x7e000501eef0,tabentry=0x5611ec71b030,
effective_multixact_freeze_max_age=effective_multixact_freeze_max_age@entry=400000000,
dovacuum=dovacuum@entry=0x7ffd78cc4ee0,doanalyze=0x7ffd78cc4ee1, wraparound=0x7ffd78cc4ee2) at
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:3237
3237                    childclass = (Form_pg_class) GETSTRUCT(childtuple);
#0  relation_needs_vacanalyze (relid=relid@entry=43057, relopts=relopts@entry=0x0,
classForm=classForm@entry=0x7e000501eef0,tabentry=0x5611ec71b030,
effective_multixact_freeze_max_age=effective_multixact_freeze_max_age@entry=400000000,
dovacuum=dovacuum@entry=0x7ffd78cc4ee0,doanalyze=0x7ffd78cc4ee1, wraparound=0x7ffd78cc4ee2) at
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:3237
#1  0x00005611eb09fc91 in do_autovacuum () at
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:2168
#2  0x00005611eb0a0f8b in AutoVacWorkerMain (argc=argc@entry=1, argv=argv@entry=0x5611ec61f1e0) at
/mnt/resource/andres/bf/culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/autovacuum.c:1715

the code in question is:

            children = find_all_inheritors(relid, AccessShareLock, NULL);

            foreach(lc, children)
            {
                Oid            childOID = lfirst_oid(lc);
                HeapTuple    childtuple;
                Form_pg_class childclass;

                childtuple = SearchSysCache1(RELOID, ObjectIdGetDatum(childOID));
                childclass = (Form_pg_class) GETSTRUCT(childtuple);

Evidently SearchSysCache must be returning NULL, but how come that
happens, when we have acquired lock on the rel during
find_all_inheritors?

I would suggest that we do not take lock here at all, and just skip the
rel if SearchSysCache returns empty, as in the attached.  Still, I am
baffled about this crash.

-- 
Álvaro Herrera       Valdivia, Chile
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)

Вложения

Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2021-Apr-08, Tom Lane wrote:
>> Looks like this has issues under EXEC_BACKEND:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2021-04-08%2005%3A50%3A08

> Hmm, I couldn't reproduce this under EXEC_BACKEND or otherwise, but I
> think this is unrelated to that, but rather a race condition.

Yeah.  I hit this on another machine that isn't using EXEC_BACKEND,
and I concur it looks more like a race condition.  I think the problem
is that autovacuum is calling find_all_inheritors() on a relation it
has no lock on, contrary to that function's API spec.  find_all_inheritors
assumes the OID it's given is valid and locked, and adds it to the
result list automatically.  Then it looks for children, and won't find
any in the race case where somebody else just dropped the table.
So we come back to relation_needs_vacanalyze with a list of just the
original rel OID, and since this loop believes that every syscache fetch
it does will succeed, kaboom.

I do not think it is sane to do find_all_inheritors() with no lock,
so I'd counsel doing something about that rather than band-aiding the
symptom.  On the other hand, it's also not really okay not to have
an if-test-and-elog after the SearchSysCache call.  "A cache lookup
cannot fail" is not an acceptable assumption in my book.

BTW, another thing that looks like a race condition is the
extract_autovac_opts() call that is done a little bit earlier,
also without lock.  I think this is actually safe, but it's ONLY
safe because we resisted the calls by certain people to add a
toast table to pg_class.  Otherwise, fetching reloptions could
have involved a toast pointer dereference, and it would then be
racy whether the toasted data was still there.  As-is, even if
the pg_class row we're looking at has been deleted, we can safely
disassemble its reloptions.  I think this matter is deserving
of a comment at least.

            regards, tom lane



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
On 2021-Apr-08, Tom Lane wrote:

> Yeah.  I hit this on another machine that isn't using EXEC_BACKEND,
> and I concur it looks more like a race condition.  I think the problem
> is that autovacuum is calling find_all_inheritors() on a relation it
> has no lock on, contrary to that function's API spec.  find_all_inheritors
> assumes the OID it's given is valid and locked, and adds it to the
> result list automatically.  Then it looks for children, and won't find
> any in the race case where somebody else just dropped the table.

Hmm.  Autovacuum tries hard to avoid grabbing locks on relations until
really needed (at vacuum/analyze time), which is why all these tests
only use data that can be found in the pg_class rows and pgstat entries.
So I tend to think that my initial instinct was the better direction: we
should not be doing any find_all_inheritors() here at all, but instead
rely on pg_class.reltuples to be set for the partitioned table.

I'll give that another look.  Most places already assume that reltuples
isn't set for a partitioned table, so they shouldn't care.  I wonder,
though, whether we should set relpages to some value other than 0 or -1.
(I'm inclined not to, since autovacuum does not use it.)

-- 
Álvaro Herrera       Valdivia, Chile



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2021-Apr-08, Tom Lane wrote:
>> Yeah.  I hit this on another machine that isn't using EXEC_BACKEND,
>> and I concur it looks more like a race condition.  I think the problem
>> is that autovacuum is calling find_all_inheritors() on a relation it
>> has no lock on, contrary to that function's API spec.

> Hmm.  Autovacuum tries hard to avoid grabbing locks on relations until
> really needed (at vacuum/analyze time), which is why all these tests
> only use data that can be found in the pg_class rows and pgstat entries.

Yeah, I was worried about that.

> So I tend to think that my initial instinct was the better direction: we
> should not be doing any find_all_inheritors() here at all, but instead
> rely on pg_class.reltuples to be set for the partitioned table.

+1

            regards, tom lane



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
On 2021-Apr-08, Tom Lane wrote:

> > So I tend to think that my initial instinct was the better direction: we
> > should not be doing any find_all_inheritors() here at all, but instead
> > rely on pg_class.reltuples to be set for the partitioned table.
> 
> +1

This patch does that.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)

Вложения

Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Zhihong Yu
Дата:


On Thu, Apr 8, 2021 at 1:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Apr-08, Tom Lane wrote:

> > So I tend to think that my initial instinct was the better direction: we
> > should not be doing any find_all_inheritors() here at all, but instead
> > rely on pg_class.reltuples to be set for the partitioned table.
>
> +1

This patch does that.

--
Álvaro Herrera                            39°49'30"S 73°17'W
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)

Hi,
Within truncate_update_partedrel_stats(), dirty is declared within the loop.
+       if (rd_rel->reltuples != 0)
+       {
...
+       if (dirty)

The two if blocks can be merged. The variable dirty can be dropped.

Cheers 

Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
On 2021-Apr-08, Zhihong Yu wrote:

> Hi,
> Within truncate_update_partedrel_stats(), dirty is declared within the loop.
> +       if (rd_rel->reltuples != 0)
> +       {
> ...
> +       if (dirty)
> 
> The two if blocks can be merged. The variable dirty can be dropped.

Hi, thanks for reviewing.  Yes, evidently I copied vac_update_relstats
too closely -- that boolean is not necessary.

-- 
Álvaro Herrera       Valdivia, Chile



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Tom Lane
Дата:
Could we get this pushed sooner rather than later?  The buildfarm
is showing a wide variety of intermittent failures on HEAD, and it's
hard to tell how many of them trace to this one bug.

            regards, tom lane



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
On 2021-Apr-09, Tom Lane wrote:

> Could we get this pushed sooner rather than later?  The buildfarm
> is showing a wide variety of intermittent failures on HEAD, and it's
> hard to tell how many of them trace to this one bug.

Pushed now, thanks.

-- 
Álvaro Herrera       Valdivia, Chile
"Digital and video cameras have this adjustment and film cameras don't for the
same reason dogs and cats lick themselves: because they can."   (Ken Rockwell)



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Robert Haas
Дата:
On Fri, Apr 9, 2021 at 11:54 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2021-Apr-09, Tom Lane wrote:
> > Could we get this pushed sooner rather than later?  The buildfarm
> > is showing a wide variety of intermittent failures on HEAD, and it's
> > hard to tell how many of them trace to this one bug.
>
> Pushed now, thanks.

Does this need to worry about new partitions getting attached to a
partitioned table, or old ones getting detached? (Maybe it does
already, not sure.)

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
On 2021-Apr-09, Robert Haas wrote:

> Does this need to worry about new partitions getting attached to a
> partitioned table, or old ones getting detached? (Maybe it does
> already, not sure.)

Good question.  It does not.

I suppose you could just let that happen automatically -- I mean, next
time the partitioned table is analyzed, it'll scan all attached
partitions.  But if no tuples are modified afterwards in existing
partitions (a common scenario), and the newly attached partition
contains lots of rows, then only future rows in the newly attached
partition would affect the stats of the partitioned table, and it could
be a long time before that causes an analyze on the partitioned table to
occur.

Maybe a way to attack this is to send a the "anl_ancestors" message to
the collector on attach and detach, adding a new flag ("is
attach/detach"), which indicates to add not only "changes_since_analyze
- changes_since_analyze_reported", but also "n_live_tuples".

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Justin Pryzby
Дата:
On Fri, Apr 09, 2021 at 05:31:55PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-09, Robert Haas wrote:
> 
> > Does this need to worry about new partitions getting attached to a
> > partitioned table, or old ones getting detached? (Maybe it does
> > already, not sure.)
> 
> Good question.  It does not.

I think there's probably cases where this is desirable, and cases where it's
undesirable, so I don't think it's necessarily a problem.

One data point: we do DETACH/ATTACH tables during normal operation, before
type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking the
table for a long time.  It'd be undesirable (but maybe of no great consequence)
to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
afterwards.

However, I think DROP should be handled ?

-- 
Justin



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Tomas Vondra
Дата:

On 4/9/21 11:45 PM, Justin Pryzby wrote:
> On Fri, Apr 09, 2021 at 05:31:55PM -0400, Alvaro Herrera wrote:
>> On 2021-Apr-09, Robert Haas wrote:
>>
>>> Does this need to worry about new partitions getting attached to a
>>> partitioned table, or old ones getting detached? (Maybe it does
>>> already, not sure.)
>>
>> Good question.  It does not.
> 
> I think there's probably cases where this is desirable, and cases where it's
> undesirable, so I don't think it's necessarily a problem.
> 
> One data point: we do DETACH/ATTACH tables during normal operation, before
> type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking the
> table for a long time.  It'd be undesirable (but maybe of no great consequence)
> to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
> afterwards.
> 
> However, I think DROP should be handled ?
> 

IMHO we should prefer the default behavior which favors having updated
statistics, and maybe have a way to override it for individual commands.
So ATTACH would update changes_since_analyze by default, but it would be
possible to disable that.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
On 2021-Apr-09, Justin Pryzby wrote:

> One data point: we do DETACH/ATTACH tables during normal operation, before
> type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking the
> table for a long time.  It'd be undesirable (but maybe of no great consequence)
> to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
> afterwards.

You mean to trigger an ANALYZE, not to trigger an ALTER, right?

I think I agree with Tomas: we should do it by default, and offer some
way to turn that off.  I suppose a new reloptions, solely for
partitioned tables, would be the way to do it.

> However, I think DROP should be handled ?

DROP of a partition? ... I would think it should do the same as DETACH,
right?  Inform that however many rows the partition had, are now changed
in ancestors.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Justin Pryzby
Дата:
On Fri, Apr 09, 2021 at 06:16:59PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-09, Justin Pryzby wrote:
> 
> > One data point: we do DETACH/ATTACH tables during normal operation, before
> > type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking the
> > table for a long time.  It'd be undesirable (but maybe of no great consequence)
> > to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
> > afterwards.
> 
> You mean to trigger an ANALYZE, not to trigger an ALTER, right?

Oops, right.  It's slightly undesirable for a DETACH to cause an ANALYZE.

> I think I agree with Tomas: we should do it by default, and offer some
> way to turn that off.  I suppose a new reloptions, solely for
> partitioned tables, would be the way to do it.
> 
> > However, I think DROP should be handled ?
> 
> DROP of a partition? ... I would think it should do the same as DETACH,
> right?  Inform that however many rows the partition had, are now changed
> in ancestors.

Yes, drop of an (attached) partition.  The case for DROP is clear, since it
was clearly meant to go away forever.  The case for DETACH seems somewhat less
clear.

The current behavior of pg_dump/restore (since 33a53130a) is to CREATE+ATTACH,
so there's an argument that if DROPping the partition counts towards the
parent's analyze, then so should CREATE+ATTACH.

-- 
Justin



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Andres Freund
Дата:
Hi,

On 2021-04-09 11:54:30 -0400, Alvaro Herrera wrote:
> On 2021-Apr-09, Tom Lane wrote:
>
> > Could we get this pushed sooner rather than later?  The buildfarm
> > is showing a wide variety of intermittent failures on HEAD, and it's
> > hard to tell how many of them trace to this one bug.
>
> Pushed now, thanks.

I assume this is also the likely explanation for / fix for:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-04-08%2016%3A03%3A03

==3500389== VALGRINDERROR-BEGIN
==3500389== Invalid read of size 8
==3500389==    at 0x4EC4B8: relation_needs_vacanalyze (autovacuum.c:3237)
==3500389==    by 0x4EE0AF: do_autovacuum (autovacuum.c:2168)
==3500389==    by 0x4EEEA8: AutoVacWorkerMain (autovacuum.c:1715)
==3500389==    by 0x4EEF7F: StartAutoVacWorker (autovacuum.c:1500)
==3500389==    by 0x4FD2E4: StartAutovacuumWorker (postmaster.c:5539)
==3500389==    by 0x4FE50A: sigusr1_handler (postmaster.c:5243)
==3500389==    by 0x4A6513F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==3500389==    by 0x4DCA865: select (select.c:41)
==3500389==    by 0x4FEB75: ServerLoop (postmaster.c:1701)
==3500389==    by 0x4FFE52: PostmasterMain (postmaster.c:1409)
==3500389==    by 0x442563: main (main.c:209)
==3500389==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
==3500389==
==3500389== VALGRINDERROR-END
==3500389==
==3500389== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==3500389==  Access not within mapped region at address 0x10
==3500389==    at 0x4EC4B8: relation_needs_vacanalyze (autovacuum.c:3237)
==3500389==    by 0x4EE0AF: do_autovacuum (autovacuum.c:2168)
==3500389==    by 0x4EEEA8: AutoVacWorkerMain (autovacuum.c:1715)
==3500389==    by 0x4EEF7F: StartAutoVacWorker (autovacuum.c:1500)
==3500389==    by 0x4FD2E4: StartAutovacuumWorker (postmaster.c:5539)
==3500389==    by 0x4FE50A: sigusr1_handler (postmaster.c:5243)
==3500389==    by 0x4A6513F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==3500389==    by 0x4DCA865: select (select.c:41)
==3500389==    by 0x4FEB75: ServerLoop (postmaster.c:1701)
==3500389==    by 0x4FFE52: PostmasterMain (postmaster.c:1409)
==3500389==    by 0x442563: main (main.c:209)
==3500389==  If you believe this happened as a result of a stack
==3500389==  overflow in your program's main thread (unlikely but
==3500389==  possible), you can try to increase the size of the
==3500389==  main thread stack using the --main-stacksize= flag.
==3500389==  The main thread stack size used in this run was 8388608.


Greetings,

Andres Freund



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
Hello

On 2021-Apr-09, Andres Freund wrote:

> I assume this is also the likely explanation for / fix for:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-04-08%2016%3A03%3A03
> 
> ==3500389== VALGRINDERROR-BEGIN
> ==3500389== Invalid read of size 8
> ==3500389==    at 0x4EC4B8: relation_needs_vacanalyze (autovacuum.c:3237)
> ==3500389==    by 0x4EE0AF: do_autovacuum (autovacuum.c:2168)
> ==3500389==    by 0x4EEEA8: AutoVacWorkerMain (autovacuum.c:1715)
> ==3500389==    by 0x4EEF7F: StartAutoVacWorker (autovacuum.c:1500)
> ==3500389==    by 0x4FD2E4: StartAutovacuumWorker (postmaster.c:5539)

Hmm, I didn't try to reproduce this, but yeah it sounds quite likely
that it's the same issue -- line 3237 is the GETSTRUCT call where the
other one was crashing, which is now gone.

Thanks for pointing it out,

-- 
Álvaro Herrera       Valdivia, Chile



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2021-04-09 11:54:30 -0400, Alvaro Herrera wrote:
>> Pushed now, thanks.

> I assume this is also the likely explanation for / fix for:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-04-08%2016%3A03%3A03

> ==3500389== VALGRINDERROR-BEGIN
> ==3500389== Invalid read of size 8
> ==3500389==    at 0x4EC4B8: relation_needs_vacanalyze (autovacuum.c:3237)

Yeah, looks like the same thing to me; it's the same line that was
crashing in the non-valgrind reports.

            regards, tom lane



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Tomas Vondra
Дата:
On 4/10/21 12:29 AM, Justin Pryzby wrote:
> On Fri, Apr 09, 2021 at 06:16:59PM -0400, Alvaro Herrera wrote:
>> On 2021-Apr-09, Justin Pryzby wrote:
>>
>>> One data point: we do DETACH/ATTACH tables during normal operation, before
>>> type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking the
>>> table for a long time.  It'd be undesirable (but maybe of no great consequence)
>>> to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
>>> afterwards.
>>
>> You mean to trigger an ANALYZE, not to trigger an ALTER, right?
> 
> Oops, right.  It's slightly undesirable for a DETACH to cause an ANALYZE.
> 
>> I think I agree with Tomas: we should do it by default, and offer some
>> way to turn that off.  I suppose a new reloptions, solely for
>> partitioned tables, would be the way to do it.
>>
>>> However, I think DROP should be handled ?
>>
>> DROP of a partition? ... I would think it should do the same as DETACH,
>> right?  Inform that however many rows the partition had, are now changed
>> in ancestors.
> 
> Yes, drop of an (attached) partition.  The case for DROP is clear, since it
> was clearly meant to go away forever.  The case for DETACH seems somewhat less
> clear.
> 
> The current behavior of pg_dump/restore (since 33a53130a) is to CREATE+ATTACH,
> so there's an argument that if DROPping the partition counts towards the
> parent's analyze, then so should CREATE+ATTACH.
> 

I think it's tricky to "optimize" the behavior after ATTACH/DETACH. I'd
argue that in principle, we should aim to keep accurate statistics,  so
ATTACH should be treated as insert of all rows, and DETACH should be
treated as delete of all rows. Se for the purpose of ANALYZE, we should
propagate reltuples as changes_since_analyze after ATTACH/DETACH.

Yes, it may result in more frequent ANALYZE on the parent, but I think
that's necessary. Repeated attach/detach of the same partition may bloat
the value, but I guess that's an example of "If it hurts don't do it."

What I think we might do is offer some light-weight analyze variant,
e.g. based on the merging of statistics (I've posted a PoC patch a
couple days ago.). That would make the ANALYZEs on parent much cheaper,
so those "unnecessary" analyzes would not be an issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Justin Pryzby
Дата:
On Thu, Apr 08, 2021 at 04:11:49PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-08, Tom Lane wrote:
> 
> > > So I tend to think that my initial instinct was the better direction: we
> > > should not be doing any find_all_inheritors() here at all, but instead
> > > rely on pg_class.reltuples to be set for the partitioned table.
> > 
> > +1
> 
> This patch does that.

|commit 0e69f705cc1a3df273b38c9883fb5765991e04fe
|Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
|Date:   Fri Apr 9 11:29:08 2021 -0400
|
|    Set pg_class.reltuples for partitioned tables
|    
|    When commit 0827e8af70f4 added auto-analyze support for partitioned
|    tables, it included code to obtain reltuples for the partitioned table
|    as a number of catalog accesses to read pg_class.reltuples for each
|    partition.  That's not only very inefficient, but also problematic
|    because autovacuum doesn't hold any locks on any of those tables -- and
|    doesn't want to.  Replace that code with a read of pg_class.reltuples
|    for the partitioned table, and make sure ANALYZE and TRUNCATE properly
|    maintain that value.
|    
|    I found no code that would be affected by the change of relpages from
|    zero to non-zero for partitioned tables, and no other code that should
|    be maintaining it, but if there is, hopefully it'll be an easy fix.

+       else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+       {
+               /*
+                * Partitioned tables don't have storage, so we don't set any fields in
+                * their pg_class entries except for relpages, which is necessary for
+                * auto-analyze to work properly.
+                */
+               vac_update_relstats(onerel, -1, totalrows,
+                                                       0, false, InvalidTransactionId,
+                                                       InvalidMultiXactId,
+                                                       in_outer_xact);
+       }

This refers to "relpages", but I think it means "reltuples".

src/include/commands/vacuum.h:extern void vac_update_relstats(Relation relation,
src/include/commands/vacuum.h-                                BlockNumber num_pages,
src/include/commands/vacuum.h-                                double num_tuples,
src/include/commands/vacuum.h-                                BlockNumber num_all_visible_pages,

I'm adding it for the next round of "v14docs" patch if you don't want to make a
separate commit for that.

-- 
Justin



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
On 2021-Apr-08, Tom Lane wrote:

> BTW, another thing that looks like a race condition is the
> extract_autovac_opts() call that is done a little bit earlier,
> also without lock.  I think this is actually safe, but it's ONLY
> safe because we resisted the calls by certain people to add a
> toast table to pg_class.  Otherwise, fetching reloptions could
> have involved a toast pointer dereference, and it would then be
> racy whether the toasted data was still there.  As-is, even if
> the pg_class row we're looking at has been deleted, we can safely
> disassemble its reloptions.  I think this matter is deserving
> of a comment at least.

True.  I added a comment there.

Thanks,

-- 
Álvaro Herrera       Valdivia, Chile
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."                               (Fotis)
               (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
On 2021-Apr-09, Robert Haas wrote:

> Does this need to worry about new partitions getting attached to a
> partitioned table, or old ones getting detached? (Maybe it does
> already, not sure.)

I was pinged because this is listed as an open item.  I don't think it
is one.  Handling ATTACH/DETACH/DROP is important for overall
consistency, of course, so we should do it eventually, but the fact that
autovacuum runs analyze *at all* for partitioned tables is an enormous
step forward from it not doing so.  I think we should treat ATTACH/
DETACH/DROP handling as a further feature to be added in a future
release, not an open item to be fixed in the current one.

Now, if somebody sees a very trivial way to handle it, let's discuss it,
but *I* don't see it.

-- 
Álvaro Herrera       Valdivia, Chile
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."              (Andrew Sullivan)



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
yuzuko
Дата:
Hi,

Thank you for discussing this item.

> I think we should treat ATTACH/
> DETACH/DROP handling as a further feature to be added in a future
> release, not an open item to be fixed in the current one.
>
I agree with your opinion.

> Now, if somebody sees a very trivial way to handle it, let's discuss it,
> but *I* don't see it.
>
I started thinking about the way to handle ATTACH/DETACH/DROP,
but I haven't created patches.  If no one has done it yet, I'll keep working.

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Justin Pryzby
Дата:
On Wed, Apr 21, 2021 at 07:06:49PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-09, Robert Haas wrote:
> > Does this need to worry about new partitions getting attached to a
> > partitioned table, or old ones getting detached? (Maybe it does
> > already, not sure.)
> 
> I was pinged because this is listed as an open item.  I don't think it
> is one.  Handling ATTACH/DETACH/DROP is important for overall
> consistency, of course, so we should do it eventually, but the fact that
> autovacuum runs analyze *at all* for partitioned tables is an enormous
> step forward from it not doing so.  I think we should treat ATTACH/
> DETACH/DROP handling as a further feature to be added in a future
> release, not an open item to be fixed in the current one.

I think this is okay, with the caveat that we'd be changing the behavior
(again) in a future release, rather than doing it all in v14.

Maybe the behavior should be documented, though.  Actually, I thought the
pre-existing (non)behavior of autoanalyze would've been documented, and we'd
now update that.  All I can find is this:

https://www.postgresql.org/docs/current/sql-analyze.html
|The autovacuum daemon, however, will only consider inserts or updates on the
|parent table itself when deciding whether to trigger an automatic analyze for
|that table

I think that should probably have been written down somewhere other than for
the manual ANALYZE command, but in any case it seems to be outdated now.

-- 
Justin



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Justin Pryzby
Дата:
On Thu, Apr 22, 2021 at 12:43:46PM -0500, Justin Pryzby wrote:
> Maybe the behavior should be documented, though.  Actually, I thought the
> pre-existing (non)behavior of autoanalyze would've been documented, and we'd
> now update that.  All I can find is this:
> 
> https://www.postgresql.org/docs/current/sql-analyze.html
> |The autovacuum daemon, however, will only consider inserts or updates on the
> |parent table itself when deciding whether to trigger an automatic analyze for
> |that table
> 
> I think that should probably have been written down somewhere other than for
> the manual ANALYZE command, but in any case it seems to be outdated now.

Starting with this 

From a7ae56a879b6bacc4fc22cbd769851713be89840 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 23 Apr 2021 09:15:58 -0500
Subject: [PATCH] WIP: Add docs for autovacuum processing of partitioned tables

---
 doc/src/sgml/perform.sgml        | 3 ++-
 doc/src/sgml/ref/analyze.sgml    | 4 +++-
 doc/src/sgml/ref/pg_restore.sgml | 6 ++++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..814c3cffbe 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1767,7 +1767,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
    <para>
     Whenever you have significantly altered the distribution of data
     within a table, running <link linkend="sql-analyze"><command>ANALYZE</command></link> is strongly recommended.
This
-    includes bulk loading large amounts of data into the table.  Running
+    includes bulk loading large amounts of data into the table,
+    or attaching/detaching partitions.  Running
     <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
     ensures that the planner has up-to-date statistics about the
     table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c8fcebc161..179ae3555d 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -255,11 +255,13 @@ ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replacea
     rows of the parent table only, and a second time on the rows of the
     parent table with all of its children.  This second set of statistics
     is needed when planning queries that traverse the entire inheritance
-    tree.  The autovacuum daemon, however, will only consider inserts or
+    tree.  For legacy inheritence, the autovacuum daemon, only considers inserts or
     updates on the parent table itself when deciding whether to trigger an
     automatic analyze for that table.  If that table is rarely inserted into
     or updated, the inheritance statistics will not be up to date unless you
     run <command>ANALYZE</command> manually.
+    For partitioned tables, inserts and updates on the partitions are counted
+    towards auto-analyze on the parent.
   </para>
 
   <para>
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 93ea937ac8..260bf0feb7 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -922,8 +922,10 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
   <para>
    Once restored, it is wise to run <command>ANALYZE</command> on each
-   restored table so the optimizer has useful statistics; see
-   <xref linkend="vacuum-for-statistics"/> and
+   restored table so the optimizer has useful statistics.
+   If the table is a partition or an inheritence child, it may also be useful
+   to analyze the parent table.
+   See <xref linkend="vacuum-for-statistics"/> and
    <xref linkend="autovacuum"/> for more information.
   </para>
 
-- 
2.17.0




Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Michael Paquier
Дата:
On Wed, Apr 21, 2021 at 07:06:49PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-09, Robert Haas wrote:
>> Does this need to worry about new partitions getting attached to a
>> partitioned table, or old ones getting detached? (Maybe it does
>> already, not sure.)
>
> I was pinged because this is listed as an open item.  I don't think it
> is one.  Handling ATTACH/DETACH/DROP is important for overall
> consistency, of course, so we should do it eventually, but the fact that
> autovacuum runs analyze *at all* for partitioned tables is an enormous
> step forward from it not doing so.  I think we should treat ATTACH/
> DETACH/DROP handling as a further feature to be added in a future
> release, not an open item to be fixed in the current one.

Yeah, I'd agree that this could be done as some future work so it
looks fine to move it to the section for "won't fix" items, but that
sounds rather tricky to me as there are dependencies across the
partitions.

Now, I don't think that we are completely done either, as one
documentation patch has been sent here:
https://www.postgresql.org/message-id/20210423180152.GA17270@telsasoft.com

Alvaro, could you look at that?
--
Michael

Вложения

Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
On 2021-Apr-23, Justin Pryzby wrote:

> On Thu, Apr 22, 2021 at 12:43:46PM -0500, Justin Pryzby wrote:
> > 
> > I think that should probably have been written down somewhere other than for
> > the manual ANALYZE command, but in any case it seems to be outdated now.
> 
> Starting with this 

Agreed, we need some more docs here.  I lightly edited yours and ended
up with this -- mostly I think partitioned tables should not be in the
same paragraph as legacy inheritance because the behavior is different
enough (partitioned tables are not analyzed twice).

I'll give a deeper look tomorrow to see if other places also need edits.

Thanks

-- 
Álvaro Herrera       Valdivia, Chile

Вложения

Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
New version, a bit more ambitious.  I think it's better to describe
behavior for partitioned tables ahead of inheritance.  Also, in the
ANALYZE reference page I split the topic in two: in one single paragraph
we now describe what happens with manual analyze for partitioned tables
and inheritance hierarchies; we describe the behavior of autovacuum in
one separate paragraph for each type of hierarchy, since the differences
are stark.

I noticed that difference while verifying the behavior that I was to
document.  If you look at ANALYZE VERBOSE output, it seems a bit
wasteful:

create table part (a int) partition by list (a);
create table part0 partition of part for values in (0);
create table part1 partition of part for values in (1);
create table part23 partition of part for values in (2, 3) partition by list (a);
create table part2 partition of part23 for values in (2);
create table part3 partition of part23 for values in (3);
insert into part select g%4 from generate_series(1, 50000000) g;

analyze verbose part;

INFO:  analyzing "public.part" inheritance tree
INFO:  "part1": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 dead rows; 7500 rows in sample,
12500060estimated total rows
 
INFO:  "part2": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 dead rows; 7500 rows in sample,
12500060estimated total rows
 
INFO:  "part3": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 dead rows; 7500 rows in sample,
12500060estimated total rows
 
INFO:  "part4": scanned 7500 of 55310 pages, containing 1695000 live rows and 0 dead rows; 7500 rows in sample,
12500060estimated total rows
 
INFO:  analyzing "public.part1"
INFO:  "part1": scanned 30000 of 55310 pages, containing 6779940 live rows and 0 dead rows; 30000 rows in sample,
12499949estimated total rows
 
INFO:  analyzing "public.part2"
INFO:  "part2": scanned 30000 of 55310 pages, containing 6779940 live rows and 0 dead rows; 30000 rows in sample,
12499949estimated total rows
 
INFO:  analyzing "public.part34" inheritance tree
INFO:  "part3": scanned 15000 of 55310 pages, containing 3390000 live rows and 0 dead rows; 15000 rows in sample,
12500060estimated total rows
 
INFO:  "part4": scanned 15000 of 55310 pages, containing 3389940 live rows and 0 dead rows; 15000 rows in sample,
12499839estimated total rows
 
INFO:  analyzing "public.part3"
INFO:  "part3": scanned 30000 of 55310 pages, containing 6780000 live rows and 0 dead rows; 30000 rows in sample,
12500060estimated total rows
 
INFO:  analyzing "public.part4"
INFO:  "part4": scanned 30000 of 55310 pages, containing 6780000 live rows and 0 dead rows; 30000 rows in sample,
12500060estimated total rows
 
ANALYZE

-- 
Álvaro Herrera       Valdivia, Chile
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)

Вложения

Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
With English fixes from Bruce.

I think the note about autovacuum in the reference page for ANALYZE is a
bit out of place, but not enough to make me move the whole paragraph
elsewhere.  Maybe we should try to do that sometime.

-- 
Álvaro Herrera       Valdivia, Chile
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")

Вложения

Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Justin Pryzby
Дата:
On Thu, May 13, 2021 at 05:33:33PM -0400, Alvaro Herrera wrote:
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -817,6 +817,11 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
>  </programlisting>
>      is compared to the total number of tuples inserted, updated, or deleted
>      since the last <command>ANALYZE</command>.
> +    For partitioned tables, inserts and updates on partitions are counted
> +    towards this threshold; however partition meta-operations such as
> +    attachment, detachment or drop are not, so running a manual
> +    <command>ANALYZE</command> is recommended if the partition added or
> +    removed contains a statistically significant volume of data.

I suggest: "Inserts, updates and deletes on partitions of a partitioned table
are counted towards this threshold; however DDL operations such as ATTACH,
DETACH and DROP are not, ...

> +   and in addition it will analyze each individual partition separately.

remove "and" and say in addition COMMA
Or:
"it will also recursive into each partition and update its statistics."

> +   By constrast, if the table being analyzed has inheritance children,
> +   <command>ANALYZE</command> will gather statistics for that table twice:
> +   once on the rows of the parent table only, and a second time on the
> +   rows of the parent table with all of its children.  This second set of
> +   statistics is needed when planning queries that traverse the entire
> +   inheritance tree.  The children tables are not individually analyzed
> +   in this case.

say "The child tables themselves.."

> +  <para>
> +   For tables with inheritance children, the autovacuum daemon only
> +   counts inserts and deletes in the parent table itself when deciding
> +   whether to trigger an automatic analyze for that table.  If that table
> +   is rarely inserted into or updated, the inheritance statistics will
> +   not be up to date unless you run <command>ANALYZE</command> manually.
> +  </para>

This should be emphasized:
Tuples changed in inheritence children do not count towards analyze on the
parent table.  If the parent table is empty or rarely changed, it may never 
be processed by autovacuum.  It's necesary to periodically run an manual
ANALYZE to keep the statistics of the table hierarchy up to date.

I don't know why it says "inserted or updated" but doesn't say "or deleted" -
that seems like a back-patchable fix.

> +++ b/doc/src/sgml/ref/pg_restore.sgml
> @@ -922,8 +922,10 @@ CREATE DATABASE foo WITH TEMPLATE template0;
>  
>    <para>
>     Once restored, it is wise to run <command>ANALYZE</command> on each
> -   restored table so the optimizer has useful statistics; see
> -   <xref linkend="vacuum-for-statistics"/> and
> +   restored table so the optimizer has useful statistics.
> +   If the table is a partition or an inheritance child, it may also be useful
> +   to analyze the parent table.
> +   See <xref linkend="vacuum-for-statistics"/> and
>     <xref linkend="autovacuum"/> for more information.

maybe say: "analyze the parent to update statistics for the table hierarchy".



Re: pgsql: autovacuum: handle analyze for partitioned tables

От
Alvaro Herrera
Дата:
Thanks for these corrections -- I applied them and a few minor changes
from myself, and pushed.  Another set of eyes over the result would be
most welcome.

I hope we can close this now :-)

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes." (http://slashdot.org/comments.pl?sid=44793&cid=4647152)