Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

Поиск
Список
Период
Сортировка
От Aleksander Alekseev
Тема Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
Дата
Msg-id CAJ7c6TPuMSyxy=BoipJaa4H4N2fp4up0ZEvq3J3eXXdC_ym3OA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound  (John Naylor <john.naylor@enterprisedb.com>)
Ответы Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound  (John Naylor <john.naylor@enterprisedb.com>)
Список pgsql-hackers
Hi John,

Many thanks for all the great feedback!

> Okay, the changes look good. To go further, I think we need to combine into two patches, one with 0001-0003 and one
with0004: 
>
> 1. Correct false statements about "shutdown" etc. This should contain changes that can safely be patched all the way
tov11. 
> 2. Change bad advice (single-user mode) into good advice. We can target head first, and then try to adopt as far back
aswe safely can (and test). 

Done.

> > > In swapping this topic back in my head, I also saw [2] where Robert suggested
> > >
> > > "that old prepared transactions and stale replication
> > > slots should be emphasized more prominently.  Maybe something like:
> > >
> > > HINT:  Commit or roll back old prepared transactions, drop stale
> > > replication slots, or kill long-running sessions.
> > > Ensure that autovacuum is progressing, or run a manual database-wide VACUUM."
> >
> > It looks like the hint regarding replication slots was added at some
> > point. Currently we have:
> >
> > ```
> > errhint( [...]
> >     "You might also need to commit or roll back old prepared
> > transactions, or drop stale replication slots.")));
> > ```
>
> Yes, the exact same text as it appeared in the [2] thread above, which prompted Robert's comment I quoted. I take the
pointto mean: All of these things need to be taken care of *first*, before vacuuming, so the hint should order things
sothat it is clear. 
>
> > Please let me know if you think
> > we should also add a suggestion to kill long-running sessions, etc.
>
> +1 for also adding that.

OK, done. I included this change as a separate patch. It can be
squashed with another one if necessary.

While on it, I noticed that multixact.c still talks about a
"shutdown". I made corresponding changes in 0001.

> - errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
> + errmsg("database is not accepting commands that generate new XIDs to avoid wraparound data loss in database
\"%s\"",
>
> I'm not quite on board with the new message, but welcome additional opinions. For one, it's a bit longer and now
ambiguous.I also bet that "generate XIDs" doesn't  really communicate anything useful. The people who understand
exactlywhat that means, and what the consequences are, are unlikely to let the system get near wraparound in the first
place,and might even know enough to ignore the hint. 
>
> I'm thinking we might need to convey something about "writes". While it's less technically correct, I imagine it's
moreuseful. Remember, many users have it drilled in their heads that they need to drop immediately to single-user mode.
I'dlike to give some idea of what they can and cannot do. 

This particular wording was chosen for consistency with multixact.c:

```
errmsg("database is not accepting commands that generate new
MultiXactIds to avoid wraparound data loss in database \"%s\"",
```

The idea of using "writes" is sort of OK, but note that the same
message will appear for a query like:

```
SELECT pg_current_xact_id();
```

... which doesn't do writes. The message will be misleading in this case.

On top of that, although a PostgreSQL user may not be aware of
MultiXactIds, arguably there are many users that are aware of XIDs.
Not to mention the fact that XIDs are well documented.

I didn't make this change in v4 since it seems to be controversial and
probably not the highest priority at the moment. I suggest we discuss
it separately.

> I propose for discussion that 0004 should show in the docs all the queries for finding prepared xacts, repl slots
etc.If we ever show the info at runtime, we can dispense with the queries, but there seems to be no urgency for that... 

Good idea.

> +     Previously it was required to stop the postmaster and VACUUM the database
> +     in a single-user mode. There is no need to use a single-user mode anymore.
>
> I think we need to go further and actively warn against it: It's slow, impossible to monitor, disables replication
anddisables safeguards against wraparound. (Other bad things too, but these are easily understandable for users) 
>
> Maybe mention also that it's main use in wraparound situations is for a way to perform DROPs and TRUNCATEs if that
wouldhelp speed up resolution. 

Fixed.


--
Best regards,
Aleksander Alekseev

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Sketch of a fix for that truncation data corruption issue