Re: pg_walfile_name_offset can return inconsistent values

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: pg_walfile_name_offset can return inconsistent values
Дата
Msg-id ZVJZkU9O98Jbydml@momjian.us
обсуждение исходный текст
Ответ на Re: pg_walfile_name_offset can return inconsistent values  (Andres Freund <andres@anarazel.de>)
Ответы Re: pg_walfile_name_offset can return inconsistent values  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Fri, Nov 10, 2023 at 07:59:43PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2023-11-09 16:14:07 -0500, Bruce Momjian wrote:
> > On Thu, Nov  9, 2023 at 09:49:48PM +0100, Matthias van de Meent wrote:
> > > Either way, I think fix #1 is most correct (as was attached in
> > > offset2.diff, and quoted verbatim here), because that has no chance of
> > > having surprising underflowing behaviour when you use '0/0'::lsn as
> > > input.
> > 
> > Attached is the full patch that changes pg_walfile_name_offset() and
> > pg_walfile_name().  There is no need for doc changes.
> 
> I think this needs to add tests "documenting" the changed behaviour and
> perhaps also for a few other edge cases.  You could e.g. test
>   SELECT * FROM pg_walfile_name_offset('0/0')
> which today returns bogus values, and which is independent of the wal segment
> size.
> 
> And with
> SELECT setting::int8 AS segment_size FROM pg_settings WHERE name = 'wal_segment_size' \gset
> you can test real things too, e.g.:
> SELECT segment_number, file_offset FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size),
pg_split_walfile_name(file_name);
> SELECT segment_number, file_offset FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size + 1),
pg_split_walfile_name(file_name);
> SELECT segment_number, file_offset = :segment_size - 1 FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size -
1),pg_split_walfile_name(file_name);
 

Sure, I have added these tests.

FYI, pg_walfile_name_offset() has this C comment, which I have removed
in this patch;

    * Note that a location exactly at a segment boundary is taken to be in
    * the previous segment.  This is usually the right thing, since the
    * expected usage is to determine which xlog file(s) are ready to archive.

There is also a documentation mention of this behavior:

    When the given write-ahead log location is exactly at a write-ahead log file boundary, both
    these functions return the name of the preceding write-ahead log file.
    This is usually the desired behavior for managing write-ahead log archiving
    behavior, since the preceding file is the last one that currently
    needs to be archived.

After seeing the doc mention, I started digging into the history of this
feature.  It was originally called pg_current_xlogfile_offset() and
proposed in this email thread, which started on 2006-07-31:

    https://www.postgresql.org/message-id/flat/1154384790.3226.21.camel%40localhost.localdomain

In the initial patch by Simon Riggs, there was no "previous segment
file" behavior, just a simple filename/offset calculation.

This was applied on 2006-08-06 with this commit:

    commit 704ddaaa09
    Author: Tom Lane <tgl@sss.pgh.pa.us>
    Date:   Sun Aug 6 03:53:44 2006 +0000
    
        Add support for forcing a switch to a new xlog file; cause such a switch
        to happen automatically during pg_stop_backup().  Add some functions for
        interrogating the current xlog insertion point and for easily extracting
        WAL filenames from the hex WAL locations displayed by pg_stop_backup
        and friends.  Simon Riggs with some editorialization by Tom Lane.

and the email of the commit said:

    https://www.postgresql.org/message-id/18457.1154836638%40sss.pgh.pa.us

    I also made the new user-level functions a bit more orthogonal, so
    that filenames could be extracted from the existing functions like
    pg_stop_backup.

There is later talk about returning last write pointer vs. the current
insert pointer, and having it match what is returned by pg_stop_backup():

    https://www.postgresql.org/message-id/1155124565.2368.95.camel%40localhost.localdomain
    
    Methinks it should be the Write pointer all of the time, since I can't
    think of a valid reason for wanting to know where the Insert pointer is
    *before* we've written to the xlog file. Having it be the Insert pointer
    could lead to some errors.

and I suspect that it was the desire to return the last write pointer
that caused the function to return the previous segment on a boundary
offset. This was intended to simplify log shipping implementations, I
think.

The function eventually was renamed in the xlog-to-wal renaming and moved
from xlog.c to xlogfuncs.c.  This thread in 2022 mentioned the
inconsistency for 0/0, but didn't seem to talk about the inconsistency
of offset vs file name:

    https://www.postgresql.org/message-id/flat/20220204225057.GA1535307%40nathanxps13#d964275c9540d8395e138efc0a75f7e8

and it concluded with:

    Yes, its the deliberate choice of design, or a kind of
    questionable-but-unoverturnable decision.  I think there are many
    external tools conscious of this behavior.

However, with the report about the inconsistency, the attached patch
fixes the behavior and removes the documentation about the odd behavior.
This will need to be mentioned as an incompatibility in the major
version release notes.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Joe Conway
Дата:
Сообщение: Re: How to solve the problem of one backend process crashing and causing other processes to restart?
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Is this a problem in GenericXLogFinish()?