Обсуждение: cleaning up a few CLOG-related things

Поиск
Список
Период
Сортировка

cleaning up a few CLOG-related things

От
Robert Haas
Дата:
I attach a series of proposed patches to slightly improve some minor
things related to the CLOG code.

0001 - Always call StartupCLOG() just after initializing
ShmemVariableCache. Right now, the hot_standby=off code path does this
at end of recovery, and the hot_standby=on code path does it at the
beginning of recovery. It's better to do this in only one place
because (a) it's simpler, (b) StartupCLOG() is trivial so trying to
postpone it in the hot_standby=off case has no value, and (c) it
allows for 0002 and therefore 0003, which make things even simpler.

0002 - In clog_redo(), don't set XactCtl->shared->latest_page_number.
The value that is being set here is actually the oldest page we're not
truncating, not the newest page that exists, so it's a bogus value
(except when we're truncating all but one page). The reason it's like
this now is to avoid having SimpleLruTruncate() see an uninitialized
value that might trip a sanity check, but after 0001 that won't be
possible, so we can just let the sanity check do its thing.

0003 - In TrimCLOG(), don't reset XactCtl->shared->latest_page_number.
After we stop doing 0002 we don't need to do this either, because the
only thing this can be doing for us is correcting the error introduced
by the code which 0002 removes. Relying on the results of replaying
(authoritative) CLOG/EXTEND records seems better than relying on our
(approximate) value of nextXid at end of recovery.

0004 - In StartupCLOG(), correct an off-by-one error. Currently, if
the nextXid is exactly a multiple of the number of CLOG entries that
fit on a page, then the value we compute for
XactCtl->shared->latest_page_number is higher than it should be by 1.
That's because nextXid represents the first XID that hasn't yet been
allocated, not the last one that gets allocated. So, the CLOG page
gets created when nextXid advances from the first value on the page to
the second value on the page, not when it advances from the last value
on the previous page to the first value on the current page.

Note that all of 0001-0003 result in a net removal of code. 0001 comes
out to more lines total because of the comment changes, but fewer
executable lines.

I don't plan to back-patch any of this because, AFAICS, an incorrect
value of XactCtl->shared->latest_page_number has no real consequences.
The SLRU code uses latest_page_number for just two purposes. First,
the latest page is never evicted; but that's just a question of
performance, not correctness, and the performance impact is small.
Second, the sanity check in SimpleLruTruncate() uses it. The present
code can make the value substantially inaccurate during recovery, but
only in a way that can make the sanity check pass rather than failing,
so it's not going to really bite anybody except perhaps if they have a
corrupted cluster where they would have liked the sanity check to
catch some problem. When not in recovery, the value can be off by at
most one. I am not sure whether there's a theoretical risk of this
making SimpleLruTruncate()'s sanity check fail when it should have
passed, but even if there is the chances must be extremely remote.

Some of the other SLRUs have similar issues as a result of
copy-and-paste work over the years. I plan to look at tidying that
stuff up, too. However, I wanted to post (and probably commit) these
patches first, partly to get some feedback, and also because all the
cases are a little different and I want to make sure to do a proper
analysis of each one.

Any review very much appreciated.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: cleaning up a few CLOG-related things

От
Heikki Linnakangas
Дата:
On 25/01/2021 18:56, Robert Haas wrote:
> I attach a series of proposed patches to slightly improve some minor
> things related to the CLOG code.
> 
> [patches 0001 - 0003]

Makes sense.

> 0004 - In StartupCLOG(), correct an off-by-one error. Currently, if
> the nextXid is exactly a multiple of the number of CLOG entries that
> fit on a page, then the value we compute for
> XactCtl->shared->latest_page_number is higher than it should be by 1.
> That's because nextXid represents the first XID that hasn't yet been
> allocated, not the last one that gets allocated.

Yes.

> So, the CLOG page gets created when nextXid advances from the first
> value on the page to the second value on the page, not when it
> advances from the last value on the previous page to the first value
> on the current page.
Yes. It took me a moment to understand that explanation, though. I'd 
phrase it something like "nextXid is the next XID that will be used, but 
we want to set latest_page_number according to the last XID that's 
already been used. So retreat by one."

Having a separate FullTransactionIdToLatestPageNumber() function for 
this seems like overkill to me.

> Some of the other SLRUs have similar issues as a result of
> copy-and-paste work over the years. I plan to look at tidying that
> stuff up, too. However, I wanted to post (and probably commit) these
> patches first, partly to get some feedback, and also because all the
> cases are a little different and I want to make sure to do a proper
> analysis of each one.

Yeah, multixact seems similar at least.

- Heikki



Re: cleaning up a few CLOG-related things

От
Robert Haas
Дата:
On Mon, Jan 25, 2021 at 2:11 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > [patches 0001 - 0003]
>
> Makes sense.

Great. I committed the first one and will proceed with those as well.

> > So, the CLOG page gets created when nextXid advances from the first
> > value on the page to the second value on the page, not when it
> > advances from the last value on the previous page to the first value
> > on the current page.
> Yes. It took me a moment to understand that explanation, though. I'd
> phrase it something like "nextXid is the next XID that will be used, but
> we want to set latest_page_number according to the last XID that's
> already been used. So retreat by one."

OK, updated the patch to use that language for the comment.

> Having a separate FullTransactionIdToLatestPageNumber() function for
> this seems like overkill to me.

I initially thought so too, but it turned out to be pretty useful for
writing debugging cross-checks and things, so I'm inclined to keep it
even though I'm not at present proposing to commit any such debugging
cross-checks. For example I tried making the main redo loop check
whether XactCtl->shared->latest_page_number ==
FullTransactionIdToLatestPageNumber(nextXid) which turned out to be
super-helpful in understanding things.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: cleaning up a few CLOG-related things

От
Noah Misch
Дата:
On Wed, Jan 27, 2021 at 12:35:30PM -0500, Robert Haas wrote:
> On Mon, Jan 25, 2021 at 2:11 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Having a separate FullTransactionIdToLatestPageNumber() function for
> > this seems like overkill to me.
> 
> I initially thought so too, but it turned out to be pretty useful for
> writing debugging cross-checks and things, so I'm inclined to keep it
> even though I'm not at present proposing to commit any such debugging
> cross-checks. For example I tried making the main redo loop check
> whether XactCtl->shared->latest_page_number ==
> FullTransactionIdToLatestPageNumber(nextXid) which turned out to be
> super-helpful in understanding things.

> +/*
> + * Based on ShmemVariableCache->nextXid, compute the latest CLOG page that
> + * is expected to exist.
> + */
> +static int
> +FullTransactionIdToLatestPageNumber(FullTransactionId nextXid)
> +{
> +    /*
> +     * nextXid is the next XID that will be used, but we want to set
> +     * latest_page_number according to the last XID that's already been used.
> +     * So retreat by one. See also GetNewTransactionId().
> +     */
> +    FullTransactionIdRetreat(&nextXid);
> +    return TransactionIdToPage(XidFromFullTransactionId(nextXid));
> +}

I don't mind the arguably-overkill function.  I'd probably have named it
FullTransactionIdPredecessorToPage(), to focus on its behavior as opposed to
its caller's behavior, but static function naming isn't a weighty matter.
Overall, the patch looks fine.  If nextXid is the first on a page, the next
GetNewTransactionId() -> ExtendCLOG() -> ZeroCLOGPage() -> SimpleLruZeroPage()
is responsible for updating latest_page_number.