Обсуждение: RE: Actually it's a bufmgr issue (was Re: Another pg_listener issue)

Поиск
Список
Период
Сортировка

RE: Actually it's a bufmgr issue (was Re: Another pg_listener issue)

От
"Hiroshi Inoue"
Дата:
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> Behalf Of Tom Lane

[snip]

> 
> regression=# vacuum foo;
> NOTICE:  FlushRelationBuffers(foo, 0): block 0 is dirty (private 
> 1, global 1)
> VACUUM
> 
> This is being caused by the buffer manager changes I recently made
> to avoid doing unnecessary writes at commit.  After the DELETE, foo's
> buffer is written to disk, but at that point its (single) tuple is
> marked with an uncommitted xmax transaction.  The SELECT verifies the
> tuple is dead and changes its on-row status to xmax-committed, but
> *that change is not forced to disk* (the rationale being that even if
> we crashed without writing the change, it could be done again later).
> Now VACUUM comes along, finds no live tuples, and decides to truncate
> the relation to zero blocks.  During the truncation,
> FlushRelationBuffers sees that the buffer it's flushing is still marked
> dirty, and hence emits the above notice.
>

This means vacuum doesn't necessarily flush all dirty buffers of
the target table. Doesn't this break the assumption of pg_upgrade ? 
> 
> Comments anyone?
>

Some changes around bufmgr seems to be needed.
1) Provide a function which could flush all dirty buffers   and vacuum calls the function for example.
or
2) SetBufferCommitInfoNeedsSave() sets SharedBuffer   Changed and BufferDirtiedByMe. 
and/or
3) Flush dirty buffers even in case of abort.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: Actually it's a bufmgr issue (was Re: Another pg_listener issue)

От
Tom Lane
Дата:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> Now VACUUM comes along, finds no live tuples, and decides to truncate
>> the relation to zero blocks.  During the truncation,
>> FlushRelationBuffers sees that the buffer it's flushing is still marked
>> dirty, and hence emits the above notice.

> This means vacuum doesn't necessarily flush all dirty buffers of
> the target table. Doesn't this break the assumption of pg_upgrade ?

No, because it does still flush the buffer.  It's only emitting a
warning, because it thinks this condition suggests a bug in VACUUM.
But with the way bufmgr behaves now, the condition is actually fairly
normal, and so the warning is no longer of any value.
        regards, tom lane


RE: Actually it's a bufmgr issue (was Re: Another pg_listener issue)

От
"Hiroshi Inoue"
Дата:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> Now VACUUM comes along, finds no live tuples, and decides to truncate
> >> the relation to zero blocks.  During the truncation,
> >> FlushRelationBuffers sees that the buffer it's flushing is still marked
> >> dirty, and hence emits the above notice.
>
> > This means vacuum doesn't necessarily flush all dirty buffers of
> > the target table. Doesn't this break the assumption of pg_upgrade ?
>
> No, because it does still flush the buffer.

Yes FlushRelationBuffers notices and flushes dirty buffers >=
the specified block. But doesn't it notice dirty buffers < the
specified block ? Or does vacuum flush all pages < the
specified block while processing ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: Actually it's a bufmgr issue (was Re: Another pg_listener issue)

От
Tom Lane
Дата:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>>>> This means vacuum doesn't necessarily flush all dirty buffers of
>>>> the target table. Doesn't this break the assumption of pg_upgrade ?
>> 
>> No, because it does still flush the buffer.

> Yes FlushRelationBuffers notices and flushes dirty buffers >=
> the specified block. But doesn't it notice dirty buffers < the
> specified block ? Or does vacuum flush all pages < the
> specified block while processing ?

Oh, I see what you mean: even after a VACUUM, it's possible for
there to be unwritten dirty buffers for a relation, containing
either changes entered by an aborted transaction, or updates of
on-row status values made by non-VACUUM transactions.  Hmm.
It's barely possible that the second case could break pg_upgrade,
if something before VACUUM had updated the on-row status values
in a page and then VACUUM itself had no reason to dirty the page.
If those status values never get written then pg_upgrade fails.

Maybe it would be a good idea for VACUUM to force out all dirty
pages for the relation even when theu're only dirty because of
on-row status updates.  Normally we wouldn't care about risking
losing those updates, but for pg_upgrade we would.

I was about to change FlushRelationBuffers anyway to get rid of
the bogus warning message.  What I propose to do is give it two
behaviors:
(1) write out dirty buffers at or beyond the specified block,   but don't remove buffers from pool; or
(2) remove buffers at or beyond the specified block from pool,   after writing them out if dirty.

VACUUM should apply case (2) beginning at the proposed truncation point,
and then apply case (1) starting at block 0.

Sound good?
        regards, tom lane


RE: Actually it's a bufmgr issue (was Re: Another pg_listener issue)

От
"Hiroshi Inoue"
Дата:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

[snip]
> Maybe it would be a good idea for VACUUM to force out all dirty
> pages for the relation even when theu're only dirty because of
> on-row status updates.  Normally we wouldn't care about risking
> losing those updates, but for pg_upgrade we would.
> 
> I was about to change FlushRelationBuffers anyway to get rid of
> the bogus warning message.  What I propose to do is give it two
> behaviors:
> (1) write out dirty buffers at or beyond the specified block,
>     but don't remove buffers from pool; or
> (2) remove buffers at or beyond the specified block from pool,
>     after writing them out if dirty.
> 
> VACUUM should apply case (2) beginning at the proposed truncation point,
> and then apply case (1) starting at block 0.
> 
> Sound good?
>

Agreed.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: Actually it's a bufmgr issue (was Re: Another pg_listener issue)

От
Tom Lane
Дата:
>> I was about to change FlushRelationBuffers anyway to get rid of
>> the bogus warning message.  What I propose to do is give it two
>> behaviors:
>> (1) write out dirty buffers at or beyond the specified block,
>> but don't remove buffers from pool; or
>> (2) remove buffers at or beyond the specified block from pool,
>> after writing them out if dirty.
>> 
>> VACUUM should apply case (2) beginning at the proposed truncation point,
>> and then apply case (1) starting at block 0.
>> 
>> Sound good?

> Agreed.

OK, I've committed a fix for this.  After looking at the uses of
FlushRelationBuffers, I gave it just one behavior: flush *all* dirty
buffers of the relation, and remove from the cache those that are
at or beyond the specified block number.  This allows VACUUM's needs
to be met in one buffer-cache scan instead of two.

I also cleaned up ReleaseRelationBuffers, which should have but did
not remove the relation's buffers from the cache.  This left us with
"valid" buffers for a deleted relation after any DROP TABLE.  No
known bug there, but clearly trouble waiting to happen.  Likewise
for DropBuffers (same thing for a whole database).

Finally, the "removal" of the deleted buffers in these routines
consisted of calling BufTableDelete(), which removes the buffer from the
shared-buffer hashtable, so it will not be found by a lookup for a
specific block --- but the various routines that scan the whole
shared-buffer array would still think the buffer belongs to its former
relation!  That can't be good either, so I made BufTableDelete() clear
the tag field for the buffer.
        regards, tom lane