Thread: Buffer access rules, and a probable bug

Buffer access rules, and a probable bug

From: Tom Lane
I have been making some notes about the rules for accessing shared disk
buffers, since they aren't spelled out anywhere now AFAIK.  In process
I found what seems to be a nasty bug in the code that tries to build
btree indexes that include already-dead tuples.  (If memory serves,
Hiroshi added that code awhile back to help suppress the "heap tuples
!= index tuples" complaint from VACUUM.)

Would people look this over and see if they agree with my deductions?
        regards, tom lane


Notes about shared buffer access rules
--------------------------------------

There are two separate access control mechanisms for shared disk buffers:
reference counts (a/k/a pin counts) and buffer locks.  (Actually, there's
a third level of access control: one must hold the appropriate kind of
lock on a relation before one can legally access any page belonging to
the relation.  Relation-level locks are not discussed here.)

Pins: one must "hold a pin on" a buffer (increment its reference count)
before being allowed to do anything at all with it.  An unpinned buffer is
subject to being reclaimed and reused for a different page at any instant,
so touching it is unsafe.  Typically a pin is acquired by ReadBuffer and
released by WriteBuffer (if one modified the page) or ReleaseBuffer (if not).
It is OK and indeed common for a single backend to pin a page more than
once concurrently; the buffer manager handles this efficiently.  It is
considered OK to hold a pin for long intervals --- for example, sequential
scans hold a pin on the current page until done processing all the tuples
on the page, which could be quite a while if the scan is the outer scan of
a join.  Similarly, btree index scans hold a pin on the current index
page.  This is OK because there is actually no operation that waits for a
page's pin count to drop to zero.  (Anything that might need to do such a
wait is instead handled by waiting to obtain the relation-level lock,
which is why you'd better hold one first.)  Pins may not be held across
transaction boundaries, however.
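
As an illustration only, here is a minimal sketch of the pin lifecycle just
described, using the buffer-manager calls named above (ReadBuffer, WriteBuffer,
ReleaseBuffer); "rel", "blkno" and "page_dirtied" are placeholders:

    /* Minimal pin-lifecycle sketch; error handling omitted. */
    Buffer  buf;
    bool    page_dirtied = false;       /* placeholder: set if we change the page */

    buf = ReadBuffer(rel, blkno);       /* pins the page (refcount++) */

    /* ... examine or modify the page, under a suitable buffer lock ... */

    if (page_dirtied)
        WriteBuffer(buf);               /* mark dirty and drop the pin */
    else
        ReleaseBuffer(buf);             /* just drop the pin */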

Buffer locks: there are two kinds of buffer locks, shared and exclusive,
which act just as you'd expect: multiple backends can hold shared locks
on the same buffer, but an exclusive lock prevents anyone else from
holding either shared or exclusive lock.  (These can alternatively be
called READ and WRITE locks.)  These locks are relatively short term:
they should not be held for long.  They are implemented as per-buffer
spinlocks, so another backend trying to acquire a competing lock will
spin as long as you hold yours!  Buffer locks are acquired and released by
LockBuffer().  It will *not* work for a single backend to try to acquire
multiple locks on the same buffer.  One must pin a buffer before trying
to lock it.
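
For concreteness, a sketch of the LockBuffer() usage just described, using the
usual BUFFER_LOCK_SHARE / BUFFER_LOCK_EXCLUSIVE / BUFFER_LOCK_UNLOCK modes;
the buffer must already be pinned:

    /* The buffer must already be pinned (e.g. via ReadBuffer). */
    LockBuffer(buf, BUFFER_LOCK_SHARE);     /* or BUFFER_LOCK_EXCLUSIVE */

    /* ... short-term work on the page; keep this brief, since other
     * backends wanting a conflicting lock spin while we hold ours ... */

    LockBuffer(buf, BUFFER_LOCK_UNLOCK);    /* lock released, pin retained */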

Buffer access rules:

1. To scan a page for tuples, one must hold a pin and either shared or
exclusive lock.  To examine the commit status (XIDs and status bits) of
a tuple in a shared buffer, one must likewise hold a pin and either shared
or exclusive lock.

2. Once one has determined that a tuple is interesting (visible to the
current transaction) one may drop the buffer lock, yet continue to access
the tuple's data for as long as one holds the buffer pin.  This is what is
typically done by heap scans, since the tuple returned by heap_fetch
contains a pointer to tuple data in the shared buffer.  Therefore the
tuple cannot go away while the pin is held (see rule #5).  Its state could
change, but that is assumed not to matter after the initial determination
of visibility is made.

3. To add a tuple or change the xmin/xmax fields of an existing tuple,
one must hold a pin and an exclusive lock on the containing buffer.
This ensures that no one else might see a partially-updated state of the
tuple.

4. It is considered OK to update tuple commit status bits (ie, OR the
values HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_COMMITTED, or
HEAP_XMAX_INVALID into t_infomask) while holding only a shared lock and
pin on a buffer.  This is OK because another backend looking at the tuple
at about the same time would OR the same bits into the field, so there
is little or no risk of conflicting update; what's more, if there did
manage to be a conflict it would merely mean that one bit-update would
be lost and need to be done again later.

5. To physically remove a tuple or compact free space on a page, one
must hold a pin and an exclusive lock, *and* observe while holding the
exclusive lock that the buffer's shared reference count is one (ie,
no other backend holds a pin).  If these conditions are met then no other
backend can perform a page scan until the exclusive lock is dropped, and
no other backend can be holding a reference to an existing tuple that it
might expect to examine again.  Note that another backend might pin the
buffer (increment the refcount) while one is performing the cleanup, but
it won't be able to actually examine the page until it acquires shared
or exclusive lock.  (A short code sketch illustrating rules #2 and #5
appears just below this list.)
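
A hedged sketch of rules #2 and #5 in code form; the page-inspection calls are
the usual ones (BufferGetPage, PageGetItemId, PageGetItem), while the refcount
test in the second half is left as pseudocode, since the buffer manager does
not export a tidy "is my pin the only one?" primitive:

    /* "buf" is assumed to be a buffer we already hold a pin on. */
    Page            page;
    ItemId          lpp;
    HeapTupleHeader htup;
    OffsetNumber    offnum;     /* item of interest; placeholder */

    /* Rule #2: hold the lock only while locating and qualifying the tuple;
     * keep the pin while the tuple data is still being used. */
    LockBuffer(buf, BUFFER_LOCK_SHARE);
    page = BufferGetPage(buf);
    lpp  = PageGetItemId(page, offnum);
    htup = (HeapTupleHeader) PageGetItem(page, lpp);
    /* ... visibility check goes here, per rules #1 and #4 ... */
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    /* htup may still be dereferenced: the pin prevents removal (rule #5) */

    /* Rule #5: cleanup needs exclusive lock *plus* sole pin */
    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    if (1 /* pseudocode: buffer's shared refcount observed to be 1 */)
    {
        /* safe to remove tuples or compact free space on the page */
    }
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);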


Currently, the only operation that removes tuples or compacts free space
is VACUUM.  It does not have to implement rule #5 directly, because it
instead acquires exclusive lock at the relation level, which ensures
indirectly that no one else is accessing tuples of the relation at all.
To implement concurrent VACUUM we will need to make it obey rule #5.


I believe that nbtree.c's btbuild() code is currently in violation of
these rules, because it calls HeapTupleSatisfiesNow() while holding a
pin but no lock on the containing buffer.  Not only does this risk
returning a wrong answer if it sees an intermediate state of the tuple
xmin/xmax fields, but HeapTupleSatisfiesNow() may try to update the
tuple's infomask.  This could produce a wrong state of the infomask if
there is a concurrent change of the infomask by an exclusive-lock holder.
(Even if that catastrophe does not happen, SetBufferCommitInfoNeedsSave
is not called as it should be when the status bits change.)

I am also disturbed that several places in heapam.c call
HeapTupleSatisfiesUpdate without checking for infomask change and calling
SetBufferCommitInfoNeedsSave.  In most paths through the code, it seems
the buffer will get marked dirty anyway, but wouldn't it be better to make
this check?


Re: Buffer access rules, and a probable bug

From: Hiroshi Inoue
Tom Lane wrote:
> 
> I have been making some notes about the rules for accessing shared disk
> buffers, since they aren't spelled out anywhere now AFAIK.  In process
> I found what seems to be a nasty bug in the code that tries to build
> btree indexes that include already-dead tuples.  (If memory serves,
> Hiroshi added that code awhile back to help suppress the "heap tuples
> != index tuples" complaint from VACUUM.)
> 

[snip]

> 
> I believe that nbtree.c's btbuild() code is currently in violation of
> these rules, because it calls HeapTupleSatisfiesNow() while holding a
> pin but no lock on the containing buffer.

OK, we had better avoid using heapam routines in btbuild()?

regards,

Hiroshi Inoue


Re: Re: Buffer access rules, and a probable bug

From: Tom Lane
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> I believe that nbtree.c's btbuild() code is currently in violation of
>> these rules, because it calls HeapTupleSatisfiesNow() while holding a
>> pin but no lock on the containing buffer.

> OK, we had better avoid using heapam routines in btbuild()?

On further thought, btbuild is not that badly broken at the moment,
because CREATE INDEX acquires ShareLock on the relation, so there can be
no concurrent writers at the page level.  Still, it seems like it'd be a
good idea to do "LockBuffer(buffer, BUFFER_LOCK_SHARE)" here, and
probably also to invoke HeapTupleSatisfiesNow() via the
HeapTupleSatisfies() macro so that infomask update is checked for.
Vadim, what do you think?
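
To make the proposal concrete, a hedged sketch of what the btbuild() check
might look like with the share lock added; the before/after infomask comparison
approximates what the HeapTupleSatisfies() macro does, and the exact names and
signatures here should be treated as assumptions about the current sources:

    /* "htup" is the HeapTupleHeader of the heap tuple being checked;
     * "buffer" is the pinned buffer containing it. */
    uint16  infomask_before;

    LockBuffer(buffer, BUFFER_LOCK_SHARE);
    infomask_before = htup->t_infomask;

    if (HeapTupleSatisfiesNow(htup))        /* may set hint bits as a side effect */
    {
        /* tuple is visible now; include it in the index */
    }

    if (htup->t_infomask != infomask_before)
        SetBufferCommitInfoNeedsSave(buffer);   /* hint bits changed: record that */

    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);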
        regards, tom lane


RE: Re: Buffer access rules, and a probable bug

From: "Mikheev, Vadim"
> On further thought, btbuild is not that badly broken at the moment,
> because CREATE INDEX acquires ShareLock on the relation, so
> there can be no concurrent writers at the page level. Still, it
> seems like it'd be a good idea to do "LockBuffer(buffer,
BUFFER_LOCK_SHARE)"
> here, and probably also to invoke HeapTupleSatisfiesNow() via the
> HeapTupleSatisfies() macro so that infomask update is checked for.
> Vadim, what do you think?

Looks like there is no drawback in locking buffer so let's lock it.

Vadim


Re: Buffer access rules, and a probable bug

From: ncm@zembu.com (Nathan Myers)
On Mon, Jul 02, 2001 at 09:40:25PM -0400, Tom Lane wrote:
> 
> 4. It is considered OK to update tuple commit status bits (ie, OR the
> values HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_COMMITTED, or
> HEAP_XMAX_INVALID into t_infomask) while holding only a shared lock and
> pin on a buffer.  This is OK because another backend looking at the tuple
> at about the same time would OR the same bits into the field, so there
> is little or no risk of conflicting update; what's more, if there did
> manage to be a conflict it would merely mean that one bit-update would
> be lost and need to be done again later.

Without looking at the code, this seems mad.  Are you sure?

Nathan Myers
ncm@zembu.com


Re: Buffer access rules, and a probable bug

From: Tom Lane
ncm@zembu.com (Nathan Myers) writes:
> On Mon, Jul 02, 2001 at 09:40:25PM -0400, Tom Lane wrote:
>> 4. It is considered OK to update tuple commit status bits (ie, OR the
>> values HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_COMMITTED, or
>> HEAP_XMAX_INVALID into t_infomask) while holding only a shared lock and
>> pin on a buffer.  This is OK because another backend looking at the tuple
>> at about the same time would OR the same bits into the field, so there
>> is little or no risk of conflicting update; what's more, if there did
>> manage to be a conflict it would merely mean that one bit-update would
>> be lost and need to be done again later.

> Without looking at the code, this seems mad.  Are you sure?

Yes.  Those status bits aren't ground truth, only hints.  They cache the
results of looking up transaction status in pg_log; if they get dropped,
the only consequence is the next visitor to the tuple has to do the
lookup over again.

Changing any other bits in t_infomask requires exclusive lock, however.
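
In code terms, the "only a hint" property amounts to something like the sketch
below: the bit merely caches a pg_log lookup, so losing a stored bit only
forces a later reader to repeat the lookup.  Here "htup" and "xmin" stand for
the tuple header and its inserting transaction ID:

    /* Simplified: ignores HEAP_XMIN_INVALID and aborted-xact handling. */
    if (!(htup->t_infomask & HEAP_XMIN_COMMITTED))
    {
        if (TransactionIdDidCommit(xmin))              /* the real pg_log lookup */
            htup->t_infomask |= HEAP_XMIN_COMMITTED;   /* cache the answer */
    }
    /* a set HEAP_XMIN_COMMITTED bit lets later visitors skip the lookup */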
        regards, tom lane


Re: Re: Buffer access rules, and a probable bug

From: Tom Lane
Okay, on to the next concern.  I've been thinking some more about the
restrictions needed to make the world safe for concurrent VACUUM.
I previously said:

> 5. To physically remove a tuple or compact free space on a page, one
> must hold a pin and an exclusive lock, *and* observe while holding the
> exclusive lock that the buffer's shared reference count is one (ie,
> no other backend holds a pin).  If these conditions are met then no other
> backend can perform a page scan until the exclusive lock is dropped, and
> no other backend can be holding a reference to an existing tuple that it
> might expect to examine again.  Note that another backend might pin the
> buffer (increment the refcount) while one is performing the cleanup, but
> it won't be able to actually examine the page until it acquires shared
> or exclusive lock.

This is OK when considering a page in isolation, but it does not get the
job done when one is deleting related index tuples and heap tuples.  It
seems to me that there *must* be some cross-page coupling to make that
work safely.  Otherwise you could have this scenario:

1. Indexscanning process visits an index tuple, decides to access the
   corresponding heap tuple, drops its lock on the index buffer page.

2. VACUUMing process visits the index buffer page and marks the index
   tuple dead.  (It won't try to delete the tuple yet, since it sees
   the pin still held on the page by process #1, but it can acquire
   exclusive lock and mark the tuple dead anyway.)

3. VACUUMing process is the first to acquire pin and lock on the heap
   buffer page.  It sees no other pin, so it deletes the tuple.

4. Indexscanning process finally acquires pin and lock on the heap page,
   tries to access what is now a gone tuple.  Ooops.  (Even if we
   made that not an error condition, it could be worse: what if a
   third process already reused the line pointer for a new tuple?)

It does not help to postpone the actual cleaning of an index or heap
page until its own pin count drops to zero --- the problem here is that
an indexscanner has acquired a reference into a heap page from the
index, but does not yet hold a pin on the heap page to ensure that
the reference stays good.  So we can't just postpone the cleaning of
the index page till it has pin count zero, we have to make the related
heap page(s)' cleanup wait for that to happen too.

I can think of two ways of guaranteeing that this problem cannot happen.

One is for an indexscanning process to retain its shared lock on the
index page until it has acquired at least a pin on the heap page.
This is very bad for concurrency --- it means that we'd typically
be holding indexpage shared locks for the time needed to read in a
randomly-accessed disk page.  And it's very complicated, since we still
need all the other rules, plus the mechanism for postponing cleanups
until pin count goes to zero.  It'd cause considerable changes to the
index access method API, too.

The other is to forget about asynchronous cleaning, and instead have the
VACUUM process directly do the wait for pin count zero, then clean the
index page.  Then when it does the same for the heap page, we know for
sure there are no indexscanners in transit to the heap page.  This would
be logically a lot simpler, it seems to me.  Another advantage is that
we need only one WAL entry per cleaned page, not two (one for the
initial mark-dead step and one for the postponable compaction step),
and there's no need for an intermediate "gone but not forgotten" state
for index tuples.

We could implement this in pretty nearly the same way as the "mark for
cleanup" facility that you partially implemented awhile back:
essentially, the cleanup callback would send a signal or semaphore
increment to the waiting process, which would then try to acquire pin
and exclusive lock on the buffer.  If it succeeded in observing pin
count 1 with exclusive lock, it could proceed with cleanup, else loop
back and try again.  Eventually it'll get the lock.  (It might take
awhile, but for a background VACUUM I think that's OK.)
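
A hedged pseudocode rendering of that loop; WaitForBufferCleanupSignal() and
CleanUpPage() are hypothetical stand-ins for the "mark for cleanup" wakeup and
the actual page compaction, and the refcount test is pseudocode:

    /* The waiting (VACUUM) process is assumed to already hold a pin on buf. */
    for (;;)
    {
        WaitForBufferCleanupSignal(buf);    /* hypothetical: sleep until the
                                             * cleanup callback wakes us */

        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

        if (1 /* pseudocode: shared refcount == 1 while we hold the lock */)
        {
            CleanUpPage(buf);               /* hypothetical compaction routine */
            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
            break;
        }

        /* someone re-pinned the buffer first; drop the lock and retry */
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }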

What I'm wondering is if you had any other intended use for "mark for
cleanup" than VACUUM.  The cheapest implementation would allow only
one process to be waiting for cleanup on a given buffer, which is OK
for VACUUM because we'll only allow one VACUUM at a time on a relation
anyway.  But if you had some other uses in mind, maybe the code needs
to support multiple waiters.

Comments?
        regards, tom lane


Re: Buffer access rules, and a probable bug

From: ncm@zembu.com (Nathan Myers)
On Tue, Jul 03, 2001 at 05:11:46PM -0400, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > On Mon, Jul 02, 2001 at 09:40:25PM -0400, Tom Lane wrote:
> >> 4. It is considered OK to update tuple commit status bits (ie, OR the
> >> values HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_COMMITTED, or
> >> HEAP_XMAX_INVALID into t_infomask) while holding only a shared lock and
> >> pin on a buffer.  This is OK because another backend looking at the tuple
> >> at about the same time would OR the same bits into the field, so there
> >> is little or no risk of conflicting update; what's more, if there did
> >> manage to be a conflict it would merely mean that one bit-update would
> >> be lost and need to be done again later.
> 
> > Without looking at the code, this seems mad.  Are you sure?
> 
> Yes.  Those status bits aren't ground truth, only hints.  They cache the
> results of looking up transaction status in pg_log; if they get dropped,
> the only consequence is the next visitor to the tuple has to do the
> lookup over again.
> 
> Changing any other bits in t_infomask requires exclusive lock, however.

Hmm, look:
    A                       B

1   load t_infomask
2   or XMIN_INVALID
3                           lock
4                           load t_infomask
5                           or MOVED_IN
6                           store t_infomask
7                           unlock
8   store t_infomask

Here, backend B is a good citizen and locks while it makes its change.  
Backend A only means to touch the "hint" bits, but it ends up clobbering
the MOVED_IN bit too.

Also, as hints, would it be Bad(tm) if an attempt to clear one failed?
That seems possible as well.

Nathan Myers
ncm@zembu.com


Re: Buffer access rules, and a probable bug

From: Tom Lane
ncm@zembu.com (Nathan Myers) writes:
> Here, backend B is a good citizen and locks while it makes its change.

No, backend B wasn't a good citizen: it should have been holding
exclusive lock on the buffer.

> Also, as hints, would it be Bad(tm) if an attempt to clear one failed?

Clearing hint bits is also an exclusive-lock-only operation.  Notice
I specified that *setting* them is the only case allowed to be done
with shared lock.
        regards, tom lane


RE: Re: Buffer access rules, and a probable bug

From: "Hiroshi Inoue"
> -----Original Message-----
> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]
> 
> > On further thought, btbuild is not that badly broken at the moment,
> > because CREATE INDEX acquires ShareLock on the relation, so
> > there can be no concurrent writers at the page level. Still, it
> > seems like it'd be a good idea to do "LockBuffer(buffer,
> BUFFER_LOCK_SHARE)"
> > here, and probably also to invoke HeapTupleSatisfiesNow() via the
> > HeapTupleSatisfies() macro so that infomask update is checked for.
> > Vadim, what do you think?
> 
> Looks like there is no drawback in locking buffer so let's lock it.
> 

OK, I will fix it.
As for HeapTupleSatisfies() there seems to be another choice:
let HeapTupleSatisfiesAny() be equivalent to HeapTupleSatisfiesNow(),
except that it always returns true.

Comments?

regards,
Hiroshi Inoue 


Re: Re: Buffer access rules, and a probable bug

From: Tom Lane
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> As for HeapTupleSatisfies() there seems to be another choice:
> let HeapTupleSatisfiesAny() be equivalent to HeapTupleSatisfiesNow(),
> except that it always returns true.

Wouldn't that break the other uses of SnapshotAny?  I'm not sure
it's what nbtree.c wants, either, because then the heap_getnext
call wouldn't return recently-dead tuples at all.
        regards, tom lane


Re: Re: Buffer access rules, and a probable bug

From: Hiroshi Inoue
Tom Lane wrote:
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > As for HeapTupleSatisfies() there seems to be another choice:
> > let HeapTupleSatisfiesAny() be equivalent to HeapTupleSatisfiesNow(),
> > except that it always returns true.
> 
> Wouldn't that break the other uses of SnapshotAny? 

In theory no because HeapTupleSatisfies...() only touches
hint bits. What I mean is to implement a new function
HeapTupleSatisfiesAny() as

bool
HeapTupleSatisfiesAny(HeapTupleHeader tuple)
{
    HeapTupleSatisfiesNow(tuple);
    return true;
}

> I'm not sure
> it's what nbtree.c wants, either, because then the heap_getnext
> call wouldn't return recently-dead tuples at all.
> 

nbtree.c has to see all (including dead) tuples and judge
whether the tuples are alive, dead or removable via a unified
time qualification.

regards,
Hiroshi Inoue


Re: Re: Buffer access rules, and a probable bug

From: Tom Lane
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> What I mean is to implement a new function
> HeapTupleSatisfiesAny() as

> bool
> HeapTupleSatisfiesAny(HeapTupleHeader tuple)
> {
>     HeapTupleSatisfiesNow(tuple);
>     return true;
> }

Oh, I see: so that HeapTupleSatisfies would update the hint bits even
when called with snapshot = SnapshotAny.  Hmm.  This might be a good
idea on its own merits, but I don't think it simplifies nbtree.c at
all --- you'd still have to go through the full LockBuffer and hint
update procedure there.  (If the other transaction committed meanwhile,
the call from nbtree.c could try to update hint bits that hadn't been
updated during heap_fetch.)

BTW, I don't really think that this code belongs in nbtree.c at all.
If it lives there, then we need to duplicate the logic in each index
access method.  At some point we ought to fix the index build process
so that the loop that scans the source relation is outside the access-
method-specific code.
        regards, tom lane


Re: Re: Buffer access rules, and a probable bug

From: Bruce Momjian
> In theory no because HeapTupleSatisfies...() only touches
> hint bits. What I mean is to implement a new function
> HeapTupleSatisfiesAny() as

Can someone add comments to the Satisfies code to explain each
visibility function?  Some are documented and some are not.  Seems it
would be good to get this stuff in there for future coders.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Re: Buffer access rules, and a probable bug

From: Hiroshi Inoue
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > What I mean is to implement a new function
> > HeapTupleSatisfiesAny() as
> 
> > bool
> > HeapTupleSatisfiesAny(HeapTupleHeader tuple)
> > {
> >       HeapTupleSatisfiesNow(tuple);
> >       return true;
> > }
> 
> Oh, I see: so that HeapTupleSatisfies would update the hint bits even
> when called with snapshot = SnapshotAny.  Hmm.  This might be a good
> idea on its own merits, but I don't think it simplifies nbtree.c at
> all --- you'd still have to go through the full LockBuffer and hint
> update procedure there.  (If the other transaction committed meanwhile,
> the call from nbtree.c could try to update hint bits that hadn't been
> updated during heap_fetch.)
> 

Dead (HEAP_XMAX_COMMITTED || HEAP_XMIN_INVALID) tuples never
revive.  Live (not dead) tuples never die in Share Lock mode.
So I don't have to call HeapTupleSatisfies() again, though I do
seem to have to lock the buffer so as to see t_infomask and
t_xmax.
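
A sketch of the recheck being described, with "dead" taken literally from the
definition above; the share lock keeps t_infomask (and t_xmax) from changing
underneath us, and "buffer"/"htup" are the pinned buffer and the tuple header:

    bool    dead;

    LockBuffer(buffer, BUFFER_LOCK_SHARE);
    dead = (htup->t_infomask & (HEAP_XMAX_COMMITTED | HEAP_XMIN_INVALID)) != 0;
    /* ... t_xmax could be inspected here as well, still under the lock ... */
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);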

> BTW, I don't really think that this code belongs in nbtree.c at all.
> If it lives there, then we need to duplicate the logic in each index
> access method.  At some point we ought to fix the index build process
> so that the loop that scans the source relation is outside the access-
> method-specific code.
> 

Agreed. 

regards,
Hiroshi Inoue


Re: Re: Buffer access rules, and a probable bug

From: Tom Lane
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Dead (HEAP_XMAX_COMMITTED || HEAP_XMIN_INVALID) tuples never
> revive.  Live (not dead) tuples never die in Share Lock mode.

Hmm ... so you're relying on the ShareLock to ensure that the state of
the tuple can't change between when heap_fetch checks it and when
nbtree looks at it.

Okay, but put in some comments documenting what this stuff is doing
and why.
        regards, tom lane


RE: Re: Buffer access rules, and a probable bug

From: "Mikheev, Vadim"
> What I'm wondering is if you had any other intended use for "mark for
> cleanup" than VACUUM.  The cheapest implementation would allow only
> one process to be waiting for cleanup on a given buffer, which is OK
> for VACUUM because we'll only allow one VACUUM at a time on a relation
> anyway.  But if you had some other uses in mind, maybe the code needs
> to support multiple waiters.

I was going to use it for UNDO but it seems that UNDO w/o OSMGR is not
popular and OSMGR will require different approaches anyway, so -
do whatever you want.

Vadim


Re: Re: Buffer access rules, and a probable bug

From: Hiroshi Inoue
"Mikheev, Vadim" wrote:
> 
> > What I'm wondering is if you had any other intended use for "mark for
> > cleanup" than VACUUM.  The cheapest implementation would allow only
> > one process to be waiting for cleanup on a given buffer, which is OK
> > for VACUUM because we'll only allow one VACUUM at a time on a relation
> > anyway.  But if you had some other uses in mind, maybe the code needs
> > to support multiple waiters.
> 
> I was going to use it for UNDO but it seems that UNDO w/o OSMGR is not
> popular and OSMGR will require different approaches anyway, so -
> do whatever you want.
> 

How is UNDO now?
I've wanted partial rollback functionality for a long time,
and I've never seen a practical solution other than UNDO.

I'm thinking of the following restricted UNDO.

UNDO is never invoked for an entire rollback of a transaction.
The implicit savepoint at the beginning of a transaction isn't
needed.  If there are no savepoints, UNDO is never invoked.
In particular, UNDO is never invoked for commands outside a
transaction block.  WAL logs from before the savepoints could
be discarded at CheckPoint.

Comments?

regards,
Hiroshi Inoue


Re: Buffer access rules, and a probable bug

From: Bill Studenmund
Sorry for the delay.

On Tue, 3 Jul 2001, Tom Lane wrote:

> ncm@zembu.com (Nathan Myers) writes:
> 
> > Also, as hints, would it be Bad(tm) if an attempt to clear one failed?
> 
> Clearing hint bits is also an exclusive-lock-only operation.  Notice
> I specified that *setting* them is the only case allowed to be done
> with shared lock.

One problem though is that if you don't have a spin lock around the flag,
you can end up clearing it inadvertently.  I.e., two backends go to update
(different) bit flags. They each load the current value, and each set the
(different) bit they want to set. They then store the new value they each
have come up with. The second store will effectively clear the bit set in
the first store.

??

Take care,

Bill



Re: Buffer access rules, and a probable bug

From: Tom Lane
Bill Studenmund <wrstuden@zembu.com> writes:
> One problem though is that if you don't have a spin lock around the flag,
> you can end up clearing it inadvertently.  I.e., two backends go to update
> (different) bit flags. They each load the current value, and each set the
> (different) bit they want to set. They then store the new value they each
> have come up with. The second store will effectively clear the bit set in
> the first store.

Yes, this is the scenario we're talking about.  The point is that losing
the bit is harmless, in this particular situation, because it will be
set again the next time the tuple is visited.  It's only a hint.  That
being the case, we judge that avoiding the necessity for an exclusive
lock during read is well worth a (very infrequent) need to do an extra
pg_log lookup due to loss of a hint bit.
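
Put another way (a sketch, not the actual heapam code), the hint-bit update is
a plain read-modify-write, so under shared locks two setters can overwrite each
other's stores, and the argument above is that this is an acceptable price:

    /* Compiles to load / OR / store: with only a shared lock, a concurrent
     * setter's store can be overwritten.  Worst case, the bit reverts to
     * unset and some later reader repeats the pg_log lookup. */
    htup->t_infomask |= HEAP_XMIN_COMMITTED;    /* setting hints only; clearing
                                                 * bits needs exclusive lock */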
        regards, tom lane