Обсуждение: heapgetpage() and ->takenDuringRecovery

Поиск
Список
Период
Сортировка

heapgetpage() and ->takenDuringRecovery

От
Andres Freund
Дата:
Hi,

I am currently playing around with Robert's suggestion to get rid of
changeset extraction's reusage of SnapshotData fields (basically that
xip contains committed, not uncommited transactions) by using NodeTag
similar to many other (families of) structs.

While reading around which references to SnapshotData's members exist, I
once more came about the following tidbit in heapgetpage():/* * If the all-visible flag indicates that all tuples on
thepage are * visible to everyone, we can skip the per-tuple visibility tests. * * Note: In hot standby, a tuple that's
alreadyvisible to all * transactions in the master might still be invisible to a read-only * transaction in the
standby.We partly handle this problem by tracking * the minimum xmin of visible tuples as the cut-off XID while marking
a* page all-visible on master and WAL log that along with the visibility * map SET operation. In hot standby, we wait
for(or abort) all * transactions that can potentially may not see one or more tuples on the * page. That's how
index-onlyscans work fine in hot standby. A crucial * difference between index-only scans and heap scans is that the *
index-onlyscan completely relies on the visibility map where as heap * scan looks at the page-level PD_ALL_VISIBLE
flag.We are not sure if * the page-level flag can be trusted in the same way, because it might * get propagated somehow
withoutbeing explicitly WAL-logged, e.g. via a * full page write. Until we can prove that beyond doubt, let's check
each* tuple for visibility the hard way. */all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;
 

I don't think this is neccessary >= 9.2. The are two only "interestings" place
where PD_ALL_VISIBLE is set:
a) lazy_vacuum_page() where a xl_heap_clean is logged *before*  PD_ALL_VISIBLE/the vm is touched and that causes
recovery conflicts. The heap page is locked for cleanup at that point. As the  logging of xl_heap_clean sets the page's
LSNthere's no way the page  can appear on the standby too early.
 
b) empty pages in lazy_scan_heap(). If they always were empty, there's  no need for conflicts. The only other way I can
seeto end up there  is a previous heap_page_prune() that repaired fragmentation. But that  logs a WAL record with
conflictinformation.
 

So, we could just remove this?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: heapgetpage() and ->takenDuringRecovery

От
Robert Haas
Дата:
On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> While reading around which references to SnapshotData's members exist, I
> once more came about the following tidbit in heapgetpage():
>         /*
>          * If the all-visible flag indicates that all tuples on the page are
>          * visible to everyone, we can skip the per-tuple visibility tests.
>          *
>          * Note: In hot standby, a tuple that's already visible to all
>          * transactions in the master might still be invisible to a read-only
>          * transaction in the standby. We partly handle this problem by tracking
>          * the minimum xmin of visible tuples as the cut-off XID while marking a
>          * page all-visible on master and WAL log that along with the visibility
>          * map SET operation. In hot standby, we wait for (or abort) all
>          * transactions that can potentially may not see one or more tuples on the
>          * page. That's how index-only scans work fine in hot standby. A crucial
>          * difference between index-only scans and heap scans is that the
>          * index-only scan completely relies on the visibility map where as heap
>          * scan looks at the page-level PD_ALL_VISIBLE flag. We are not sure if
>          * the page-level flag can be trusted in the same way, because it might
>          * get propagated somehow without being explicitly WAL-logged, e.g. via a
>          * full page write. Until we can prove that beyond doubt, let's check each
>          * tuple for visibility the hard way.
>          */
>         all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;
>
> I don't think this is neccessary >= 9.2. The are two only "interestings" place
> where PD_ALL_VISIBLE is set:
> a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
>    PD_ALL_VISIBLE/the vm is touched and that causes recovery
>    conflicts. The heap page is locked for cleanup at that point. As the
>    logging of xl_heap_clean sets the page's LSN there's no way the page
>    can appear on the standby too early.
> b) empty pages in lazy_scan_heap(). If they always were empty, there's
>    no need for conflicts. The only other way I can see to end up there
>    is a previous heap_page_prune() that repaired fragmentation. But that
>    logs a WAL record with conflict information.

I don't think there's any reason to believe that lazy_scan_heap() can
only hit pages that are empty or have just been defragged.  Suppose
that there's a tuple on the page which was recently inserted; the
inserting transaction has committed but there are some backends that
still have older snapshots.  The page won't be marked all-visible
because it isn't.  Now, eventually those older snapshots will go away,
and sometime after that the relation will get vacuumed again, and
we'll once again look the page.  But this time we notice that it is
all-visible, and mark it so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: heapgetpage() and ->takenDuringRecovery

От
Andres Freund
Дата:
On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
> On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > While reading around which references to SnapshotData's members exist, I
> > once more came about the following tidbit in heapgetpage():
> >         /*
> >          * If the all-visible flag indicates that all tuples on the page are
> >          * visible to everyone, we can skip the per-tuple visibility tests.
> >          *
> >          * Note: In hot standby, a tuple that's already visible to all
> >          * transactions in the master might still be invisible to a read-only
> >          * transaction in the standby. We partly handle this problem by tracking
> >          * the minimum xmin of visible tuples as the cut-off XID while marking a
> >          * page all-visible on master and WAL log that along with the visibility
> >          * map SET operation. In hot standby, we wait for (or abort) all
> >          * transactions that can potentially may not see one or more tuples on the
> >          * page. That's how index-only scans work fine in hot standby. A crucial
> >          * difference between index-only scans and heap scans is that the
> >          * index-only scan completely relies on the visibility map where as heap
> >          * scan looks at the page-level PD_ALL_VISIBLE flag. We are not sure if
> >          * the page-level flag can be trusted in the same way, because it might
> >          * get propagated somehow without being explicitly WAL-logged, e.g. via a
> >          * full page write. Until we can prove that beyond doubt, let's check each
> >          * tuple for visibility the hard way.
> >          */
> >         all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;
> >
> > I don't think this is neccessary >= 9.2. The are two only "interestings" place
> > where PD_ALL_VISIBLE is set:
> > a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
> >    PD_ALL_VISIBLE/the vm is touched and that causes recovery
> >    conflicts. The heap page is locked for cleanup at that point. As the
> >    logging of xl_heap_clean sets the page's LSN there's no way the page
> >    can appear on the standby too early.
> > b) empty pages in lazy_scan_heap(). If they always were empty, there's
> >    no need for conflicts. The only other way I can see to end up there
> >    is a previous heap_page_prune() that repaired fragmentation. But that
> >    logs a WAL record with conflict information.
> 
> I don't think there's any reason to believe that lazy_scan_heap() can
> only hit pages that are empty or have just been defragged.  Suppose
> that there's a tuple on the page which was recently inserted; the
> inserting transaction has committed but there are some backends that
> still have older snapshots.  The page won't be marked all-visible
> because it isn't.  Now, eventually those older snapshots will go away,
> and sometime after that the relation will get vacuumed again, and
> we'll once again look the page.  But this time we notice that it is
> all-visible, and mark it so.

Hm, right, that can happen. How about adding a LSN interlock in
visibilitymap_set() for those cases as well, not just for checksums?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: heapgetpage() and ->takenDuringRecovery

От
Robert Haas
Дата:
On Mon, Mar 3, 2014 at 7:07 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I don't think there's any reason to believe that lazy_scan_heap() can
>> only hit pages that are empty or have just been defragged.  Suppose
>> that there's a tuple on the page which was recently inserted; the
>> inserting transaction has committed but there are some backends that
>> still have older snapshots.  The page won't be marked all-visible
>> because it isn't.  Now, eventually those older snapshots will go away,
>> and sometime after that the relation will get vacuumed again, and
>> we'll once again look the page.  But this time we notice that it is
>> all-visible, and mark it so.
>
> Hm, right, that can happen. How about adding a LSN interlock in
> visibilitymap_set() for those cases as well, not just for checksums?

Well, if I'm correctly understanding what you're proposing, that would
involve emitting an FPI for each page when we vacuum the
newly-inserted portion of an insert-only table.  That's been
repeatedly proposed in the past, but I've opposed it on the grounds
that it makes vacuum much more expensive in such cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: heapgetpage() and ->takenDuringRecovery

От
Andres Freund
Дата:
On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
> On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I don't think this is neccessary >= 9.2. The are two only "interestings" place
> > where PD_ALL_VISIBLE is set:
> > a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
> >    PD_ALL_VISIBLE/the vm is touched and that causes recovery
> >    conflicts. The heap page is locked for cleanup at that point. As the
> >    logging of xl_heap_clean sets the page's LSN there's no way the page
> >    can appear on the standby too early.
> > b) empty pages in lazy_scan_heap(). If they always were empty, there's
> >    no need for conflicts. The only other way I can see to end up there
> >    is a previous heap_page_prune() that repaired fragmentation. But that
> >    logs a WAL record with conflict information.
> 
> I don't think there's any reason to believe that lazy_scan_heap() can
> only hit pages that are empty or have just been defragged.  Suppose
> that there's a tuple on the page which was recently inserted; the
> inserting transaction has committed but there are some backends that
> still have older snapshots.  The page won't be marked all-visible
> because it isn't.  Now, eventually those older snapshots will go away,
> and sometime after that the relation will get vacuumed again, and
> we'll once again look the page.  But this time we notice that it is
> all-visible, and mark it so.

Right now I am missing how this isn't an actual correctness problem
after a crash. Without an LSN interlock we could crash *after* the heap
page has been written out, but *before* the vm WAL record has been
flushed to disk. Combined with synchronous_commit=off there could be
transactions that appeared as safely committed for vacuum (i.e. are
below GetOldestXmin()), but which are actually aborted after the
commit.
Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN),
but that doesn't work here.

Am I missing something?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: heapgetpage() and ->takenDuringRecovery

От
Robert Haas
Дата:
On Mon, Mar 3, 2014 at 8:33 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
>> On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > I don't think this is neccessary >= 9.2. The are two only "interestings" place
>> > where PD_ALL_VISIBLE is set:
>> > a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
>> >    PD_ALL_VISIBLE/the vm is touched and that causes recovery
>> >    conflicts. The heap page is locked for cleanup at that point. As the
>> >    logging of xl_heap_clean sets the page's LSN there's no way the page
>> >    can appear on the standby too early.
>> > b) empty pages in lazy_scan_heap(). If they always were empty, there's
>> >    no need for conflicts. The only other way I can see to end up there
>> >    is a previous heap_page_prune() that repaired fragmentation. But that
>> >    logs a WAL record with conflict information.
>>
>> I don't think there's any reason to believe that lazy_scan_heap() can
>> only hit pages that are empty or have just been defragged.  Suppose
>> that there's a tuple on the page which was recently inserted; the
>> inserting transaction has committed but there are some backends that
>> still have older snapshots.  The page won't be marked all-visible
>> because it isn't.  Now, eventually those older snapshots will go away,
>> and sometime after that the relation will get vacuumed again, and
>> we'll once again look the page.  But this time we notice that it is
>> all-visible, and mark it so.
>
> Right now I am missing how this isn't an actual correctness problem
> after a crash. Without an LSN interlock we could crash *after* the heap
> page has been written out, but *before* the vm WAL record has been
> flushed to disk.

Yes.  In that case, the PD_ALL_VISIBLE bit will be set on the page,
but the corresponding visibility map bit will be unset.

> Combined with synchronous_commit=off there could be
> transactions that appeared as safely committed for vacuum (i.e. are
> below GetOldestXmin()), but which are actually aborted after the
> commit.
> Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN),
> but that doesn't work here.

Well, we'd better not try to mark a page all-visible if it's only
all-visible on the assumption that unwritten xlog will be successfully
flushed to disk.  But lazy_scan_heap() has code that only regards the
tuple as all-visible once the tuple is hinted committed, and there's
code elsewhere to keep hint bits from being set too early.  And
heap_page_is_all_visible() follows the same pattern.  So I think it's
OK, but maybe you see something I don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company