Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication
Дата
Msg-id CAD21AoDzgc8_MYrA5m1fyydomw_eVKtQiYh7sfDK4KEhdMsf_g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication  (Dilip Kumar <dilipbalaut@gmail.com>)
Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication  (Andres Freund <andres@anarazel.de>)
Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Thu, Apr 29, 2021 at 1:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Apr 28, 2021 at 7:34 PM tanghy.fnst@fujitsu.com
> <tanghy.fnst@fujitsu.com> wrote:
> > >TRAP: FailedAssertion("!all_visible_according_to_vm || prunestate.all_visible", File: "vacuumlazy.c", Line: 1347,
PID:1274675) 
> >
> > BTW, in asynchronous mode, the same problem can also happen but in a low frequency.(I tried many times, but the
problemhappened only 2 times) 
> > As for synchronous mode, I found it seems easier to reproduce the problem with setting "autovacuum_naptime = 1".
> > But it still can't be 100% to reproduced it. (I tested it 5 times, 3 of them reproduced it.)

Thanks for reporting the issue!

>
> Is setting all_visible_according_to_vm false as below enough to avoid
> the assertion failure?
>
> diff --git a/src/backend/access/heap/vacuumlazy.c
> b/src/backend/access/heap/vacuumlazy.c
> index c3fc12d76c..76c17e063e 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1146,6 +1146,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams
> *params, bool aggressive)
>              {
>                  ReleaseBuffer(vmbuffer);
>                  vmbuffer = InvalidBuffer;
> +                all_visible_according_to_vm = false;
>              }
>
>              /* Remove the collected garbage tuples from table and indexes */

Since we set all_visible_according_to_vm before acquiring the buffer
lock it's likely to happen that the page gets modified and all-visible
bit is cleared after setting true to all_visible_according_to_vm. This
assertion can easily be reproduced by adding a delay before the buffer
lock and invoking autovacuums frequently:

diff --git a/src/backend/access/heap/vacuumlazy.c
b/src/backend/access/heap/vacuumlazy.c
index c3fc12d76c..76f067a7e4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1180,6 +1180,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams
*params, bool aggressive)
        buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno,
                                 RBM_NORMAL, vacrel->bstrategy);

+       pg_usleep(100000);
+
        /*
         * We need buffer cleanup lock so that we can prune HOT chains and
         * defragment the page.

So we should recheck also visibility map bit there but I think we can
remove this assertion since we already do that in later code and we
don’t treat this case as a should-not-happen case:

        /*
         * As of PostgreSQL 9.2, the visibility map bit should never be set if
         * the page-level bit is clear.  However, it's possible that the bit
         * got cleared after we checked it and before we took the buffer
         * content lock, so we must recheck before jumping to the conclusion
         * that something bad has happened.
         */
        else if (all_visible_according_to_vm && !PageIsAllVisible(page)
                 && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
        {
            elog(WARNING, "page is not marked all-visible but
visibility map bit is set in relation \"%s\" page %u",
                 vacrel->relname, blkno);
            visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
                                VISIBILITYMAP_VALID_BITS);
        }

I've attached a patch removing the assertion.

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: decoupling table and index vacuum
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: [bug?] Missed parallel safety checks, and wrong parallel safety