infinite loop in _bt_getstackbuf

From Robert Haas
Subject infinite loop in _bt_getstackbuf
Date
Msg-id CA+TgmoZzd1MB3qqMeJiUXM569JySqYd_uJ9KiBByy6w0iMUrXg@mail.gmail.com
Replies Re: infinite loop in _bt_getstackbuf  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: infinite loop in _bt_getstackbuf  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
A colleague at EnterpriseDB today ran into a situation on PostgreSQL
9.3.5 where the server went into an infinite loop while attempting a
VACUUM FREEZE; it couldn't escape _bt_getstackbuf(), and it couldn't
be killed with ^C.  I think we should add a check for interrupts into
that loop somewhere; and possibly make some attempt to notice if we've
been iterating for longer than, say, the lifetime of the universe
until now.

The fundamental structure of that function is an infinite loop.  We
break out of that loop when BTEntrySame(item, &stack->bts_btentry) or
P_RIGHTMOST(opaque) and I'm sure that it's correct to think that, in
theory, one of those things will eventually happen.  But the index
could be corrupted, most obviously by having a page where
opaque->btpo_next points back to the current block number.  If that
happens, you need an immediate shutdown (or some clever gdb hackery)
to terminate the VACUUM.  That's unfortunate and unnecessary.

It also looks like something we can fix, at a minimum by adding a
CHECK_FOR_INTERRUPTS() at the top of that loop, or in some function
that it calls, like _bt_getbuf(), so that if it goes into an infinite
loop, it can at least be killed.  We could also consider adding a check
at the bottom of the loop, just before setting blkno =
opaque->btpo_next, that those values are unequal.  If they are equal,
elog().  Clearly it's possible to have a cycle of length >1, and such
a check wouldn't catch that, but it might still be worth checking for
the trivial case.  Or, we could try to put an upper bound on the
number of iterations that are reasonable and error out if we exceed
that value.  That might be tricky, though; it's not obvious to me that
there's any comfortably small upper bound.
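
To make the three options concrete, here is a minimal standalone sketch.
It is not PostgreSQL code: the sibling chain is modeled as a plain array
of "next" block numbers, and walk_right(), WalkResult, NO_SIBLING and
interrupt_pending are hypothetical names invented for this illustration.
The flag test stands in for CHECK_FOR_INTERRUPTS(), and the iteration
bound is simply the number of pages in the toy array; whether a
comparably cheap bound exists for a real index is exactly the open
question above.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical sentinel meaning "no right sibling" in this toy model. */
#define NO_SIBLING ((unsigned) -1)

typedef enum
{
    WALK_FOUND,        /* reached the target page */
    WALK_RIGHTMOST,    /* hit the rightmost page without finding it */
    WALK_SELF_LINK,    /* a page's next pointer names the page itself */
    WALK_TOO_LONG,     /* more steps than pages: there must be a cycle */
    WALK_INTERRUPTED   /* a cancel request was pending */
} WalkResult;

/* Stand-in for a pending query cancel; a signal handler could set this. */
static volatile sig_atomic_t interrupt_pending = 0;

/*
 * Follow next[] links from 'start' until 'target' is reached.
 * next[i] is the right sibling of page i, or NO_SIBLING.
 */
WalkResult
walk_right(const unsigned *next, unsigned npages,
           unsigned start, unsigned target)
{
    unsigned blkno = start;
    unsigned steps = 0;

    for (;;)
    {
        assert(blkno < npages);

        /* Option 1: let the loop be cancelled on every iteration. */
        if (interrupt_pending)
            return WALK_INTERRUPTED;

        /* The two normal exits of the loop. */
        if (blkno == target)
            return WALK_FOUND;
        if (next[blkno] == NO_SIBLING)
            return WALK_RIGHTMOST;

        /* Option 2: catch the trivial cycle of length one. */
        if (next[blkno] == blkno)
            return WALK_SELF_LINK;

        /* Option 3: a cycle-free chain takes at most npages steps. */
        if (++steps > npages)
            return WALK_TOO_LONG;

        blkno = next[blkno];
    }
}
```

With next[] = {1, 0, NO_SIBLING}, the self-link test passes on every
page, and only the step bound (or a cancel) ends the walk.  That is the
cycle of length >1 that the trivial check cannot catch.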

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


