Re: 7.0.2 dies when connection dropped mid-transaction

From: Tom Lane
Subject: Re: 7.0.2 dies when connection dropped mid-transaction
Msg-id: 2925.973823430@sss.pgh.pa.us
In reply to: Re: 7.0.2 dies when connection dropped mid-transaction  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: 7.0.2 dies when connection dropped mid-transaction  (Alfred Perlstein <bright@wintelcom.net>)
           Re: 7.0.2 dies when connection dropped mid-transaction  (Bruce Momjian <pgman@candle.pha.pa.us>)
List: pgsql-hackers

I said:
> OK, after digging some more, it seems that the critical requirement
> is that the cursor's query contain a hash join.

Here's the deal:

test7=# set enable_mergejoin to off;
SET VARIABLE
test7=# begin;
BEGIN
-- I've previously checked that this produces a hash join plan:
test7=# declare c cursor for select * from foo t1, foo t2 where t1.f1=t2.f1;
SELECT
test7=# fetch 1 from c;
 f1 | f1
----+----
  1 |  1
(1 row)

test7=# abort;
NOTICE:  trying to delete portal name that does not exist.
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.

This happens with either 7.0.2 or 7.0.3 (probably with anything back to
6.5, if not before).  It does *not* happen with current development tip.

The problem is that two "portal" structures are used.  One holds the
overall query plan and execution state for the cursor, and the other
holds the hash table for the hash join.  During abort, the portal
manager tries to delete both of them.  BUT: deleting the query plan
causes query cleanup to be executed, which among other things deletes
the hash join's table.  Then the portal manager tries to delete the
already-deleted second portal, which leads first to the above notice
and then to Assert failure (and probably would lead to coredump if
you didn't have Asserts on).  Alternatively, it might try to delete
the hash join portal first, which would leave the query cleanup code
deleting an already-deleted portal, and doubtless still crashing.
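
To make the ordering problem concrete, here is a toy model in Python. It is
only a sketch of the failure pattern described above, not PostgreSQL code:
PortalRegistry, build_registry and the portal names "cursor_c" and
"hash_table" are all made up for illustration, and where the real backend
hits an Assert the toy merely records the notice.

```python
class PortalRegistry:
    """Toy name -> portal table (hypothetical; not the real portal manager)."""

    def __init__(self):
        self.portals = {}   # portal name -> optional cleanup callback
        self.notices = []   # messages a real backend would emit as NOTICEs

    def create(self, name, on_cleanup=None):
        self.portals[name] = on_cleanup

    def drop(self, name):
        """Delete one portal; record a notice if it is already gone."""
        if name not in self.portals:
            self.notices.append(
                "trying to delete portal name that does not exist.")
            return False
        on_cleanup = self.portals.pop(name)
        if on_cleanup is not None:
            on_cleanup()    # cleanup may delete other portals as a side effect
        return True

    def abort(self):
        """Collect every portal name up front, then delete each in turn."""
        for name in list(self.portals):
            self.drop(name)


def build_registry(hash_table_is_portal, cursor_first=True):
    """Set up one cursor whose query cleanup releases the hash join's table."""
    reg = PortalRegistry()

    def query_cleanup():
        # Only the old layout keeps the hash table in a portal of its own.
        if hash_table_is_portal:
            reg.drop("hash_table")

    entries = [("cursor_c", query_cleanup)]
    if hash_table_is_portal:
        entries.append(("hash_table", None))
    if not cursor_first:
        entries.reverse()
    for name, cleanup in entries:
        reg.create(name, cleanup)
    return reg


if __name__ == "__main__":
    for is_portal, cursor_first in [(True, True), (True, False), (False, True)]:
        reg = build_registry(is_portal, cursor_first)
        reg.abort()
        print(is_portal, cursor_first, len(reg.notices))
```

In this model either deletion order leaves one stale deletion when the hash
table lives in its own portal, and none when the query owns the table
privately.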

Current sources don't show the problem because hashtables aren't kept
in portals anymore.

I've thought for some time that CollectNamedPortals is a horrid kluge,
and really ought to be rewritten.  Hadn't seen it actually do the wrong
thing before, but now...

I guess the immediate question is: do we want to hold up the 7.0.3 release
for a fix?  This bug is clearly ancient, so I'm not sure it's
appropriate to go through a fire drill to fix it for 7.0.3.
Comments?
        regards, tom lane

