Thread: Re: BUG #3245: PANIC: failed to re-find shared lock object

Re: BUG #3245: PANIC: failed to re-find shared lock object

From
"Dorochevsky,Michel"
Date:
> Question: do you have any leftover files in $PGDATA/pg_twophase ?
>
> I'm wondering why the log contains no warning messages about stale
> two-phase state files.  It looks to me like the system should have
> found the two-phase file still there upon restart, but the transaction
> should have been marked already committed.
>
> BTW, can you tell whether the failing transactions actually were committed
> --- are their effects still visible in the database?
>
>            regards, tom lane
Tom,
Thanks for your continued support; I appreciate it a lot.

The failing transaction is visible in the database after restart, I have
checked three of the last inserts, e.g.
2007-04-21 18:06:18.921  20160 LOG:  execute <unnamed>: insert into
CHECKRESULT (COMMENT, POSITIONINCHAIN, MDSD_OPT_LOCK, MDSD_CLASS, ID) values
($1, $2, $3, 'CheckResult', $4)
2007-04-21 18:06:18.921  20160 DETAIL:  parameters: $1 = 'geht schon', $2 =
'2', $3 = '2007-04-21 18:06:18.64', $4 = '4046'
is visible. I can tell from the application that this record will never be
updated later on and always has the current timestamp.

I have no leftover file in $PGDATA/pg_twophase, it is empty.

Best Regards
-- Michel

Re: BUG #3245: PANIC: failed to re-find shared lock object

From
Tom Lane
Date:
"Dorochevsky,Michel" <michel.dorochevsky@softcon.de> writes:
> The failing transaction is visible in the database after restart, I have
> checked three of the last inserts, e.g.

Good, at least we're not losing data ;-).  But I expected that because
this PANIC must be occurring after the RecordTransactionCommitPrepared
step.

> I have no leftover file in $PGDATA/pg_twophase, it is empty.

[ digs in code some more... ]  Oh, I see how that happens: the 2PC
state file is removed when the XLOG_XACT_COMMIT_PREPARED xlog entry
is replayed, so the various code paths that might emit a warning
won't be reached.

Heikki, have you been paying attention to this thread?  You have any
idea what's happening?  The whole thing seems pretty unexplainable
to me, especially since Michel's log shows this happening without any
concurrent activity that might confuse matters.  I confess bafflement.

            regards, tom lane

Re: BUG #3245: PANIC: failed to re-find shared lock object

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> "Dorochevsky,Michel" <michel.dorochevsky@softcon.de> writes:
>> The failing transaction is visible in the database after restart, I have
>> checked three of the last inserts, e.g.
>
> Good, at least we're not losing data ;-).  But I expected that because
> this PANIC must be occurring after the RecordTransactionCommitPrepared
> step.
>
>> I have no leftover file in $PGDATA/pg_twophase, it is empty.
>
> [ digs in code some more... ]  Oh, I see how that happens: the 2PC
> state file is removed when the XLOG_XACT_COMMIT_PREPARED xlog entry
> is replayed, so the various code paths that might emit a warning
> won't be reached.
>
> Heikki, have you been paying attention to this thread?  You have any
> idea what's happening?  The whole thing seems pretty unexplainable
> to me, especially since Michel's log shows this happening without any
> concurrent activity that might confuse matters.  I confess bafflement.

Oh, no, I wasn't. I'm up to speed now.

I can't see any way that can happen either. There are some other
transactions running, but not at the time of prepare or commit. And
there are no other errors or unusual activity in the logs.

The only thing I can think of is that a lock is released between the
calls to AtPrepare_Locks and PostPrepare_Locks. But I don't see how that
could happen.
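
A toy model can make this hypothesis concrete. The sketch below is not PostgreSQL code: `DemoLockTable` and the `demo_*` functions are hypothetical stand-ins, and it makes no claim about which backend function actually raises the PANIC. It only shows that if the set of held locks is recorded at one point and a lock disappears from the shared table before the recorded set is looked up again, the lookup fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_LOCKS 8

/* Toy "shared lock table": a fixed array of lock ids (hypothetical). */
typedef struct DemoLockTable
{
    unsigned ids[DEMO_MAX_LOCKS];
    bool     used[DEMO_MAX_LOCKS];
} DemoLockTable;

/* Put a lock id into the first free slot; false if the table is full. */
static bool
demo_acquire(DemoLockTable *t, unsigned id)
{
    assert(t != NULL);
    for (size_t i = 0; i < DEMO_MAX_LOCKS; i++)
    {
        if (!t->used[i])
        {
            t->ids[i] = id;
            t->used[i] = true;
            return true;
        }
    }
    return false;
}

/* Remove a lock id from the table; false if it was not there. */
static bool
demo_release(DemoLockTable *t, unsigned id)
{
    assert(t != NULL);
    for (size_t i = 0; i < DEMO_MAX_LOCKS; i++)
    {
        if (t->used[i] && t->ids[i] == id)
        {
            t->used[i] = false;
            return true;
        }
    }
    return false;
}

/* Record the ids currently held (slot order); returns how many were written. */
static size_t
demo_record_held(const DemoLockTable *t, unsigned *out, size_t max)
{
    size_t n = 0;

    assert(t != NULL && out != NULL);
    for (size_t i = 0; i < DEMO_MAX_LOCKS && n < max; i++)
        if (t->used[i])
            out[n++] = t->ids[i];
    return n;
}

/* Look up every recorded id again; returns the index of the first
 * recorded id that is no longer in the table, or -1 if all are found. */
static int
demo_refind_recorded(const DemoLockTable *t, const unsigned *rec, size_t n)
{
    assert(t != NULL && rec != NULL);
    for (size_t k = 0; k < n; k++)
    {
        bool found = false;

        for (size_t i = 0; i < DEMO_MAX_LOCKS; i++)
            if (t->used[i] && t->ids[i] == rec[k])
                found = true;
        if (!found)
            return (int) k;
    }
    return -1;
}
```

Recording two locks, releasing one of them, and then calling `demo_refind_recorded` returns the index of the missing entry. Under this hypothesis, that is the kind of failed lookup the PANIC would correspond to.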

I think we need to see more debug-information. Is there a debug- and
assertion-enabled binary available for Windows?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #3245: PANIC: failed to re-find shared lock object

From
Dave Page
Date:
Heikki Linnakangas wrote:
> I think we need to see more debug-information. Is there a debug- and
> assertion-enabled binary available for Windows?

Unfortunately no - 95% of the time we've found that Mingw/gdb on windows
simply doesn't work. That's one of the major reasons why we're working
on moving to VC++.

I can build one tomorrow if you want to try for the 5%. What version was
this?

Regards, Dave.

Re: BUG #3245: PANIC: failed to re-find shared lock object

From
Tom Lane
Date:
Dave Page <dpage@postgresql.org> writes:
> Heikki Linnakangas wrote:
>> I think we need to see more debug-information. Is there a debug- and
>> assertion-enabled binary available for Windows?

> Unfortunately no - 95% of the time we've found that Mingw/gdb on windows
> simply doesn't work. That's one of the major reasons why we're working
> on moving to VC++.

> I can build one tomorrow if you want to try for the 5%. What version was
> this?

Having assertions turned on would be useful regardless of debug support.
What I was going to suggest was to add some detail printout to the PANIC
message --- in particular, dump the fields of the problem LOCKTAG, so
we can at least find out *what* lock is being lost.  If you build a
custom copy for Michel, please add this patch (untested but should work):

*** src/backend/storage/lmgr/lock.c.orig    Thu Feb  1 15:09:33 2007
--- src/backend/storage/lmgr/lock.c    Sun Apr 22 16:17:01 2007
***************
*** 2430,2437 ****
                                                  HASH_FIND,
                                                  NULL);
      if (!lock)
!         elog(PANIC, "failed to re-find shared lock object");
!
      /*
       * Re-find the proclock object (ditto).
       */
--- 2430,2443 ----
                                                  HASH_FIND,
                                                  NULL);
      if (!lock)
!         elog(PANIC, "failed to re-find shared lock object: %u %u %u %u %u %u",
!              locktag->locktag_field1,
!              locktag->locktag_field2,
!              locktag->locktag_field3,
!              locktag->locktag_field4,
!              locktag->locktag_type,
!              locktag->locktag_lockmethodid);
!
      /*
       * Re-find the proclock object (ditto).
       */


            regards, tom lane
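
For anyone reading the resulting log line, the sketch below shows how the six numbers in the patched message line up with the fields. `DemoLockTag` is a hypothetical stand-in for LOCKTAG: the field names and their order are taken from the patch, while the integer widths are an assumption. `demo_format_locktag` and `demo_parse_locktag` are illustrative helpers, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of LOCKTAG; names from the patch, widths assumed. */
typedef struct DemoLockTag
{
    uint32_t locktag_field1;
    uint32_t locktag_field2;
    uint32_t locktag_field3;
    uint16_t locktag_field4;
    uint8_t  locktag_type;
    uint8_t  locktag_lockmethodid;
} DemoLockTag;

/* Format a tag in the same field order as the patched PANIC message.
 * Returns what snprintf returns (length that would have been written). */
static int
demo_format_locktag(char *buf, size_t buflen, const DemoLockTag *tag)
{
    assert(buf != NULL && tag != NULL);
    return snprintf(buf, buflen,
                    "failed to re-find shared lock object: %u %u %u %u %u %u",
                    (unsigned) tag->locktag_field1,
                    (unsigned) tag->locktag_field2,
                    (unsigned) tag->locktag_field3,
                    (unsigned) tag->locktag_field4,
                    (unsigned) tag->locktag_type,
                    (unsigned) tag->locktag_lockmethodid);
}

/* Read the six values back out of a message line.
 * Returns 1 if all six were parsed, 0 otherwise. */
static int
demo_parse_locktag(const char *line, DemoLockTag *tag)
{
    unsigned f1, f2, f3, f4, type, method;

    assert(line != NULL && tag != NULL);
    if (sscanf(line,
               "failed to re-find shared lock object: %u %u %u %u %u %u",
               &f1, &f2, &f3, &f4, &type, &method) != 6)
        return 0;

    tag->locktag_field1 = (uint32_t) f1;
    tag->locktag_field2 = (uint32_t) f2;
    tag->locktag_field3 = (uint32_t) f3;
    tag->locktag_field4 = (uint16_t) f4;
    tag->locktag_type = (uint8_t) type;
    tag->locktag_lockmethodid = (uint8_t) method;
    return 1;
}
```

The first four numbers are the four generic tag fields, the fifth is the lock tag type, and the sixth is the lock method id. How fields 1 to 4 should be interpreted depends on the tag type, so the type value is the first thing to look at in the log.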

Re: BUG #3245: PANIC: failed to re-find shared lock object

From
Heikki Linnakangas
Date:
Dave Page wrote:
> Heikki Linnakangas wrote:
>> I think we need to see more debug-information. Is there a debug- and
>> assertion-enabled binary available for Windows?
>
> Unfortunately no - 95% of the time we've found that Mingw/gdb on windows
> simply doesn't work. That's one of the major reasons why we're working
> on moving to VC++.

I'm not so much interested in using gdb, but in having assertions
enabled and getting the output of LOCK_DEBUG.

> I can build one tomorrow if you want to try for the 5%. What version was
> this?

Thanks, it was 8.2.3.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #3245: PANIC: failed to re-find shared lock object

From
Dave Page
Date:
Heikki Linnakangas wrote:
>> I can build one tomorrow if you want to try for the 5%. What version was
>> this?
>
> Thanks, it was 8.2.3.

Actually, no reason this needs to wait until I'm in the office.

Michel, I've uploaded an 8.2.3 postgres.exe to
http://developer.pgadmin.org/~dpage/postgres-8.2.3-debug.zip. This is
the same as the release version, but configured with --enable-debug
--enable-cassert, and patched with Tom's patch.

Regards, Dave.