On 7/4/23 13:53, hubert depesz lubaczewski wrote:
> On Tue, Jul 04, 2023 at 01:30:21PM +0200, Tomas Vondra wrote:
>> On 7/3/23 14:58, hubert depesz lubaczewski wrote:
>> So is it an infinite loop in ReorderBufferExecuteInvalidations, or is it
>> just the case that there are many invalidations? I can't really deduce
>> that from the backtraces.
>>
>> How many invalidations does the transaction have? Should be enough to
>>
>> print txn->ninvalidations
>>
>> Also, is there anything interesting about the transaction? You know the
>> XID (2741814901) so maybe use pg_waldump to see what it did.
>
> I don't have the process anymore. Pg was restarted, and the app was
> forced to not take data from this place.
>
OK. So the slot is still there, and we could try reading the data again
(hopefully getting stuck in the same place).
> I can run pg_waldump, no problem, but I'm not an expert on this. Which
> wal file(s) should I try to dump? and what to look for (or should I just
> provide full output of dump?
>
The backtrace has this:
and 187650155969544 should be LSN AAAA/B4E37C08, which maps to
select pg_walfile_name('AAAA/B4E37C08');
pg_walfile_name
--------------------------
000000010000AAAA000000B4
So maybe try dumping that WAL segment and see if the XID 2741814901 is
there. If yes, maybe share the whole dump, it's difficult to say what to
share without knowing what the transaction did.
There's also the LSNs in pg_replication_slots - it might be interesting
to see dump for all the WAL between restart_lsn and confirmed_flush_lsn.
So like this segments between:
SELECT pg_walfile_name(restart_lsn),
pg_walfile_name(confirmed_flush_lsn)
FROM pg_replication_slots WHERE slot_name = 'slot_name';
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company