RE: dsa_allocate() faliure

From Arne Roland
Subject RE: dsa_allocate() faliure
Date
Msg-id d9c6cc80e21241349db53b2f64075029@index.de
In response to Re: dsa_allocate() faliure  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: dsa_allocate() faliure  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-performance
It's definitely a relatively complex pattern. The query I sent you last time was minimal with respect to
predicates (so removing any single one of the predicates converted it into a working query).
 
> Huh.  Ok well that's a lot more frequent than I thought.  Is it always the same query?  Any chance you can get the
> plan? Are there more things going on on the server, like perhaps concurrent parallel queries?
 
I had this bug occur while I was the only one working on the server. I checked that there was just one transaction with
a snapshot at all, and it was an autovacuum busy with a totally unrelated relation my colleague was working on.
 

The bug is indeed behaving like a ghost.
One child relation needed a few new rows to test a particular application a colleague of mine was working on. The
insert triggered an autoanalyze and the explain changed slightly:
 
Besides row and cost estimates the change is that the line
Recheck Cond: (((COALESCE((fid)::bigint, fallback) ) >= 1) AND ((COALESCE((fid)::bigint, fallback) ) <= 1) AND (gid &&
'{853078,853080,853082}'::integer[]))
is now 
Recheck Cond: ((gid && '{853078,853080,853082}'::integer[]) AND ((COALESCE((fid)::bigint, fallback) ) >= 1) AND
((COALESCE((fid)::bigint, fallback) ) <= 1))
 
and the error vanished.

I could try to hunt down another query by assembling seemingly random queries. I don't see a very clear pattern in
the queries aborting with this error on our production servers. I'm not surprised that the bug is hard to chase on production
servers. They usually are quite lively.
 

> If you're able to run a throwaway copy of your production database on another system that you don't have to worry
> about crashing, you could just replace ERROR with PANIC and run a high-speed loop of the query that crashed in production,
> or something.  This might at least tell us whether it's reaching that condition via something dereferencing a dsa_pointer
> or something manipulating the segment lists while allocating/freeing.
 

I could take a backup and restore the relevant tables on a throwaway system. You are just suggesting replacing line 728
    elog(FATAL,
         "dsa_allocate could not find %zu free pages", npages);
by
    elog(PANIC,
         "dsa_allocate could not find %zu free pages", npages);
correct? Just for my understanding: why would shutting down the whole instance produce more helpful logging?
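
For the high-speed loop on the throwaway system, a minimal Python sketch could look like the following. It only illustrates the idea: it assumes psql is on the PATH, and the database name throwaway_db and the file failing_query.sql are hypothetical placeholders, not names from our setup.

```python
import subprocess
import sys


def run_until_failure(cmd, max_iterations=1000):
    """Run cmd repeatedly.

    Returns (iteration, stderr) for the first run that exits non-zero,
    or None if all max_iterations runs succeeded.
    """
    for i in range(1, max_iterations + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return i, result.stderr
    return None


if __name__ == "__main__":
    # Placeholder invocation: database and query file are assumptions.
    # ON_ERROR_STOP=1 makes psql exit non-zero when the query errors out.
    psql_cmd = [
        "psql", "-X",
        "-v", "ON_ERROR_STOP=1",
        "-d", "throwaway_db",
        "-f", "failing_query.sql",
    ]
    outcome = run_until_failure(psql_cmd)
    if outcome is None:
        print("no failure observed")
    else:
        iteration, stderr = outcome
        print(f"failed on iteration {iteration}:")
        sys.stderr.write(stderr)
```

The loop stops at the first failing run so that the server log (and a core file, if one gets written) can be inspected right after the failure.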

All the best
Arne
