Justin Pasher wrote:
> Hello,
>
> I have a server running PostgreSQL 8.1.15-0etch1 (Debian etch) that was
> recently put into production. Last week a developer started having a problem
> with his psql connection being terminated every couple of minutes when he
> was running a query. When I looked through the logs, I noticed this message.
>
> 2009-01-09 08:09:46 CST LOG: autovacuum process (PID 15012) was terminated
> by signal 11
Signal 11 is a segmentation fault, so this is probably a bug or bad RAM.
> I looked through the logs some more and I noticed that this was occurring
> every minute or so. The database is a pretty heavily utilized system
> (judging by the age(datfrozenxid) from pg_database, the system had run
> approximately 500 million queries in less than a week). I noticed that right
> before every autovacuum termination, it tried to autovacuum a database.
>
> 2009-01-09 08:09:46 CST LOG: transaction ID wrap limit is 4563352, limited
> by database "database_name"
>
> It was always showing the same database, so I decided to manually vacuum the
> database. Once that was done (it was successful the first time without
> errors), the problem seemed to go away. I went ahead and manually vacuumed
> the remaining databases just to take care of the potential xid wraparound
> issue.
I'd be suspicious of possible corruption in autovacuum's internal data.
Can you trace these problems back to a power outage or system crash? It
doesn't look like "database_name" itself is damaged, since you vacuumed
it successfully. If autovacuum is running normally now, that might
indicate the problem was in the way autovacuum was keeping track of
"database_name".
It's also probably worth running some memory tests on the server
(memtest86 or similar) to see if that shows anything. Was it *always*
the autovacuum process getting sig11? If not, then it might just be that
autovacuum's pattern of usage makes it more likely to hit some bad RAM.
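
If it helps answer the "was it always autovacuum?" question, here is a
rough Python sketch that tallies signal-11 terminations in the server log
by process type. It assumes every crash is logged in the same shape as the
line you quoted ("<something> process (PID n) was terminated by signal n");
the function name and the regular expression are mine, not anything
PostgreSQL ships, so adjust them to your log_line_prefix.

```python
import re
from collections import Counter

# Assumed log shape, based on the line quoted above:
#   ... LOG: autovacuum process (PID 15012) was terminated by signal 11
CRASH_RE = re.compile(
    r"LOG:\s+(?P<proc>.+?) \(PID (?P<pid>\d+)\) "
    r"was terminated by signal (?P<sig>\d+)"
)


def count_signal_crashes(lines, signal=11):
    """Count terminations by the given signal, keyed by process description."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match and int(match.group("sig")) == signal:
            counts[match.group("proc")] += 1
    return counts
```

Feed it the lines of your log file (any iterable of strings will do). If
the only key that comes back is "autovacuum process", that points more
towards autovacuum's own state; if other processes show up as well, bad
RAM becomes the more likely suspect.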
--
Richard Huxton
Archonet Ltd