Thread: BUG #6183: FATAL: canceling authentication due to timeout

BUG #6183: FATAL: canceling authentication due to timeout

From
"Thorvald Natvig"
Date:
The following bug has been logged online:

Bug reference:      6183
Logged by:          Thorvald Natvig
Email address:      thorvald@medallia.com
PostgreSQL version: 9.1rc1
Operating system:   RHEL6
Description:        FATAL:  canceling authentication due to timeout
Details:

We get a lot of "FATAL:  canceling authentication due to timeout" in the
log, with accompanying closed connections to clients.

We first saw this on 9.0.4. Googling around, I saw a reference on
postgresql-hackers to
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=592b615d71caac8a3504276a805a6fd024c40041

There does indeed seem to be a correlation between doing vacuum and seeing
this error.

Seeing as that commit was included in 9.1rc1, we tried upgrading to 9.1rc1
doing a full dump/restore. However, the exact same problem still remains.

Re: BUG #6183: FATAL: canceling authentication due to timeout

From
Tom Lane
Date:
"Thorvald Natvig" <thorvald@medallia.com> writes:
> We get a lot of "FATAL:  canceling authentication due to timeout" in the
> log, with accompanying closed connections to clients.

Well, the only known cause of that (other than genuine timeout
conditions) is in fact fixed in 9.1rc1.  You have not provided any
information that would permit anyone to look for another cause.

> There does indeed seem to be a correlation between doing vacuum and seeing
> this error.

Are you doing VACUUM FULLs on pg_authid (and if so, why)?  If you are,
is it possible that those are queuing up behind other queries that
access pg_authid, and for some reason aren't releasing their locks
promptly?

            regards, tom lane
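
One way to check the lock-queue theory above while the errors are occurring is to look at which backends hold or are waiting for locks on pg_authid. The following is a minimal sketch and was not posted in the thread: the helper name authid_locks_sql is made up for illustration, and it assumes psql can connect to the server with enough privileges to read pg_locks.

```shell
# Hypothetical helper: prints a query listing lock holders and waiters
# on the shared catalog pg_authid, so it can be piped into psql.
authid_locks_sql() {
  cat <<'SQL'
SELECT pid, mode, granted
FROM pg_locks
WHERE relation = 'pg_authid'::regclass
ORDER BY granted, pid;
SQL
}

# Usage (the database name is an assumption; any database should do,
# because pg_authid is shared across the cluster):
#   authid_locks_sql | psql -X -At -d postgres
```

Rows with granted = f are backends waiting for the lock; an AccessExclusiveLock row would be consistent with a VACUUM FULL being involved.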

Re: BUG #6183: FATAL: canceling authentication due to timeout

From
Thorvald Natvig
Date:
On 8/29/11 5:50 PM, Tom Lane wrote:
> "Thorvald Natvig" <thorvald@medallia.com> writes:
>> We get a lot of "FATAL:  canceling authentication due to timeout" in the
>> log, with accompanying closed connections to clients.
> Well, the only known cause of that (other than genuine timeout
> conditions) is in fact fixed in 9.1rc1.  You have not provided any
> information that would permit anyone to look for another cause.
This is a database server with fairly high traffic to multiple
databases. It seems to be related to multiple concurrent connections,
but I haven't had time to isolate a repeatable minimal testcase yet. I
was hoping that whatever was wrong was related to something obvious, or
that someone else had seen similar issues and was able to help with
isolating it.
Since this artifact is influencing the usability of the machine, I've
disabled the issuing of 'vacuumdb' for now (which "fixes" the issue).

>> There does indeed seem to be a correlation between doing vacuum and seeing
>> this error.
> Are you doing VACUUM FULLs on pg_authid (and if so, why)?  If you are,
> is it possible that those are queuing up behind other queries that
> access pg_authid, and for some reason aren't releasing their locks
> promptly?
>
>             regards, tom lane

Databases are created from plain-text backups with createdb and psql,
minimal modifications are done to a few rows, and then
vacuumdb -q -z ${db}

A bit later, this database is renamed, a copy of it is created with
'createdb -T olddb newdb', a lot of deletions (between 0 and 90% of the
rows) are performed and then
vacuumdb -q -f -z ${newdb}

The script doing this is run from several machines working on different
databases, all hosted on the same server. So it's possible there are
multiple full vacuums issued at the same time. However, there are no
users connected to the databases being vacuumed during this time, but
there are hundreds of connections to other databases on the same server;
these are the ones that fail. All of these databases have at one point
been created with -T on a database from the above process. As far as I
know, there are no direct queries to pg_ tables. All operations are
performed over tcp with the same user.

I don't know if this helps with where to look. If it doesn't, I'll try
to make a repeatable testcase on the weekend, when this server isn't
quite so essential.

Regards,
Thorvald
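
The repeatable testcase mentioned above could start from something like the following sketch. It is not the reporter's script: the database names, the PROBES count, and the DRY_RUN switch are placeholders invented for illustration. It only reuses the vacuumdb flags quoted in the report and opens fresh connections to a different database while the full vacuum runs.

```shell
#!/bin/sh
# Sketch of a reproduction attempt; every name below is a placeholder.
SCRATCH_DB=${SCRATCH_DB:-scratchdb}   # hypothetical database to vacuum
PROBE_DB=${PROBE_DB:-postgres}        # hypothetical database to connect to
PROBES=${PROBES:-100}                 # number of connection attempts

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same flags as in the report: quiet, full, analyze.
full_vacuum() {
  run vacuumdb -q -f -z "$SCRATCH_DB"
}

# Open one fresh connection per attempt and count the failures.
probe_connections() {
  failures=0
  i=0
  while [ "$i" -lt "$PROBES" ]; do
    run psql -X -q -d "$PROBE_DB" -c 'SELECT 1' >/dev/null 2>&1 \
      || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "failed connections: $failures of $PROBES"
}
```

Running `full_vacuum & probe_connections; wait` against a test server would then show whether new connections fail while the full vacuum is in progress. Whether that reproduces the timeout depends on load and on the authentication_timeout setting, so a clean run does not rule the problem out.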