Re: BUG #6183: FATAL: canceling authentication due to timeout

From: Thorvald Natvig <thorvald(at)medallia(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6183: FATAL: canceling authentication due to timeout
Date: 2011-08-30 01:22:51
Message-ID: 4E5C3B6B.70101@medallia.com
Lists: pgsql-bugs

On 8/29/11 5:50 PM, Tom Lane wrote:
> "Thorvald Natvig" <thorvald(at)medallia(dot)com> writes:
>> We get a lot of "FATAL: canceling authentication due to timeout" in the
>> log, with accompanying closed connections to clients.
> Well, the only known cause of that (other than genuine timeout
> conditions) is in fact fixed in 9.1rc1. You have not provided any
> information that would permit anyone to look for another cause.

This is a database server with fairly high traffic to multiple
databases. The problem seems to be related to multiple concurrent
connections, but I haven't had time to isolate a repeatable minimal test
case yet. I was hoping that whatever was wrong was something obvious, or
that someone else had seen similar issues and was able to help with
isolating it.

Since this problem is affecting the usability of the machine, I've
disabled the 'vacuumdb' runs for now (which "fixes" the issue).

>> There does indeed seem to be a correlation between doing vacuum and seeing
>> this error.
> Are you doing VACUUM FULLs on pg_authid (and if so, why)? If you are,
> is it possible that those are queuing up behind other queries that
> access pg_authid, and for some reason aren't releasing their locks
> promptly?
>
> regards, tom lane

Databases are created from plain-text backups with createdb and psql,
minimal modifications are made to a few rows, and then we run
vacuumdb -q -z ${db}

A bit later, this database is renamed, a copy of it is created with
'createdb -T olddb newdb', a lot of deletions (between 0 and 90% of the
rows) are performed on the copy, and then we run
vacuumdb -q -f -z ${newdb}
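
The two steps above can be sketched roughly as follows. This is only an
illustration, not the actual script: the function names, the dump file
argument, and the DRY_RUN switch are hypothetical, and the use of ALTER
DATABASE ... RENAME TO for the rename is an assumption. By default the
commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the workflow described above. DRY_RUN defaults to 1, so the
# commands are only printed; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: create the database, load a plain-text backup, vacuum + analyze.
# The dump file name is a hypothetical argument.
restore_and_vacuum() {
    db="$1"; dump="$2"
    run createdb "$db"
    run psql -q -d "$db" -f "$dump"
    run vacuumdb -q -z "$db"
}

# Step 2: rename the database, copy it via a template, then run a full
# vacuum + analyze on the copy. The rename statement is an assumption;
# it is issued while connected to the "postgres" database.
copy_and_full_vacuum() {
    db="$1"; olddb="$2"; newdb="$3"
    run psql -q -d postgres -c "ALTER DATABASE \"$db\" RENAME TO \"$olddb\""
    run createdb -T "$olddb" "$newdb"
    # (the row deletions on $newdb would happen here)
    run vacuumdb -q -f -z "$newdb"
}
```

For example, 'copy_and_full_vacuum db1 olddb newdb' prints the three
commands of the second step, ending with the full vacuum of newdb.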

The script doing this is run from several machines working on different
databases, all hosted on the same server. So it's possible there are
multiple full vacuums issued at the same time. However, there are no
users connected to the databases being vacuumed during this time, but
there are hundreds of connections to other databases on the same server;
these are the ones that fail. All of these databases were at some point
created with 'createdb -T' from a database produced by the above
process. As far as I know, there are no direct queries to pg_ tables.
All operations are performed over TCP with the same user.

I don't know if this helps with where to look. If it doesn't, I'll try
to put together a repeatable test case over the weekend, when this
server isn't quite so essential.

Regards,
Thorvald
