Re: We need to log aborted autovacuums

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: We need to log aborted autovacuums
Date: 2011-01-16 16:23:16
Message-ID: 4D331B74.6000004@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> On Sat, Jan 15, 2011 at 11:14 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Greg Smith <greg(at)2ndquadrant(dot)com> writes:
>>
>>> Does try_relation_open need to have a lock acquisition timeout when AV
>>> is calling it?
>>>
>> Hmm. I think when looking at the AV code, I've always subconsciously
>> assumed that try_relation_open would fail immediately if it couldn't get
>> the lock. That certainly seems like it would be a more appropriate way
>> to behave than delaying indefinitely.
>>
>
> I'm confused how that's not happening already. What does "try" mean, otherwise?
>

Apparently "try" means acquire the requested lock on the oid of the
relation, waiting for any amount of time for that part, and then check
if the relation really exists or not once it's got it. In this context,
it means it will try to open the relation, but might fail if it doesn't
actually exist anymore. The relation not existing once it tries the
check done after the lock is acquired is the only way it will return a
failure.

try_relation_open calls LockRelationOid, which blocks. There is also a
ConditionalLockRelationOid, which does the same basic thing except it
exits immediately, with a false return code, if it can't acquire the
lock. I think we just need to nail down in what existing cases it's
acceptable to have try_relation_oid use ConditionalLockRelationOid
instead, which would make it do what all us reading the code thought it
did all along.

There are four callers of try_relation_open to be concerned about here:

src/backend/commands/vacuum.c vacuum_rel
onerel = try_relation_open(relid, lmode);

src/backend/commands/analyze.c analyze_rel
onerel = try_relation_open(relid, ShareUpdateExclusiveLock);

src/backend/commands/cluster.c cluster_rel
OldHeap = try_relation_open(tableOid, AccessExclusiveLock);

src/backend/commands/lockcmds.c LockTableRecurse
* Apply LOCK TABLE recursively over an inheritance tree
rel = try_relation_open(reloid, NoLock);

I think that both the vacuum_rel and analyze_rel cases are capable of
figuring out if they are the autovacuum process, and if so calling the
fast non-blocking version of this. I wouldn't want to mess with the
other two, which rely upon the current behavior as far as I can see.

Probably took me longer to write this e-mail than the patch will take.
Since I've got trivial patch fever this weekend and already have the
test case, I'll do this one next.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-01-16 16:25:37 Re: replication and pg_hba.conf
Previous Message Magnus Hagander 2011-01-16 14:52:29 Re: Recovery control functions