Re: canceling autovacuum task woes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: canceling autovacuum task woes
Date: 2012-07-24 21:48:09
Message-ID: 05DA660B-FF87-4382-B1AF-A5D8DC0D88F7@gmail.com
Lists: pgsql-hackers

On Jul 24, 2012, at 4:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jul 24, 2012 at 4:03 PM, Alvaro Herrera
>> <alvherre(at)commandprompt(dot)com> wrote:
>>> Looks great. Are you considering backpatching this?
>
>> Well, that would certainly make MY life easier. I am not sure whether
>> it would be in line with project policy, however.
>
> +1 for a backpatch. Otherwise it'll be years before we gain any
> information about the unexpected cancels that you think exist.

OK, great.

> However, after looking some more at deadlock.c, I wonder whether
> (a) this patch gives sufficient detail, and (b) whether there isn't a
> problem that's obvious by inspection. It appears to me that as the
> blocking_autovacuum_proc stuff is coded, it will finger an AV proc as
> needing to be killed even though it may be several graph edges out from
> the current proc. This means that with respect to (a), the connection
> from the process doing the kill to the AV proc may be inadequately
> documented by this patch, and with respect to (b), there might well be
> cases where we found an AV proc somewhere in the graph traversal but
> it's not actually guilty of blocking the current process ... especially
> not after the queue reorderings that we may have done. I think I'd be
> happier with that code if it restricted its AV targets to procs that
> *directly* block the current process, which not incidentally would make
> this amount of log detail sufficient.

Uggh. Well, that certainly sounds like something that could cause spurious cancels, or excessively fast ones: presumably, if we limit it to procs that directly block the current process, we'll always allow the full deadlock_timeout before nuking the autovac worker. So +1 for changing that.

Does an edge in this context mean any lock, or just an ungranted one? I assume the latter, which still leaves the question of where the edges are coming from in the first place.
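
To make the distinction under discussion concrete, here is a minimal sketch. It is not the code in deadlock.c. It assumes that an edge points from a waiting process to a process it is waiting behind, and every name in it (waits_for, autovac_procs, the two functions) is hypothetical. It contrasts fingering any autovacuum worker reachable during the graph traversal with fingering only workers that directly block the current process.

```python
from collections import deque

def reachable_autovac(start, waits_for, autovac_procs):
    """Autovacuum workers found anywhere along wait-for edges from start.

    Illustrates the behavior being criticized: a worker several edges
    away from start is still reported.
    """
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        proc = queue.popleft()
        for blocker in waits_for.get(proc, ()):
            if blocker in seen:
                continue
            seen.add(blocker)
            if blocker in autovac_procs:
                found.add(blocker)
            queue.append(blocker)
    return found

def direct_autovac(start, waits_for, autovac_procs):
    """Autovacuum workers exactly one edge away from start.

    Illustrates the proposed restriction: only direct blockers count.
    """
    return {b for b in waits_for.get(start, ()) if b in autovac_procs}

# Hypothetical wait-for graph: A waits on B, and B waits on an autovacuum worker.
waits_for = {"A": ["B"], "B": ["AV"]}
autovac_procs = {"AV"}

transitive_hits = reachable_autovac("A", waits_for, autovac_procs)
direct_hits_a = direct_autovac("A", waits_for, autovac_procs)
direct_hits_b = direct_autovac("B", waits_for, autovac_procs)
```

In this toy graph the traversal from A reports the worker even though only B is waiting on it directly, while the restricted check reports it for B and not for A.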

...Robert
