Re: (auto)vacuum truncate exclusive lock

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: (auto)vacuum truncate exclusive lock
Date: 2013-04-12 06:14:10
Message-ID: CAMkU=1xE4mGfn3VVg8W5V+ng0EbzmH2KxRg5XFhxBf7ibTyo8Q@mail.gmail.com
Lists: pgsql-hackers

On Thursday, April 11, 2013, Kevin Grittner wrote:

>
> > I also log the number of pages truncated at the time it gave up,
> > as it would be nice to know if it is completely starving or
> > making some progress.
>
> If we're going to have the message, we should make it useful. My
> biggest question here is not whether we should add this info, but
> whether it should be DEBUG instead of LOG
>

I like it being LOG. If it were DEBUG, I don't think anyone would be
likely to see it when they needed to, as it happens sporadically on busy
servers and I don't think people would run those with DEBUG on. I figure
it is analogous to an autovacuum cancel message it partially replaces, and
those are LOG.

>
> > Also, I think that permanently boycotting doing autoanalyze
> > because someone is camping out on an access share lock (or
> > because there are a never-ending stream of overlapping locks) and
> > so the truncation cannot be done is a bit drastic, especially for
> > inclusion in a point release.
>
> That much is not a change in the event that the truncation does not
> complete.

OK, I see that now. In the old behavior, if the lock was acquired but
then we were shoved off from it, the analyze was not done. But in the old
behavior, if the lock was never acquired at all, it would go ahead and
do the autoanalyze, and that has changed. That is the way I was testing
it (camping out on an access share lock so the access exclusive could
never be granted in the first place, because intercepting it during the
truncate phase was hard to do), and I just assumed the behavior I saw would
apply to both cases, but it does not.

> I have seen cases where the old logic head-banged for
> hours or days without succeeding at the truncation attempt in
> autovacuum, absolutely killing performance until the user ran an
> explicit VACUUM. And in the meantime, since the deadlock detection
> logic was killing autovacuum before it got to the analyze phase,
> the autoanalyze was never done.
>

OK, so there are three problems. It would take a second to yield; in doing
so it would abandon all the progress it had made in that second rather than
saving it; and it would tight loop (restricted by naptime) on this because
of the lack of analyze. So the new logic fixed the first two in a way that
seems an absolute improvement for the auto case, but it made the third one
worse in a common case: when it never acquires the lock in the first place,
it now doesn't analyze, whereas before it did in that one case.
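
To make the cases above easier to compare, here is a toy model of when
autoanalyze follows a truncation attempt. It is only a summary of the
behavior described in this thread, not PostgreSQL source: the function name
and the outcome labels are invented for the sketch, and the "completed" case
(analyze runs when truncation finishes normally) is an assumption.

```python
# Toy model of the behavior described in this thread; not PostgreSQL source.
# The outcome labels and the function name are invented for this sketch.

NEVER_ACQUIRED = "never_acquired"  # exclusive lock could not be obtained at all
INTERRUPTED = "interrupted"        # lock obtained, then given up mid-truncation
COMPLETED = "completed"            # truncation ran to completion (assumed case)


def autoanalyze_runs(logic: str, outcome: str) -> bool:
    """Return whether autoanalyze follows the truncation attempt."""
    if logic not in ("old", "patched"):
        raise ValueError(f"unknown logic: {logic}")
    if outcome == COMPLETED:
        return True
    if outcome == INTERRUPTED:
        # Unchanged by the patch: analyze is skipped in both versions.
        return False
    if outcome == NEVER_ACQUIRED:
        # The old logic went ahead with analyze; the patched logic does not.
        return logic == "old"
    raise ValueError(f"unknown outcome: {outcome}")


if __name__ == "__main__":
    for logic in ("old", "patched"):
        for outcome in (NEVER_ACQUIRED, INTERRUPTED, COMPLETED):
            runs = autoanalyze_runs(logic, outcome)
            print(f"{logic:8} {outcome:15} analyze={runs}")
```

The only row that differs between the two versions is the never-acquired
one, which is the regression described above.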

>
> Perhaps the new logic should go ahead and get its lock even on a
> busy system (like the old logic),

As far as I can tell, the old logic was always conditional on the
AccessExclusive lock acquisition, whether it was manual or auto.
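
"Conditional" here means try-once-and-skip rather than wait in the lock
queue. A minimal sketch of that pattern, using a plain Python mutex as a
stand-in for the table lock (it does not model PostgreSQL lock modes, and
try_truncate is a hypothetical name; in the server source the analogous call
is, as far as I know, ConditionalLockRelation()):

```python
import threading

# Stand-in for the table-level lock: a plain mutex, not a PostgreSQL lock mode.
relation_lock = threading.Lock()


def try_truncate(lock: threading.Lock) -> bool:
    """Attempt truncation only if the lock is available right now."""
    if not lock.acquire(blocking=False):
        return False  # someone else holds it: skip truncation, do not wait
    try:
        # ... the truncation work would happen here ...
        return True
    finally:
        lock.release()


relation_lock.acquire()                # another session camps on the lock
skipped = try_truncate(relation_lock)  # lock unavailable, truncation skipped
relation_lock.release()
done = try_truncate(relation_lock)     # lock free, truncation proceeds
```

With this pattern a session that never releases its lock means truncation is
never attempted, which is the test setup I described earlier.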

Cheers,

Jeff
