Re: Proposal: Log inability to lock pages during vacuum

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Log inability to lock pages during vacuum
Date: 2014-12-08 21:28:54
Message-ID: 54861816.3020502@BlueTreble.com
Lists: pgsql-hackers

On 12/7/14, 6:16 PM, Simon Riggs wrote:
> On 20 October 2014 at 10:57, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
>
>> Currently, a non-freeze vacuum will punt on any page it can't get a cleanup
>> lock on, with no retry. Presumably this should be a rare occurrence, but I
>> think it's bad that we just assume that and won't warn the user if something
>> bad is going on.
>
> (I'm having email problems, so I can't see later mails on this thread,
> so replying here.)
>
> Logging patch looks fine, but I would rather not add a line of text
> for each VACUUM, just in case this is non-zero. I think we should add
> that log line only if the blocks skipped > 0.

I thought about doing that, but I'm loath to duplicate a rather large ereport call. Happy to make the change if that's the consensus though.

> What I'm more interested in is what you plan to do with the
> information once we get it?
>
> The assumption that skipping blocks is something bad is strange. I
> added it because VACUUM could and did regularly hang on busy tables,
> which resulted in bloat because other blocks that needed cleaning
> didn't get any attention.
>
> Which is better, spend time obsessively trying to vacuum particular
> blocks, or to spend the time on other blocks that are in need of
> cleaning and are available to be cleaned?
>
> Which is better, have autovacuum or a system-wide vacuum move on to
> other tables that need cleaning, or spend lots of effort retrying?
>
> How do we know what is the best next action?
>
> I'd really want to see some analysis of those things before we spend
> even more cycles on this.

That's the entire point of logging this information. There is an underlying assumption that we won't actually skip many pages, but there's no data to back that up, nor is there currently any way to get that data.

My hope is that the logging shows that there isn't anything more that needs to be done here. If this is something that causes problems, at least now DBAs will be aware of it and hopefully we'll be able to identify specific problem scenarios and find a solution.

BTW, my initial proposal[1] was strictly logging. The only difference was raising it to a warning if a significant portion of the table was skipped. I only investigated retrying locks at the suggestion of others. I never intended this to become a big time sink.

[1]:
"Currently, a non-freeze vacuum will punt on any page it can't get a cleanup lock on, with no retry. Presumably this should be a rare occurrence, but I think it's bad that we just assume that and won't warn the user if something bad is going on.

"My thought is that if we skip any pages elog(LOG) how many we skipped. If we skip more than 1% of the pages we visited (not relpages) then elog(WARNING) instead."
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
