Re: Unexpected VACUUM FULL failure

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unexpected VACUUM FULL failure
Date: 2007-08-10 17:17:33
Message-ID: 1186766253.5344.96.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2007-08-10 at 11:11 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> > Gregory Stark wrote:
> >> I don't think so since it sounds like he's saying to still sync the log and
> >> VACUUM FULL has an exclusive lock on the table. So any committed (or aborted)
> >> changes it sees in the table must have been committed or aborted before the
> >> log sync.
>
> > Hint bit updates are not WAL-logged, so there's no mechanism to keep the
> > data page from hitting the disk before the COMMIT record. That's the
> > reason why we can't just set the hint bits for async committed
> > transactions in the first place.
>
> Well the theory was that the flush at the start would ensure that VF's
> first scan could always set the hint bits. But I remembered this
> morning why I felt uneasy about that: it's not guaranteed true for
> system catalogs. We sometimes release locks on catalogs before
> committing. (Whether that is a good idea is a different discussion.)

Yes, I see comments in the VF code along those lines.

Seems like we should have code comments explaining exactly where we do
this, why and which things it causes issues for.

> What I'm now thinking we should do is to have the first scan check
> whether XMIN_COMMITTED is actually set (after HeapTupleSatisfiesVacuum
> claims it's good), and abandon defrag if not, the same as we already do
> in the other corner cases where we find not-guaranteed-committed tuples
> (see INSERT_IN_PROGRESS/DELETE_IN_PROGRESS cases in scan_heap).

I now think we *must* do this for system catalogs, or something else at
least. Personally I would prefer preventing async commits from occurring
when a system catalog has been touched at all, which would make the VF
situation and a few other situations go away entirely.

However, I see no reason to do as you suggest for other tables. It seems
like this would be an annoying feature to have VF sometimes fail,
depending upon the timing of other transactions. There would be a rare
failure if an async commit touched an early block in a table immediately
prior to a VF. It's rare but possible. This would be extremely annoying
if a DBA ran a job to VF a table overnight and then came back in to see
the ERROR message. It is fairly easy to code it this way though, if you
really think this is the best way.

> If we do that, we don't actually need XLogAsyncCommitFlush() at the start.
> I'm inclined to remove it just because it's ugly. Comments?

I'm not clear why XLogAsyncCommitFlush() is ugly. It's one line of code
that doesn't make anything else any harder. Apart from system catalogs,
doing it that way would be error free for all normal VFs.

> BTW, something else that just penetrated my brain is that this failure
> can only happen with synchronous_commit = off. In the sync-commit case,
> even though we have inexact bookkeeping for clog LSNs, it's always true
> that every LSN recorded for a clog page will have been flushed first.
> So we will always think we can set hint bits even though we might be
> testing an LSN later than the particular transaction in question.
> I just rechecked and verified that I'd been (without really thinking
> about it) running my test build with synchronous_commit = off for the
> past few days. If I happened to see this in one of the relatively small
> number of parallel regression sets I've run since then, then it should
> have been obvious in the buildfarm. The reason it wasn't was that none
> of the buildfarm machines are testing async-commit. We need to do
> something about that.

Yes, this issue effects async commit only.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2007-08-10 17:21:10 domain casting?
Previous Message Brendan Jurd 2007-08-10 16:53:11 Re: [HACKERS] Function structure in formatting.c