Re: Hot Standby and VACUUM FULL

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby and VACUUM FULL
Date: 2010-02-01 15:45:12
Message-ID: 20417.1265039112@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> The other problem with what you sketch is that it'd require holding the
>> mapfile write lock across commit, because we still have to have strict
>> serialization of updates.

> Why is the strict serialization of updates needed? To avoid overwriting
> the file with stale contents in a race condition?

Exactly.

> I was thinking that we only store the modified part in the WAL record.
> Right after writing commit record, take the lock, read() the map file,
> modify it in memory, write() it back, and release lock.

> That means that there's no full images of the file in WAL records, which
> makes me slightly uncomfortable from a disaster recovery point-of-view,

Yeah, me too, which is another advantage of using a separate WAL entry.

> That doesn't solve the problem I was trying to solve, which is that if
> the map file is rewritten, but the transaction later aborts, the map
> file update has hit the disk already. That's why I wanted to stash it
> into the commit record.

The design I sketched doesn't require such an assumption anyway. Once
the map file is written, the relocation is effective, commit or no.
As long as we restrict relocations to maintenance operations such as
VACUUM FULL, which have no transactionally significant results, this
doesn't seem like a problem. What we do need is that after a CLUSTER
or V.F., which is going to relocate not only the rel but its indexes,
the relocations of the rel and its indexes have to all "commit"
atomically. But saving up the transaction's map changes and applying
them in one write takes care of that.

(What I believe this means is that pg_class and its indexes have to all
be mapped, but I'm thinking right now that no other non-shared catalogs
need the treatment.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-02-01 15:46:10 Re: Package namespace and Safe init cleanup for plperl UPDATE 3 [PATCH]
Previous Message Robert Haas 2010-02-01 15:43:36 pgsql: Augment EXPLAIN output with more details on Hash nodes.