On Wed, May 2, 2012 at 4:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Brainstorming wildly, how about something like this:
>> 1. Insert a new copy of the tuple onto some other heap page. The new
>> tuple's xmin will be that of the process doing the tuple move, and
>> we'll also set a flag indicating that a move is in progress.
>> 2. Set a flag on the old tuple, indicating that a tuple move is in
>> progress. Set its TID to the new location of the tuple. Set xmax to
>> the tuple mover's XID. Optionally, truncate away the old tuple data,
>> leaving just the tuple header.
>> 3. Scan all indexes and replace any references to the old tuple's TID
>> with references to the new tuple's TID.
>> 4. Commit.
> What happens when you crash partway through that?
Well, there are probably a few loose ends here, but the idea is that
if we crash after step 2 is complete, the next vacuum is responsible
for performing steps 3 and 4. As written, there's probably a problem
if we crash between (1) and (2); I think those would need to be done
atomically, or at least we need to make sure that the moving-in flag
is set on the new tuple if and only if there is actually a redirect
pointing to it.
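To make the crash-recovery idea concrete, here is a rough toy model in Python, not PostgreSQL code. Everything in it is hypothetical: the names HeapTuple, begin_move, finish_move and vacuum are made up, the index is just a dict from key to TID, and modeling step 4 ("Commit") as clearing the moving-in flag is my assumption. It only shows the invariant I'm after: steps 1 and 2 happen together, and vacuum finishes steps 3 and 4 for any redirect it finds.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

TID = Tuple[int, int]  # (page, offset); toy stand-in for a real tuple identifier


@dataclass
class HeapTuple:
    data: Optional[str]
    xmin: int
    xmax: Optional[int] = None
    moving_in: bool = False            # set on the new copy (step 1)
    redirect_to: Optional[TID] = None  # set on the old copy (step 2)


def begin_move(heap: Dict[TID, HeapTuple], old: TID, new: TID, mover_xid: int) -> None:
    """Steps 1 and 2 together, so the moving-in flag is set
    if and only if a redirect points at the new copy."""
    src = heap[old]
    heap[new] = HeapTuple(data=src.data, xmin=mover_xid, moving_in=True)
    src.redirect_to = new
    src.xmax = mover_xid
    src.data = None  # optional: keep only the "header" of the old tuple


def finish_move(heap: Dict[TID, HeapTuple], index: Dict[str, TID], old: TID) -> None:
    """Step 3: repoint index entries. Step 4 (assumed): clear the flag."""
    new = heap[old].redirect_to
    for key, tid in index.items():
        if tid == old:
            index[key] = new
    heap[new].moving_in = False


def vacuum(heap: Dict[TID, HeapTuple], index: Dict[str, TID]) -> None:
    """Finish any move that a crash interrupted after step 2."""
    for tid, tup in list(heap.items()):
        if tup.redirect_to is not None and heap[tup.redirect_to].moving_in:
            finish_move(heap, index, tid)
```

If begin_move runs and we "crash" before finish_move, the index still points at the old TID; a later vacuum(heap, index) finds the redirect, repoints the index entry, and clears the flag.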
> Also, what happens if
> somebody wishes to update the tuple before the last step is complete?
Then we let them. The idea is that they see the redirect tuple at the
old TID, follow it to the new copy of the tuple, and update that instead.
> In any case, this doesn't address the fundamental problem with unlocked
> tuple movement, which is that you can't just arbitrarily change a
> tuple's TID when there might be other operations relying on the TID
> to hold still.
Well, that's why I invented the redirect tuple, so that anyone who was
relying on the TID to hold still would see the redirect and say, oh, I
need to go look at this other TID instead. It's entirely possible
there's some reason why that can't work, but at the moment I'm not
seeing it. I see that there's a problem if the old TID gets freed
while someone's relying on it, but replacing it with a pointer to some
other TID seems like it ought to be workable.
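As a minimal sketch of what "go look at this other TID instead" would mean for someone holding a stale TID (again Python, purely illustrative; resolve_tid, update and the redirects dict are hypothetical names, not anything in the backend):

```python
from typing import Dict, Tuple

TID = Tuple[int, int]  # (page, offset); toy stand-in for a real tuple identifier


def resolve_tid(redirects: Dict[TID, TID], tid: TID) -> TID:
    """Follow redirects from a possibly stale TID to the current location."""
    seen = set()
    while tid in redirects:
        if tid in seen:
            raise RuntimeError("redirect cycle")  # should never happen
        seen.add(tid)
        tid = redirects[tid]
    return tid


def update(rows: Dict[TID, str], redirects: Dict[TID, TID],
           tid: TID, new_value: str) -> TID:
    """Update the live copy of the tuple, however stale the caller's TID is."""
    live = resolve_tid(redirects, tid)
    rows[live] = new_value
    return live
```

With redirects = {(0, 1): (5, 1)}, a caller still holding (0, 1) ends up updating the row at (5, 1). If the entry for (0, 1) were dropped while that caller still held it, resolve_tid would hand back (0, 1) unchanged and the lookup would miss, which is the freed-TID problem mentioned above.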