Re: preserving forensic information when we freeze

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: preserving forensic information when we freeze
Date: 2013-05-29 00:00:42
Message-ID: 20130529000042.GC818@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-05-28 09:21:27 -0400, Robert Haas wrote:
> I have attempted to implement this. Trouble is, we're out of infomask
> bits. Using an infomask2 bit might work but we don't have many of
> them left either, so it's worth casting about a bit for a better
> solution. Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
> this purpose, but I think we're better off trying to reclaim those
> bits in a future release. Exactly how to do that is a topic for
> another email, but I believe it's very possible. What I'd like to
> propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
> indicate that xmin is frozen. This bit pattern isn't used for
> anything else, so there's no confusion possible with existing data
> already on disk, but it's necessary to audit all code that checks
> HEAP_XMIN_INVALID to make sure it doesn't get confused. I've done
> this, and there's little enough of it that it seems pretty easy to
> handle.

I only suggested MOVED_IN/OFF because I didn't remember that
HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
combination could have been produced in the past in some way I haven't
found so far, I am all for it. I don't see a problem with breaking
backward compatibility on that level, and I don't think there's any way
to get there without some low-level compat break.
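For reference, the proposed bit pattern could be tested along these
lines. This is only a sketch: the two bit values match the definitions
in access/htup_details.h, but the FROZEN macro name and the helper
functions are illustrative, not anything that exists yet.

```c
#include <assert.h>
#include <stdint.h>

/* Infomask bits as defined in access/htup_details.h */
#define HEAP_XMIN_COMMITTED  0x0100
#define HEAP_XMIN_INVALID    0x0200
/* Proposed: both bits set together means "xmin is frozen" */
#define HEAP_XMIN_FROZEN     (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

/*
 * A frozen tuple must not be mistaken for an invalid one, so any code
 * testing HEAP_XMIN_INVALID has to check that COMMITTED is clear too.
 */
static int
xmin_invalid(uint16_t infomask)
{
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_INVALID;
}

static int
xmin_frozen(uint16_t infomask)
{
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN;
}
```

This is exactly the audit Robert describes: every existing
HEAP_XMIN_INVALID test must become the two-bit comparison above.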

> - When we follow HOT chains, we determine where the HOT chain ends by
> matching the xmax of each tuple with the xmin of the next tuple. If
> they don't match, we conclude that the HOT chain has ended. I
> initially thought this logic might be buggy even as things stand today
> if the latest tuple in the chain is frozen, but as Andres pointed out
> to me, that's not so. If the newest tuple in the chain is all-visible
> (the earliest point at which we can theoretically freeze it), all
> earlier tuples are dead altogether, and heap_page_prune() is always
> called after locking the buffer and before freezing, so any tuple we
> freeze must be the first in its HOT chain. For the same reason, this
> logic doesn't need any adjustment for the new freezing system: it's
> never looking at anything old enough to be frozen in the first place.

I am all for adding a comment why this is safe though. We thought about
it for some while before we were convinced...
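To make the reasoning concrete, here is a tiny model of the chain test
in question (simplified stand-in structs, not the real HeapTupleHeader
accessors):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified tuple header: just the fields needed for chain matching */
typedef struct
{
    TransactionId xmin;     /* inserting transaction */
    TransactionId xmax;     /* updating transaction, or 0 if none */
} MiniTuple;

/*
 * The HOT rule discussed above: 'next' continues the chain begun by
 * 'prev' only if prev's xmax matches next's xmin.  If freezing
 * overwrote xmin with FrozenTransactionId, a genuinely chained tuple
 * could fail this test; keeping the original xmin (as the patch does)
 * preserves the match.  Per the argument above, a frozen tuple is
 * always first in its chain anyway, so HOT never hits this case.
 */
static int
chain_continues(const MiniTuple *prev, const MiniTuple *next)
{
    return prev->xmax != 0 && prev->xmax == next->xmin;
}
```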

> - Various procedural languages use the combination of TID and XMIN to
> determine whether a function needs to be recompiled. Although the
> possibility of this doing the wrong thing seems quite remote, it's not
> obvious to me why it's theoretically correct even as things stand
> today. Suppose that previously-frozen tuple is vacuumed away and
> another tuple is placed at the same TID and then frozen. Then, we
> check whether the cache is still valid and, behold, it is. This would
> actually get better with this patch, since it wouldn't be enough
> merely for the old and new tuples to both be frozen; they'd have to
> have had the same XID prior to freezing. I think that could only
> happen if a backend sticks around for at least 2^32 transactions, but
> I don't know what would prevent it in that case.

Hm. As previously said, I am less than convinced by those ad hoc
mechanisms, and I think this should get properly integrated into the
normal cache invalidation mechanisms.
But: I think this is safe, since we compare the stored/cached xmin/tid
with one fetched from SearchSysCache just before, which will point to
the correct location as of the last AcceptInvalidationMessages(). I
can't think of a way where this would allow the case you describe.
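The check in question amounts to something like the following sketch
(modeled loosely on plpgsql's fn_xmin/fn_tid fields, with made-up
simplified types rather than the real ItemPointer machinery):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Block number + line pointer offset, as in an ItemPointer */
typedef struct { uint32_t block; uint16_t offset; } TupleId;

/* What a PL caches per compiled function (simplified) */
typedef struct
{
    TransactionId fn_xmin;  /* xmin of the pg_proc tuple at compile time */
    TupleId       fn_tid;   /* location of that tuple */
} CachedFunction;

/*
 * The cached compilation is reused only if the pg_proc tuple just
 * fetched from the syscache sits at the same TID and still has the
 * same xmin.  With non-destructive freezing, xmin survives a freeze,
 * so a recycled TID can only match if the original XID recurs --
 * which requires the backend to live through ~2^32 transactions.
 */
static int
cache_valid(const CachedFunction *fn,
            TransactionId cur_xmin, TupleId cur_tid)
{
    return fn->fn_xmin == cur_xmin &&
           fn->fn_tid.block == cur_tid.block &&
           fn->fn_tid.offset == cur_tid.offset;
}
```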

> - heap_get_latest_tid() appears broken even without this patch. It's
> only used on user-specified TIDs, either in a TID scan, or due to the
> use of currtid_byreloid() and currtid_byrelname(). It attempts to find
> the latest version of the tuple referenced by the given TID by
> following the CTID links. Like HOT, it uses XMAX/XMIN matching to
> detect when the chain is broken. However, unlike HOT, update chains
> can in general span multiple blocks. It is not now nor has it ever
> been safe to assume that the next tuple in the chain can't be frozen
> before the previous one is vacuumed away. Andres came up with the
> best example: suppose the tuple to be frozen physically precedes its
> predecessor; then, an in-progress vacuum might reach the to-be-frozen
> tuple before it reaches (and removes) the previous row version. In
> newer releases, the same problem could be caused by vacuum's
> occasional page-skipping behavior. As with the previous point, the
> "don't actually change xmin when we freeze" approach actually makes it
> harder for a chain to get "broken" when it shouldn't, but I suspect
> it's just moving us from one set of extremely-obscure failure cases to
> another.

I don't think this is especially problematic though. If you do a tidscan
starting from a tid that is so old that it can be removed, you're doing
it wrong; the tid could have been reused for something else anyway. I
think the ctid chaining is only meaningful if the tuple got updated very
recently, i.e. you hold a snapshot that prevents the removal of the
root tuple.

Nice work!

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
