Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-07-30 20:18:01
Message-ID: DCD52DE8-C9EF-4C10-BFCC-5293E4742C30@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jul 29, 2020, at 12:52 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Jul 20, 2020 at 5:02 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> I've made the options 'all-visible', 'all-frozen', and 'none'. It defaults to 'none'.
>
> That looks nice.
>
>>> I guess that
>>> could still be expensive if there's a lot of them, but needing
>>> ShareUpdateExclusiveLock rather than only AccessShareLock is a little
>>> unfortunate.
>>
>> I welcome strategies that would allow for taking a lesser lock.
>
> I guess I'm not seeing why you need any particular strategy here. Say
> that at the beginning you note the starting relfrozenxid of the table
> -- I think I would lean toward just ignoring datfrozenxid and the
> cluster-wide value completely. You also note the current value of the
> transaction ID counter. Those are the two ends of the acceptable
> range.
>
> Let's first consider the oldest acceptable XID, bounded by
> relfrozenxid. If you see a value that is older than the relfrozenxid
> value that you noted at the start, it is definitely invalid. If you
> see a newer value, it could still be older than the table's current
> relfrozenxid, but that doesn't seem very worrisome. If the user
> vacuumed the table while they were running this tool, they can always
> run the tool again afterward if they wish. Forcing the vacuum to wait
> by taking ShareUpdateExclusiveLock doesn't actually solve anything
> anyway: you STILL won't notice any problems the vacuum introduces, and
> in fact you are now GUARANTEED not to notice them, plus now the vacuum
> happens later.
>
> Now let's consider the newest acceptable XID, bounded by the value of
> the transaction ID counter. Any time you see a newer XID than the last
> value of the transaction ID counter that you observed, you go observe
> it again. If the value from the table still looks invalid, then you
> complain about it. Either way, you remember the new observation and
> check future tuples against that value. I think the patch is already
> doing this anyway; if it weren't, you'd need an even stronger lock,
> one sufficient to prevent any insert/update/delete activity on the
> table altogether.
>
> Maybe I'm just being dense here -- exactly what problem are you worried about?

Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax committed via TransactionIdDidCommit. I am worried about concurrent truncation of clog entries causing I/O errors on SLRU lookup when performing that check. The three strategies I had for dealing with that were taking the XactTruncationLock (formerly known as CLogTruncationLock, for those reading this thread from the beginning), locking out vacuum, and the idea upthread from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being dense and don't need to worry about this. But I haven't convinced myself of that, yet.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-07-30 20:47:06 Re: new heapcheck contrib module
Previous Message Robert Haas 2020-07-30 19:51:10 Re: Threading in BGWorkers (!)