Re: [HACKERS] A design for amcheck heapam verification

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] A design for amcheck heapam verification
Date: 2018-01-22 23:22:05
Message-ID: CAH2-WzmFxtPmepZgPrkXkHK3hc0juFOdf2ZtYxt05iDtTuetFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Fri, Jan 12, 2018 at 1:41 AM, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> I've looked into the code and here's my review.
>
> The new functionality works and is useful right now. I believe it should be shipped in the Postgres box (currently, I install it with packet managers).
> The code is nice and well documented.

Thanks.

> I'd be happy, if I could pass argument like ErrorLevel, which would help me to estimate scale of corruption. Or any other way to list more than one error. But, clearly, this functionality is not necessary for this patch.

My previous remarks apply here, too. I don't know how to rate severity
among error messages.

> Also, I couldn't find where is check that ensures that there is a heap tuple for every B-tree leaf tuple. Is it there?

No, there isn't, because that's inherently race-prone. This is not so
bad. If you don't end up discovering a problem the other way around
(going from the heap to the index/Bloom filter), then the only way
that an index tuple pointing to the wrong place can go undetected is
because there is a duplicate heap TID in the index, or the heap TID
doesn't exist in any form.

I actually prototyped a patch that makes bt_index_parent_check()
detect duplicate heap TIDs in an index, by collecting heap TIDs from
the index, sorting them, and scanning the sorted array. That seems
like material for another patch, though.

> I've found few neatpicks in bloom filter:
> 1. Choice of sdbmhash does not seems like a best option from my point of view.

I don't mind changing the second hash function, but I want to
emphasize that this shouldn't be expected to make anything more than a
very small difference. The bloom filter probes are slow because the
memory accesses are random.

There are many hash functions that would be suitable here, and not too
many reasons to prefer one over the other. My choice was based on a
desire for simplicity, and for something that seemed to be in
widespread usage for a long time.

> + bitset_bytes = Max(1024L * 1024L, bitset_bytes);
> bloom_work_mem was supposed to be the limit? Why we do not honor this limit?

The minimum maintenance_work_mem setting is 1MB anyway.

> 3.
> + filter = palloc0(offsetof(bloom_filter, bitset) +
> + sizeof(unsigned char) * bitset_bytes);
> sizeof(unsigned char) == 1 by C standard

I know. That's just what I prefer to do as a matter of style.

> 4. function my_bloom_power() returns bit numers, then it's result is powered INT64CONST(1) << bloom_power; back. I'd stik with removing bits in a loop by while(target_bitset_bits & (target_bitset_bits-1)) { target_bitset_bits&=target_bitset_bits-1; } or something like that. Or, if we use code like sdbm hash, fallback to bithacks :) https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2

my_bloom_power() is only called once per Bloom filter. So again, I
think that this is not very important. Thomas Munro talked about using
popcount() in another function, which is from the book Hacker's
Delight. While I'm sure that these techniques have their use, they
just don't seem to make sense when the alternative is simpler code
that is only going to be executed at most once per verification
operation anyway.

> for (i = 0; i < filter->k_hash_funcs; i++)
> {
> /* Accumulate hash value for caller */
> hashes[i] = (hasha + i * hashb + i) % filter->bitset_bits;
> }
> }
> It produces almost same result (hashes 1..k-1 are +1'd), but uses a lot less % operations. Potential overflow is governed by uint64 type.

I prefer to be conservative, and to stick to what is described by
Dillinger & Manolios as much as possible.

> That's all what I've found. I do not know did the patch had all necessary reviewers attention. Please, feel free to change status if you think that patch is ready. From my point of view, the patch is in a good shape.

Michael said he'd do more review. I generally feel this is close, though.

Thanks for the review
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-01-22 23:29:58 Re: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]
Previous Message Stephen Frost 2018-01-22 23:21:00 Re: [HACKERS] make async slave to wait for lsn to be replayed