Re: [HACKERS] A design for amcheck heapam verification

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] A design for amcheck heapam verification
Date: 2018-03-31 01:20:31
Message-ID: 20180331012031.ftf4vbiwabsg7uvb@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018-03-29 19:42:42 -0700, Peter Geoghegan wrote:
> >> + /*
> >> + * Aim for two bytes per element; this is sufficient to get a false
> >> + * positive rate below 1%, independent of the size of the bitset or total
> >> + * number of elements. Also, if rounding down the size of the bitset to
> >> + * the next lowest power of two turns out to be a significant drop, the
> >> + * false positive rate still won't exceed 2% in almost all cases.
> >> + */
> >> + bitset_bytes = Min(bloom_work_mem * 1024L, total_elems * 2);
> >> + /* Minimum allowable size is 1MB */
> >> + bitset_bytes = Max(1024L * 1024L, bitset_bytes);
> >
> > Some upcasting might be advisable, to avoid dangers of overflows?
>
> When it comes to sizing work_mem, using long literals to go from KB to
> bytes is How It's Done™. I actually think that's silly myself, because
> it's based on the assumption that long is wider than int, even though
> it isn't on Windows. But that's okay because we have the old work_mem
> size limits on Windows.

> What would the upcasting you have in mind look like?

Just use UINT64CONST()? Let's try not to introduce further code that'll
need to get painfully fixed.

>
> >> + /* Use 64-bit hashing to get two independent 32-bit hashes */
> >> + hash = DatumGetUInt64(hash_any_extended(elem, len, filter->seed));
> >
> > Hm. Is that smart given how some hash functions are defined? E.g. for
> > int8 the higher bits aren't really that independent for small values:
>
> Robert suggested that I do this. I don't think that we need to make it
> about the quality of the hash function that we have available. That
> really seems like a distinct question to me. It seems clear that this
> ought to be fine (or should be fine, if you prefer). I understand why
> you're asking about this, but it's not scalable to ask every user of a
> hash function to care that it might be a bit crap. Hash functions
> aren't supposed to be a bit crap.

Well, they're not supposed to be, but they are. Practical realities
matter... But I think it's moot because we don't use any of the bad
ones, I only got to that later...

> >> +DROP FUNCTION bt_index_check(regclass);
> >> +CREATE FUNCTION bt_index_check(index regclass,
> >> + heapallindexed boolean DEFAULT false)
> >> +RETURNS VOID
> >> +AS 'MODULE_PATHNAME', 'bt_index_check'
> >> +LANGUAGE C STRICT PARALLEL RESTRICTED;
> >
> > This breaks functions et al referencing the existing function.
>
> This sounds like a general argument against ever changing a function's
> signature. It's not like we haven't done that a number of times in
> extensions like pageinspect. Does it really matter?

Yes, it does. And imo we shouldn't.

> > Also, I
> > don't quite recall the rules, don't we have to drop the function from
> > the extension first?
>
> But...I did drop the function?

I mean in the ALTER EXTENSION ... DROP FUNCTION sense.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-03-31 01:24:11 Re: [HACKERS] Planning counters in pg_stat_statements
Previous Message Peter Geoghegan 2018-03-31 01:11:50 Re: [HACKERS] A design for amcheck heapam verification