|From:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>|
|To:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>|
|Cc:||Mark Dilger <hornschnorter(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-hackers(at)postgreSQL(dot)org|
|Subject:||Re: WIP: BRIN multi-range indexes|
Attached is a patch version fixing breakage due to pg_proc changes
committed in fd1a421fe661.
On 03/02/2018 05:08 AM, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> On 2018-02-25 01:30:47 +0100, Tomas Vondra wrote:
>>> Note: Currently, this only works with float8-based data types.
>>> Supporting additional data types is not a big issue, but will
>>> require extending the opclass with "subtract" operator (used to
>>> compute distance between values when merging ranges).
>> Based on Tom's past stances I'm a bit doubtful he'd be happy with
>> such a restriction. Note that something similar-ish also has come
>> up in 0a459cec96.
>> I kinda wonder if there's any way to not have two similar but not
>> equal types of logic here?
I don't think it's very similar to what 0a459cec96 is doing. It's true
both deal with ranges of values, but that's about it - I don't see how
this patch could reuse some bits from 0a459cec96.
To elaborate, 0a459cec96 only really needs to know "does this value fall
into this range" while this patch needs to compare the gaps between ranges.
That is, given a bunch of ranges (summary of values for a section of a
table), it needs to decide which ranges to merge - and it picks the
ranges with the smallest gap.
So for example with ranges [1,10], [15,20], [30,200], [250,300] it would
merge [1,10] and [15,20] because the gap between them is only 5, which
is shorter than the other gaps. This is used when the summary for a
range of pages gets "full" (the patch only keeps up to 32 ranges or so).
Not sure how I could reuse 0a459cec96 to do this.
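To make the merging step concrete, here is a minimal sketch of it. This is not the code from the patch: the Range struct and merge_closest() are hypothetical names, and the sketch assumes float8 values in ranges that are sorted and non-overlapping.

```c
#include <stddef.h>

/* Hypothetical closed interval [lo, hi]; not the struct used by the patch. */
typedef struct
{
    double lo;
    double hi;
} Range;

/*
 * Merge the two neighbouring ranges separated by the smallest gap.
 * Assumes r[0..n-1] is sorted by lo and the ranges do not overlap.
 * Returns the new number of ranges.
 */
size_t
merge_closest(Range *r, size_t n)
{
    size_t best = 0;    /* index of the left range of the closest pair */
    size_t i;

    if (n < 2)
        return n;

    /* Find the pair (i, i+1) with the smallest gap between them. */
    for (i = 1; i + 1 < n; i++)
    {
        if (r[i + 1].lo - r[i].hi < r[best + 1].lo - r[best].hi)
            best = i;
    }

    /* Extend the left range over the right one, then close the hole. */
    r[best].hi = r[best + 1].hi;
    for (i = best + 1; i + 1 < n; i++)
        r[i] = r[i + 1];

    return n - 1;
}
```

With the four ranges from the example, one call leaves [1,20], [30,200], [250,300].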
> Hm. I wonder what the patch intends to do with subtraction overflow,
> or infinities, or NaNs. Just as with the RANGE patch, it does not
> seem to me that failure is really an acceptable option. Indexes are
> supposed to be able to index whatever the column datatype can store.
I've been thinking about this after looking at 0a459cec96, and I don't
think this patch has the same issues. One reason is that just like the
original minmax opclass, it does not really mess with the data it
stores. It only does min/max on the values, and stores that, so if there
was NaN or Infinity, it will index NaN or Infinity.
The subtraction is used only to decide which ranges to merge first, and
if the subtraction returns Infinity/NaN the ranges will be considered
very distant and merged last. Which is pretty much the desired behavior,
because it means -Infinity, Infinity and NaN will be kept as individual
"points" as long as possible.
Perhaps there is some other danger/thinko here, that I don't see?
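For illustration, the Infinity/NaN handling described above could look roughly like this for float8. The function name merge_distance() is made up for this sketch, and mapping every non-finite result to +Infinity is my assumption about one way to get "merged last", not necessarily what the patch does.

```c
#include <math.h>

/*
 * Hypothetical distance used only for choosing which ranges to merge.
 * Any non-finite result (Infinity, -Infinity or NaN) is mapped to
 * +Infinity, so such a gap compares as larger than every finite gap
 * and the ranges involved get merged last.
 */
double
merge_distance(double a1, double a2)
{
    double d = a2 - a1;

    if (isnan(d) || isinf(d))
        return INFINITY;

    return d;
}
```

The indexed values themselves are never changed by this, only the order in which ranges get merged.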
The one overflow issue I found in the patch is that the numeric
"distance" function does this:
    d = DirectFunctionCall2(numeric_sub, a2, a1); /* a2 - a1 */
which can overflow, of course. But that is not fatal - the index may get
inefficient due to non-optimal merging of ranges, but it will still
return correct results. But I think this can be easily improved by
passing not only the two values but also the minimum and maximum, and using
those to normalize the values to [0,1].
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services