From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
Date: 2021-01-22 23:27:26
Message-ID: CAFBsxsEa9eN=sJaz7Px+i2Oh5TXZiW7hhxCrEMribebdtxUguQ@mail.gmail.com
Lists: pgsql-hackers
On Thu, Jan 21, 2021 at 9:06 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:
> [wip optimizations]
> The pathological case (monotonic-asc) is now 4x faster, roughly equal to
> regular minmax index build. The opposite (monotonic-desc) is about 33%
> faster, roughly in line with btree.
Those numbers look good. I get similar results, shown below. I've read
0007-8 briefly but not in depth.
> There are a couple cases where it's actually a bit slower - those are
> the cases with very few distinct values per range. I believe this
> happens because in the batch mode the code does not check if the summary
> already contains this value, adds it to the buffer and the last step
> ends up being more expensive than this.
I think if it's worst case a bit faster than btree and best case a bit
slower than traditional minmax, that's acceptable.
> I believe there's some "compromise" between those two extremes, i.e. we
> shouldn't use a buffer that is too small or too large, but something in
> between, so that the reduction happens once in a while but not too often
> (as with the original aggressive approach).
This sounds good also.
> FWIW, none of this is likely to be an issue in practice, because (a)
> tables usually don't have such strictly monotonic patterns, (b) people
> should stick to plain minmax for cases that do.
Still, it would be great if multi-minmax could be a drop-in replacement. I
know there was a sticking point of a distance function not being available
on all types, but I wonder if that can be remedied or worked around somehow.
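To illustrate what "drop-in" would mean here, switching between the two
would ideally be just a matter of the opclass in the index definition
(the multi-minmax opclass name below is assumed from the patch series and
may differ in the committed version):

```sql
-- plain minmax, the current BRIN default for timestamptz:
create index iot_minmax on iot using brin (create_dt);

-- multi-minmax, hypothetically a drop-in alternative
-- (opclass name assumed from the patch):
create index iot_multi on iot
  using brin (create_dt timestamptz_minmax_multi_ops);
```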
> And (c) regular tables
> tend to have much wider rows, so there are fewer values per range (so
> that other stuff is likely more expensive than building BRIN).
True. I'm still puzzled that it didn't help to use 8 pages per range, but
it's moot now.
Here are some numbers (median of 3) with a similar scenario as before,
repeated here with some added details. I didn't bother with what you call
"unpatched":
               btree   minmax   multi
monotonic-asc   44.4    26.5     27.8
mono-del-ins    38.7    24.6     30.4
mono-10-asc     61.8    25.6     33.5
create unlogged table iot (
id bigint generated by default as identity primary key,
num double precision not null,
create_dt timestamptz not null,
stuff text generated always as (md5(id::text)) stored
)
with (fillfactor = 95);
-- monotonic-asc:
insert into iot (num, create_dt)
select random(), x
from generate_series(
'2020-01-01 0:00'::timestamptz,
'2020-01-01 0:00'::timestamptz +'5 years'::interval,
'1 second'::interval) x;
-- mono-del-ins:
-- Here I deleted a few values from (likely) each page in the above table,
-- and reinserted values that shouldn't be in existing ranges:
delete from iot
where num < 0.05
or num > 0.95;
vacuum iot;
insert into iot (num, create_dt)
select random(), x
from generate_series(
'2020-01-01 0:00'::timestamptz,
'2020-02-01 23:59'::timestamptz,
'1 second'::interval) x;
-- mono-10-asc
truncate table iot;
insert into iot (num, create_dt)
select random(), '2020-01-01 0:00'::timestamptz
  + (x % 10 || ' seconds')::interval
from generate_series(1,5*365*24*60*60) x;
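As an aside, one way to eyeball the per-range summaries these loads produce
is pageinspect's brin_page_items(). A sketch, assuming an index named
iot_multi and inspecting page 2 as an arbitrary example (summary pages
start after the meta and revmap pages):

```sql
create extension if not exists pageinspect;

-- show the stored summary for each range on one index page;
-- 'iot_multi' and page number 2 are just illustrative:
select itemoffset, blknum, value
from brin_page_items(get_raw_page('iot_multi', 2), 'iot_multi');
```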
--
John Naylor
EDB: http://www.enterprisedb.com