From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
On Thu, Sep 10, 2020 at 05:01:37PM -0300, Alvaro Herrera wrote:
>On 2020-Sep-10, Tomas Vondra wrote:
>> I've spent a bit of time experimenting with this. My idea was to allow
>> keeping an "expanded" version of the summary somewhere. As the addValue
>> function only receives BrinValues I guess one option would be to just
>> add bv_mem_values field. Or do you have a better idea?
>Maybe it's okay to pass the BrinMemTuple to the add_value function, and
>keep something there. Or maybe that's pointless and just a new field in
>BrinValues is okay.
OK. I don't like changing the add_value API very much, so for the
experimental version I simply added three new fields to the BrinValues
struct - the deserialized value, a serialization callback, and the memory
context. This seems to work well enough for a WIP version.
With the original (non-batched) patch version a build of an index took
about 4s. With the minmax_multi_get_strategy_procinfo optimization and
batch build it now takes ~2.6s, which is quite nice. That's still ~2.5x
slower than plain minmax, though.
I think there's still room for a bit more improvement (in how we merge
the ranges etc.) and maybe we can get to ~2s or something like that.
>> Of course, more would need to be done:
>> 1) We'd need to also pass the right memory context (bt_context seems
>> like the right thing, but that's not something addValue sees now).
>You could use GetMemoryChunkContext() for that.
Maybe, although I prefer to just pass the memory context explicitly.
>> 2) We'd also need to specify some sort of callback that serializes the
>> in-memory value into bt_values. That's not something addValue can do,
>> because it doesn't know whether it's the last value in the range etc. I
>> guess one option would be to add yet another support proc, but I guess a
>> simple callback would be enough.
I added a simple serialization callback. It works, but it's a bit odd
that we have most functions defined as support procedures and then this
one extra C callback.
>> I've hacked together an experimental version of this to see how much
>> would it help, and it reduces the duration from ~4.6s to ~3.3s. Which is
>> nice, but plain minmax is ~1.1s. I suppose there's room for further
>> improvements in compare_combine_ranges/reduce_combine_ranges and so on,
>> but I still think there'll always be a gap compared to plain minmax.
>The main reason I'm talking about desupporting plain minmax is that,
>even if it's amazingly fast, it loses quite quickly in real-world cases
>because of loss of correlation. Minmax's build time is pretty much
>determined by speed at which you can seqscan the table. I don't think
>we lose much if we add overhead in order to create an index that is 100x
I understand. I just feel a bit uneasy about replacing an index with
something that may or may not be better for a certain use case. I mean,
if you have a data set for which regular minmax works fine, wouldn't you
be annoyed if we just switched it for something slower?
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services