Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location
Date: 2016-08-18 22:04:10
Message-ID: CAGTBQpYJDR7+qiFz6P7HavUvo6=zniOVEh-DGufQmDA+Wjwymg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 18, 2016 at 6:38 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, Aug 18, 2016 at 2:26 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> I see that. I could try to measure average depth to measure the impact
>> this had on fan-in.
>>
>> While it should cut it in half for narrow indexes, half of very high
>> is still high. Wide indexes, which are are the ones that would suffer
>> from poor fan-in, would feel this far less, since the overhead is
>> constant.
>
> You'll also have to consider the impact on the choice of split point,
> FWIW. There is currently leeway about what page an index tuple can
> land on, if and when it happens to be equal to the high-key. I'm not
> sure how important that is, but it's a consideration.

When I read the code, it seemed generic enough, using ItemIdGetLength
(which works on non-leaf index tuples correctly, reporting the right
size), so it should still work.

But the choice of split point may make a difference for future
insertions, so I'll look into that.

>> Even if it does have an impact, I don't see an alternative, without
>> also implementing suffix truncation. Perhaps I could try to avoid
>> adding the leaf tid header if it isn't necessary, though I would have
>> to use up the last available flag bit in t_info for that.
>
> To be clear, I'm particularly worried about painting ourselves into a
> corner, suffix-truncation-wise. It's a very common optimization, that
> we should really have.

Well, page version numbers could be used to switch between
semantically similar page formats later on. So if another format is
necessary (say, prefix compression, suffix truncation, etc), it can be
changed on-the-fly on existing indexes by checking page version
numbers and doing the proper switch.

So we won't be painting ourselves into a corner, but picking the wrong
format may indeed be a headache.

I can go the way of the flag in t_info if that's preferrable. Both
would work. It's just that it's the last flag available, and that
would be painting ourselves into a corner regarding flags. To avoid
that, the flag itself could be "has extra data" (instead of has leaf
tid), and add a whole set of flags in the extra data. That could also
help for preffix compression and suffix truncation.

>>> ISTM that the way to address this problem is with a duplicate list
>>> and/or prefix compression in leaf pages.
>>
>> Prefix compression is another one I will be looking into eventually,
>> but last time I tried it was far more invasive so I abandoned until I
>> could find a less invasive way to do it.
>
> Unfortunately, sometimes it is necessary to be very ambitious in order
> to solve a problem. The understandable and usually well-reasoned
> approach of making progress as incremental as possible occasionally
> works against contributors. It's worth considering if this is such a
> case.

I'd agree if this regressed performance considerably without the other
improvements, so I guess this will depend on the fan-in measurements.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2016-08-18 22:06:02 errno clobbering in reorderbuffer
Previous Message Tom Lane 2016-08-18 22:01:14 Re: [PATCH] add option to pg_dumpall to exclude tables from the dump