Re: Feature request for adoptive indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Hayk Manukyan <manukyantt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Feature request for adoptive indexes
Date: 2021-11-01 21:15:59
Message-ID: a2183996-8eba-70be-c121-f76810717308@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/1/21 21:06, Robert Haas wrote:
> On Tue, Oct 26, 2021 at 11:11 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> If I get what you propose, you want to have a "top" tree for (job, nlp,
>> year), which "splits" the data set into subsets of ~5000-7000 rows. And
>> then for each subset you want a separate "small" trees on each of the
>> other columns, so in this case three trees.
>>
>> Well, the problem with this is pretty obvious - each of the small trees
>> requires separate copies of the leaf pages. And remember, in a btree the
>> internal pages are usually less than 1% of the index, so this pretty
>> much triples the size of the index. And if you insert a row into the
>> index, it has to insert the item pointer into each of the small trees,
>> likely requiring a separate I/O for each.
>>
>> So I'd bet this is not any different from just having three separate
>> indexes - it doesn't save space, doesn't save I/O, nothing.
>
> I agree. In a lot of cases, it's actually useful to define the index
> on fewer columns, like (job, nlp, year) or even just (job, nlp) or
> even just (job) because it makes the index so much smaller and that's
> pretty important. If you have enough duplicate entries in a (job, nlp,
> year) index to justify create a (job, nlp, year, sequence) index, the
> number of distinct (job, nlp, year) tuples has to be small compared to
> the number of (job, nlp, year, sequence) tuples - and that means that
> you wouldn't actually save much by combining your (job, nlp, year,
> sequence) index with a (job, nlp, year, other-stuff) index. As you
> say, the internal pages aren't the problem.
>
> I don't intend to say that there's no possible use for this kind of
> technology. Peter G. says that people are writing research papers
> about that and they probably wouldn't be doing that unless they'd
> found some case where it's a big win. But this example seems extremely
> unconvincing.
>

I actually looked at the use case mentioned by Peter G, i.e. merged
indexes with master-detail clustering (see e.g. [1]), but that seems
like a rather different thing. The master-detail refers to storing rows
from multiple tables, interleaved in a way that allows faster joins. So
it's essentially a denormalization tool.

Perhaps there's something we could learn about efficient storage of the
small trees, but I haven't found any papers describing that (I haven't
spent much time on the search, though).

[1] Algorithms for merged indexes, Goetz Graefe
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.7709

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ilya Gladyshev 2021-11-01 21:31:35 Re: Partial aggregates pushdown
Previous Message Stephen Frost 2021-11-01 21:05:22 Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)