Re: Feature request for adoptive indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Hayk Manukyan <manukyantt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Feature request for adoptive indexes
Date: 2021-11-01 20:06:15
Message-ID: CA+Tgmoas=YhvX7uSNnSt27wxZwGD2RfZd7hNLZ166YeFzO7MBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 26, 2021 at 11:11 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> If I get what you propose, you want to have a "top" tree for (job, nlp,
> year), which "splits" the data set into subsets of ~5000-7000 rows. And
> then for each subset you want a separate "small" trees on each of the
> other columns, so in this case three trees.
>
> Well, the problem with this is pretty obvious - each of the small trees
> requires separate copies of the leaf pages. And remember, in a btree the
> internal pages are usually less than 1% of the index, so this pretty
> much triples the size of the index. And if you insert a row into the
> index, it has to insert the item pointer into each of the small trees,
> likely requiring a separate I/O for each.
>
> So I'd bet this is not any different from just having three separate
> indexes - it doesn't save space, doesn't save I/O, nothing.

I agree. In a lot of cases, it's actually useful to define the index
on fewer columns, like (job, nlp, year) or even just (job, nlp) or
even just (job) because it makes the index so much smaller and that's
pretty important. If you have enough duplicate entries in a (job, nlp,
year) index to justify create a (job, nlp, year, sequence) index, the
number of distinct (job, nlp, year) tuples has to be small compared to
the number of (job, nlp, year, sequence) tuples - and that means that
you wouldn't actually save much by combining your (job, nlp, year,
sequence) index with a (job, nlp, year, other-stuff) index. As you
say, the internal pages aren't the problem.

I don't intend to say that there's no possible use for this kind of
technology. Peter G. says that people are writing research papers
about that and they probably wouldn't be doing that unless they'd
found some case where it's a big win. But this example seems extremely
unconvincing.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jesper Pedersen 2021-11-01 20:06:20 Re: Portability report: ninja
Previous Message Stephen Frost 2021-11-01 19:44:15 Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)