Quick Links

Re: Feature request for adoptive indexes

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Hayk Manukyan <manukyantt(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Feature request for adoptive indexes
Date:	2021-10-26 15:08:59
Message-ID:	da00cb7f-1b1c-57cc-dd92-c78dd6e5afa6@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 10/26/21 8:49 AM, Hayk Manukyan wrote:
> ok. here is the deal if I have the following index with 6 column
>
> CREATE INDEX ON job_nlp_year_scan (job, nlp, year, scan_id, issue_flag,
> sequence);
>
> I need to specify all 6 columns in where clause in order to fully use
> this index.

What do you mean by "fully use this index"? Yes, the query may use just
some of the columns and there will be a bit of overhead, but I doubt
it'll be measurable.

> It will not be efficient in cases when I have 4 condition in where
> clause also I should follow the order of columns.

So, do some experiments and show us what the difference is. Create an
index on the 4 and 6 columns, and measure timings for a query with just
the 4 columns.

> In case of INCLUDE the 3 columns just will be in index but will not be
> structured as index so it will have affect only if In select I will have
> that 6 columns nothing more.
>
> In my case I have table with ~15 columns
> In my application I have to do a lot of queries with following where
> clauses
>
> 1. where job = <something> and nlp = <something> and year = <something>
> and SCAN_ID = <something>
> 2. where job = <something> and nlp = <something> and year = <something>
> and ISSUE_FLAG = <something>
> 3. where job = <something> and nlp = <something> and year = <something>
> and SEQUENCE = <something>
>
> I don't want to index just on job, nlp, year because for each job,
> nlp, year I have approximately 5000-7000 rows ,
> overall table have ~50m rows so it is partitioned by job as well. So if
> I build 3 separate indexes it will be huge resource.
> So I am thinking of having one index which will be job, nlp, year and
> the 4-th layer will be other columns not just included but also in
> B-tree structure.
> To visualize it will be something like this:
> image.png
> The red part is ordinary index with nested b-trees ant the yellow part
> is adaptive part so depends on
> where clause optimizer can decide which direction (leaf, b-tree
> whatever) to chose.
> In this case I will have one index and will manage red part only once
> for all three cases.
> Those it make sense ?

If I get what you propose, you want to have a "top" tree for (job, nlp,
year), which "splits" the data set into subsets of ~5000-7000 rows. And
then for each subset you want a separate "small" trees on each of the
other columns, so in this case three trees.

Well, the problem with this is pretty obvious - each of the small trees
requires separate copies of the leaf pages. And remember, in a btree the
internal pages are usually less than 1% of the index, so this pretty
much triples the size of the index. And if you insert a row into the
index, it has to insert the item pointer into each of the small trees,
likely requiring a separate I/O for each.

So I'd bet this is not any different from just having three separate
indexes - it doesn't save space, doesn't save I/O, nothing.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Feature request for adoptive indexes at 2021-10-26 06:49:31 from Hayk Manukyan

Responses

Re: Feature request for adoptive indexes at 2021-10-26 19:39:20 from Pavel Borisov
Re: Feature request for adoptive indexes at 2021-11-01 20:06:15 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Mark Dilger	2021-10-26 15:12:47	Re: CREATEROLE and role ownership hierarchies
Previous Message	Robert Haas	2021-10-26 15:03:44	Re: Refactoring: join MakeSingleTupleTableSlot() and MakeTupleTableSlot()