Re: Feature request for adoptive indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Hayk Manukyan <manukyantt(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Feature request for adoptive indexes
Date: 2021-10-26 15:08:59
Message-ID: da00cb7f-1b1c-57cc-dd92-c78dd6e5afa6@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/26/21 8:49 AM, Hayk Manukyan wrote:
> ok. here is the deal if I have the following index with 6 column
>
> CREATE INDEX ON job_nlp_year_scan (job, nlp, year, scan_id, issue_flag,
> sequence);
>
> I need to specify all 6 columns in where clause in order to fully use
> this index.

What do you mean by "fully use this index"? Yes, the query may use just
some of the columns and there will be a bit of overhead, but I doubt
it'll be measurable.

> It will not be efficient in cases when I have 4 condition in where
> clause also I should follow the order of columns.

So, do some experiments and show us what the difference is. Create an
index on the 4 and 6 columns, and measure timings for a query with just
the 4 columns.

> In case of INCLUDE the 3 columns just will be in index but will not be
> structured as index so it will have affect only if In select I will have
> that 6 columns nothing more.
>
> In my case I have table with ~15 columns
> In my application  I have to do a lot of queries with following where
> clauses 
>
> 1. where  job = <something> and nlp = <something> and year = <something>
> and SCAN_ID = <something>
> 2. where  job = <something> and nlp = <something> and year = <something>
> and ISSUE_FLAG = <something>
> 3. where  job = <something> and nlp = <something> and year = <something>
> and SEQUENCE = <something>
>
> I don't want to index just on  job, nlp, year because for each  job,
> nlp, year I have approximately 5000-7000 rows ,
> overall table have ~50m rows so it is partitioned by job as well.  So if
> I build 3 separate indexes it will be huge resource.
> So I am thinking of having one index which will be job, nlp, year and
> the 4-th layer will be other columns not just included but also in
> B-tree structure. 
> To visualize it will be something like this:
> image.png
> The red part is ordinary index with nested b-trees ant the yellow part
> is adaptive part so depends on
> where clause optimizer can decide which direction (leaf, b-tree
> whatever) to chose.
> In this case I will have one index and will manage red part only once
> for all three cases.
> Those it make sense ?

If I get what you propose, you want to have a "top" tree for (job, nlp,
year), which "splits" the data set into subsets of ~5000-7000 rows. And
then for each subset you want a separate "small" trees on each of the
other columns, so in this case three trees.

Well, the problem with this is pretty obvious - each of the small trees
requires separate copies of the leaf pages. And remember, in a btree the
internal pages are usually less than 1% of the index, so this pretty
much triples the size of the index. And if you insert a row into the
index, it has to insert the item pointer into each of the small trees,
likely requiring a separate I/O for each.

So I'd bet this is not any different from just having three separate
indexes - it doesn't save space, doesn't save I/O, nothing.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-10-26 15:12:47 Re: CREATEROLE and role ownership hierarchies
Previous Message Robert Haas 2021-10-26 15:03:44 Re: Refactoring: join MakeSingleTupleTableSlot() and MakeTupleTableSlot()