Re: Feature request for adoptive indexes

From: Hayk Manukyan <manukyantt(at)gmail(dot)com>
To: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Feature request for adoptive indexes
Date: 2021-11-05 11:17:49
Message-ID: CAF+kZOEgw0pKYcyC2z0RSgEE9UHAqOL+5NvLjioog_KZCNxq=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All

I did final research and saw that the difference between best and worst
cases is indeed really small.
I want to thank you guys for your time and efforts.

Best regards.

вт, 2 нояб. 2021 г. в 18:04, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>:

> вт, 2 нояб. 2021 г. в 16:04, Hayk Manukyan <manukyantt(at)gmail(dot)com>:
>
>> Tomas Vondra
>> > Are you suggesting those are not the actual best/worst cases and we
>> > should use some other indexes? If yes, which ones?
>>
>> I would say yes.
>> In my case I am not querying only sequence column.
>> I have the following cases which I want to optimize.
>> 1. Select * from Some_table where job = <somthing> and nlp = <something>
>> and year = <something> and *scan_id = <something> *
>> 2. Select * from Some_table where job = <somthing> and nlp = <something>
>> and year = <something> and *Issue_flag = <something> *
>> 3. Select * from Some_table where job = <somthing> and nlp = <something>
>> and year = <something> and *sequence = <something> *
>> Those are queries that my app send to db that is why I said that from *read
>> perspective* our *best case* is 3 separate indexes for
>> *(job, nlp, year, sequence)* AND *(job, nlp, year, Scan_ID)* and *(job,
>> nlp, year, issue_flag)* and any other solution like
>> (job, nlp, year, sequence, Scan_ID, issue_flag) OR (job, nlp, year )
>> INCLUDE(sequence, Scan_ID, issue_flag) OR just (job, nlp, year) can be
>> considered as* worst case *
>> I will remind that in real world scenario I have ~50m rows and about *~5k
>> rows for each (job, nlp, year )*
>>
>
> So you get 50M rows /5K rows = 10K times selectivity, when you select on
> job = <somthing> and nlp = <something> and year = <something> which is
> enormous. Then you should select some of the 5K rows left, which is
> expected to be pretty fast on bitmap index scan or INCLUDE column
> filtering. It confirms Tomas's experiment
>
> pgbench -n -f q4.sql -T 60
>
> 106 ms vs 109 ms
>
> fits your case pretty well. You get absolutely negligible difference
> between best and worst case and certainly you don't need anything more than
> just plain index for 3 columns, you even don't need INCLUDE index.
>
> From what I read I suppose that this feature indeed doesn't based on the
> real need. If you suppose it is useful please feel free to make and post
> here some measurements that proves your point.
>
>
>
>
> --
> Best regards,
> Pavel Borisov
>
> Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Isaac Morland 2021-11-05 12:03:58 Re: [PATCH] rename column if exists
Previous Message Andrey Borodin 2021-11-05 10:26:09 Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?