Re: Optimization outcome depends on the index order

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimization outcome depends on the index order
Date: 2023-12-22 09:48:06
Message-ID: CAPpHfdsE+v+mDoX2OF-wBTR4eiyiB3urwMEJUPbSwjJiG-P=Gg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 22, 2023 at 8:53 AM Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
wrote:
> On 21/12/2023 12:10, Alexander Korotkov wrote:
> > I took a closer look at the patch in [9]. I should drop my argument
> > about breaking the model, because add_path() already considers other
> > aspects than just costs. But I have two more note about that patch:
> >
> > 1) It seems that you're determining the fact that the index path
> > should return strictly one row by checking path->rows <= 1.0 and
> > indexinfo->unique. Is it really guaranteed that in this case quals
> > are matching unique constraint? path->rows <= 1.0 could be just an
> > estimation error. Or one row could be correctly estimated, but it's
> > going to be selected by some quals matching unique constraint and
> > other quals in recheck. So, it seems there is a risk to select
> > suboptimal index due to this condition.
>
> Operating inside the optimizer, we consider all estimations to be the
> sooth. This patch modifies only one place: having two equal assumptions,
> we just choose one that generally looks more stable.
> Filtered tuples should be calculated and included in the cost of the
> path. The decision on the equality of paths has been made in view of the
> estimation of these filtered tuples.

Even if estimates are accurate the conditions in the patch doesn't
guarantee there is actually a unique condition.

# create table t as select i/1000 a, i % 1000 b, i % 1000 c from
generate_series(1,1000000) i;
# create unique index t_unique_idx on t(a,b);
# create index t_another_idx on t(a,c);
# \d t
Table "public.t"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
b | integer | | |
c | integer | | |
Indexes:
"t_another_idx" btree (a, c)
"t_unique_idx" UNIQUE, btree (a, b)
# set enable_bitmapscan = off; explain select * from t where a = 1 and c =
1;
SET
Time: 0.459 ms
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using t_unique_idx on t (cost=0.42..1635.16 rows=1 width=12)
Index Cond: (a = 1)
Filter: (c = 1)
(3 rows)

> > 2) Even for non-unique indexes this patch is putting new logic on top
> > of the subsequent code. How we can prove it's going to be a win?
> > That could lead, for instance, to dropping parallel-safe paths in
> > cases we didn't do so before.
> Because we must trust all predictions made by the planner, we just
> choose the most trustworthy path. According to the planner logic, it is
> a path with a smaller selectivity. We can make mistakes anyway just
> because of the nature of estimation.

Even if we need to take selectivity into account here, it's still not clear
why this should be on top of other logic later in add_path().

> > Anyway, please start a separate thread if you're willing to put more
> > work into this.
>
> Done

Thanks.

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2023-12-22 10:32:21 Re: Synchronizing slots from primary to standby
Previous Message Junwang Zhao 2023-12-22 09:44:20 Re: [meson] expose buildtype debug/optimization info to pg_config