Re: [PATCH] Erase the distinctClause if the result is unique by definition

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date: 2020-02-11 09:17:15
Message-ID: CAKU4AWqzHG-4X4R85BcwD+QDD6u6OBOj65F69FBToPK4KUo0rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 11, 2020 at 3:56 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:

> On Tue, Feb 11, 2020 at 10:57:26AM +0800, Andy Fan wrote:
> > On Tue, Feb 11, 2020 at 12:22 AM Ashutosh Bapat <
> > ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> >
> > > I forgot to mention this in the last round of comments. Your patch was
> > > actually removing distictClause from the Query structure. Please avoid
> > > doing that. If you remove it, you are also removing the evidence that
> this
> > > Query had a DISTINCT clause in it.
> > >
> >
> > Yes, I removed it because it is the easiest way to do it. what is the
> > purpose of keeping the evidence?
> >
> > >> However the patch as presented has some problems
> > >> 1. What happens if the primary key constraint or NOT NULL constraint
> gets
> > >> dropped between a prepare and execute? The plan will no more be valid
> and
> > >> thus execution may produce non-distinct results.
> >
> > > But that doesn't matter since a query can be prepared outside a
> > > transaction and executed within one or more subsequent transactions.
> > >
> >
> > Suppose after a DDL, the prepared statement need to be re-parsed/planned
> > if it is not executed or it will prevent the DDL to happen.
> >
> > The following is my test.
> >
> > postgres=# create table t (a int primary key, b int not null, c int);
> > CREATE TABLE
> > postgres=# insert into t values(1, 1, 1), (2, 2, 2);
> > INSERT 0 2
> > postgres=# create unique index t_idx1 on t(b);
> > CREATE INDEX
> >
> > postgres=# prepare st as select distinct b from t where c = $1;
> > PREPARE
> > postgres=# explain execute st(1);
> > QUERY PLAN
> > -------------------------------------------------
> > Seq Scan on t (cost=0.00..1.02 rows=1 width=4)
> > Filter: (c = 1)
> > (2 rows)
> > ...
> > postgres=# explain execute st(1);
> > QUERY PLAN
> > -------------------------------------------------
> > Seq Scan on t (cost=0.00..1.02 rows=1 width=4)
> > Filter: (c = $1)
> > (2 rows)
> >
> > -- session 2
> > postgres=# alter table t alter column b drop not null;
> > ALTER TABLE
> >
> > -- session 1:
> > postgres=# explain execute st(1);
> > QUERY PLAN
> > -------------------------------------------------------------
> > Unique (cost=1.03..1.04 rows=1 width=4)
> > -> Sort (cost=1.03..1.04 rows=1 width=4)
> > Sort Key: b
> > -> Seq Scan on t (cost=0.00..1.02 rows=1 width=4)
> > Filter: (c = $1)
> > (5 rows)
> >
> > -- session 2
> > postgres=# insert into t values (3, null, 3), (4, null, 3);
> > INSERT 0 2
> >
> > -- session 1
> > postgres=# execute st(3);
> > b
> > ---
> >
> > (1 row)
> >
> > and if we prepare sql outside a transaction, and execute it in the
> > transaction, the other session can't drop the constraint until the
> > transaction is ended.
>
> And what if you create a view on top of a query containing a distinct
> clause
> rather than using prepared statements? FWIW your patch doesn't handle such
> case at all, without even needing to drop constraints:

>
CREATE TABLE t (a int primary key, b int not null, c int);
> INSERT INTO t VALUEs(1, 1, 1), (2, 2, 2);
> CREATE UNIQUE INDEX t_idx1 on t(b);
> CREATE VIEW v1 AS SELECT DISTINCT b FROM t;
> EXPLAIN SELECT * FROM v1;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
>
Thanks for pointing it out. This is unexpected based on my current
knowledge, I
will check that.

> I also think this is not the right way to handle this optimization.
>

I started to check query_is_distinct_for when Tom point it out, but still
doesn't
understand the context fully. I will take your finding with this as well.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2020-02-11 09:18:04 Re: Internal key management system
Previous Message Peter Eisentraut 2020-02-11 08:22:54 client-side fsync() error handling