Re: constraint exclusion and nulls in IN (..) clause

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: emre(at)hasegeli(dot)com
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: constraint exclusion and nulls in IN (..) clause
Date: 2018-03-06 04:43:11
Message-ID: 144545f0-ed7c-7cb3-17b8-908e1a3a092f@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

Thanks for reviewing again.

On 2018/03/05 23:04, Emre Hasegeli wrote:
>>> Shouldn't we check if we consumed all elements (state->next_elem >=
>>> state->num_elems) inside the while loop?
>>
>> You're right. Fixed.
>
> I don't think the fix is correct. arrayconst_next_fn() can still
> execute state->next_elem++ without checking if we consumed all
> elements. I couldn't manage to crash it though.

I couldn't get it to crash either, but it's wrong anyway. What happens is
the calling code would perform constraint exclusion with a garbage clause
(that is one containing a garbage value picked from elem_values when
next_elem has exceeded num_elems) and that may alter the result of
partition pruning. Verified that with the following example.

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);

-- before fixing
explain (costs off) select * from p where a in (2, null);
QUERY PLAN
---------------------------------------------------
Append
-> Seq Scan on p1
Filter: (a = ANY ('{2,NULL}'::integer[]))
-> Seq Scan on p2
Filter: (a = ANY ('{2,NULL}'::integer[]))
(5 rows)

So, it looks as if the proposed patch has no effect at all.

-- after fixing
explain (costs off) select * from p where a in (2, null);
QUERY PLAN
----------------------------------------------------------
Append
-> Seq Scan on p2
Filter: (a = ANY ('{2,NULL}'::integer[]))

Was a bit surprised that the calling code didn't crash while handling a
garbage clause.

> I am also not sure if it is proper to skip some items inside the
> "next_fn", but I don't know the code to suggest a better alternative.

Patch teaches it to ignore nulls when it's known that the operator being
used is strict. It is harmless and has the benefit that constraint
exclusion gives an answer that is consistent with what actually running
such a qual against a table's rows would do.

Consider this example.

create table foo (a int check (a = 1));

insert into foo values (1), (null);

select * from foo where a in (2, null);
a
---
(0 rows)

set constraint_exclusion to on;

-- you'd think constraint exclusion would avoid the scan
explain select * from foo where a in (2, null);
QUERY PLAN
-----------------------------------------------------
Seq Scan on foo (cost=0.00..41.88 rows=13 width=4)
Filter: (a = ANY ('{2,NULL}'::integer[]))
(2 rows)

But it didn't, because the null in that list caused constraint exclusion
to mistakenly decide that the clause as a whole doesn't refute the
constraint (check (a = 1)).

-- with the patch
explain select * from foo where a in (2, null);
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=4)
One-Time Filter: false
(2 rows)

After ignoring null, only (a = 2) is left to consider and that does refute
(a = 1), so pruning works as desired.

BTW, if you compose the qual using an OR directly, it works as desired
even with HEAD:

explain select * from foo where a = 2 or a = null;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=4)
One-Time Filter: false
(2 rows)

That's because a = null is const-simplified in an earlier planning phase
to false, so (a = 2 or false) reduces to just a = 2, which does refute
foo's constraint (a = 1). I think the same thing should happen when
constraint exclusion internally turns an IN (..) clause into a set of OR
clauses. Skipping nulls in these next_fn functions is, in a way, same as
const-simplifying them away.

>> + /* skip nulls if ok to do so */
>> + if (state->opisstrict && state->next)
>
> Do we need && state->next in here? It is checked for (state->next ==
> NULL) 2 lines above.

Yeah, it wasn't needed.

>> + {
>> + Node *node = (Node *) lfirst(state->next);
>> +
>> + while (node && IsA(node, Const) && ((Const *) node)->constisnull)
>
> Should we spell the first check like node != NULL as the rest of the code does.

OK.

>> + {
>> + state->next = lnext(state->next);
>> + if (state->next)
>> + node = (Node *) lfirst(state->next);
>> + }
>> + }
>
> I think this function is also not correct as it can call
> lnext(state->next) without checking.

Yeah. Rearranged the code to fix that.

Attached updated patch. Thanks again.

Regards,
Amit

Attachment Content-Type Size
v4-0001-Disregard-nulls-in-SAOP-rightarg-array-list-durin.patch text/plain 7.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2018-03-06 04:45:00 Re: Faster inserts with mostly-monotonically increasing values
Previous Message Pavan Deolasee 2018-03-06 04:40:04 Re: Faster inserts with mostly-monotonically increasing values