Re: Patch to support SEMI and ANTI join removal

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch to support SEMI and ANTI join removal
Date: 2014-09-28 04:32:21
Message-ID: CAApHDvq5S41QAAr7_WtKyZvVuFkBfzi_+NuLJc80S9f5=Nqc1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 26, 2014 at 12:36 AM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 09/16/2014 01:20 PM, David Rowley wrote:
>
>> + /*
>> + * We mustn't allow any joins to be removed if there are any
>> pending
>> + * foreign key triggers in the queue. This could happen if we are
>> planning
>> + * a query that has been executed from within a volatile function
>> and the
>> + * query which called this volatile function has made some
>> changes to a
>> + * table referenced by a foreign key. The reason for this is that
>> any
>> + * updates to a table which is referenced by a foreign key
>> constraint will
>> + * only have the referencing tables updated after the command is
>> complete,
>> + * so there is a window of time where records may violate the
>> foreign key
>> + * constraint.
>> + *
>> + * Currently this code is quite naive, as we won't even attempt
>> to remove
>> + * the join if there are *any* pending foreign key triggers, on
>> any
>> + * relation. It may be worthwhile to improve this to check if
>> there's any
>> + * pending triggers for the referencing relation in the join.
>> + */
>> + if (!AfterTriggerQueueIsEmpty())
>> + return false;
>>
>
>
Hi Heikki,

Thanks for having a look at the patch.

Hmm. This code runs when the query is planned. There is no guarantee that
> there won't be after triggers pending when the query is later *executed*.
>
>
Please correct anything that sounds wrong here, but my understanding is
that we'll always plan a query right before we execute it, with the
exception of PREPARE statements where PostgreSQL will cache the query plan
when the prepare statement is first executed. So I think you may have a
point here regarding PREPARE'd statements, but I think that it is isolated
to those.

I think in all other cases we'll plan right before we execute. So if we
happen to be planning an UPDATE statement which has a sub-query that
perform some INNER JOINs, I think we're safe to remove INNER JOINs when
possible, as the UPDATE statement won't get visibility of its own changes.

We can see that here:

create table updatetest (id int primary key, value int, value2 int);

create or replace function getvalue2(p_id int) returns int
as $$select value2 from updatetest where id = p_id$$
language sql volatile;

insert into updatetest values(0,0,0);
insert into updatetest values(1,10,10);
insert into updatetest values(2,20,20);
insert into updatetest values(3,30,30);

update updatetest set value = COALESCE((select value from updatetest u2
where updatetest.id - 1 = u2.id) + 1,0);

update updatetest set value2 = COALESCE(getvalue2(id - 1) + 1,0);

select * from updatetest;
id | value | value2
----+-------+--------
0 | 0 | 0
1 | 1 | 1
2 | 11 | 2
3 | 21 | 3

The value column appears to have been set based on the value that was
previously in the value column, and has not come from the newly set value.
The behaviour is different for the value2 column as the value for this has
been fetched from another query, which *does* see the newly updated value
stored in the value2 column.

My understanding of foreign keys is that any pending foreign key triggers
will be executed just before the query completes, so we should only ever
encounter pending foreign key triggers during planning when we're planning
a query that's being executed from somewhere like a volatile function or
trigger function, if the outer query has updated or deleted some records
which are referenced by a foreign key.

So I think with the check for pending triggers at planning time this is
safe at least for queries being planned right before they're executed, but
you've caused me to realise that I'll probably need to do some more work on
this for when it comes to PREPARE'd queries, as it looks like if we
executed a prepared query from inside a volatile function or trigger
function that was called from a DELETE or UPDATE statement that caused
foreign key triggers to be queued, and we'd happened to have removed some
INNER JOINs when we originally planned that prepare statement, then that
would be wrong.

The only thing that comes to mind to fix that right now is to tag something
maybe in PlannerInfo to say if we've removed any INNER JOINs in planning,
then when we execute a prepared statement we could void the cached plan we
see that some INNER JOINs were removed, but only do this if the foreign key
trigger queue has pending triggers. (which will hopefully not be very
often).

Another thing that comes to mind which may be similar is how we handle
something like:

PREPARE a AS SELECT * from tbl WHERE name LIKE $1;

Where, if $1 is 'foo' or 'foo%' we might want to use an index scan, but if
$1 was '%foo' then we'd probably not.
I've not yet looked into great detail of what happens here, but from some
quick simple tests it seems to replan each time!? But perhaps that'd due to
the parameter, where with my other tests the PREPARE statement had no
parameters.

There was some other discussion relating to some of this over here->
http://www.postgresql.org/message-id/20140603235053.GA351732@tornado.leadboat.com

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-09-28 04:36:51 Missing newlines in verbose logs of pg_dump, introduced by RLS patch
Previous Message Stephen Frost 2014-09-28 03:58:27 Re: json (b) and null fields