BUG #8591: Erroneous results, planner pushing where into left join right side

From: klaussfreire(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #8591: Erroneous results, planner pushing where into left join right side
Date: 2013-11-13 23:41:37
Message-ID: E1Vgk41-00050x-Ck@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 8591
Logged by: Claudio Freire
Email address: klaussfreire(at)gmail(dot)com
PostgreSQL version: 9.2.5
Operating system: Amazon Linux
Description:

So, I've been checking whether my SQL was wrong, and I really can't see the
fault in it. It seems to be a planner error, generating erroneous results.

I've got this query:

select ag.*, act.* from ag
left join act on act.id = ag.act_id
left join camp on camp.id = act.camp_id
left join adv on adv.id = ag.adv_id
left join src on src.id = coalesce(ag.src_id, act.src_id)
where
((ag.act_id in
(2566,2567,1928,2093,2320,2450,2451,2452,1965,2353,2226,2365,2370,2373,2145,2255,2387,2388,2261,2262,2392,2319,1885,2014,2272,1936,1894,1895,1896,1897,1898,2540,1903,1904,2171,1918))
and (ag.created <= '2013-10-16 23:59:59.999'::timestamp without time zone)
and (ag.created >= '2013-10-16'::timestamp without time zone))
and (coalesce(ag.src_id, act.src_id) = 74)
limit 1000;

Which returns 423 rows, many of which don't match the where condition.

Explain analyze here: http://explain.depesz.com/s/6Ov (notice the filter on
the scan over act_pkey, that's wrong since the join node is a left join)

If written like this, it returns the right 34 results (the -1000 there is a
no-op, there's no such value anywhere):

select ag.*, act.* from ag
left join act on act.id = ag.act_id
left join camp on camp.id = act.camp_id
left join adv on adv.id = ag.adv_id
left join src on src.id = coalesce(ag.src_id, act.src_id)
where
((ag.act_id in
(2566,2567,1928,2093,2320,2450,2451,2452,1965,2353,2226,2365,2370,2373,2145,2255,2387,2388,2261,2262,2392,2319,1885,2014,2272,1936,1894,1895,1896,1897,1898,2540,1903,1904,2171,1918))
and (ag.created <= '2013-10-16 23:59:59.999'::timestamp without time zone)
and (ag.created >= '2013-10-16'::timestamp without time zone))
and (coalesce(ag.src_id, act.src_id, -1000) = 74)
limit 1000;

Explain analyze here: http://explain.depesz.com/s/Xsa (notice the filter is
not on the join node, which is the correct way to execute this query)

A good plan can also be obtained by removing the left join against src. In
this query, this join is superfluous, but the (more complex) real-world
query that is giving me this bad plan needs it.

There are two relevant indices:

CREATE INDEX ix_ag_action ON ag USING btree (ruby_action_id);

CREATE UNIQUE INDEX ix_ag_unq ON ag USING btree (created,
(COALESCE(act_id, (-1000))), (COALESCE(adv_id, (-1000))),
(COALESCE(country_id, (-1000))), (COALESCE(os_id, (-1000))),
(COALESCE(src_id, (-1000))));

Sadly, I could not reproduce this bug on a generated database. It seems to
be data-depepdent (probably depends on specific stats in order to generate
that plan). I've got a script that generates a close-enough database, but it
fails to reproduce the bug and it's too big to include here (contact me by
email). Maybe, with that test database and some enable_X fumbling it could
be reproduced, but I did not manage.

In any case, the plans themselves show the bug. To me, it's clear that
filter cannot be moved the way the planner does. If the planner wants to
reduce the size of the right side of the join, to speed up things, it has to
add the filter both at the right side *and* the join node. I don't think how
this could help, though, since the right side is a query by PK.

Row counts:

ag 702k
act 2900
adv 75
camp 579

Schema (approx):

create table advs (
id int ,
name varchar ,
PRIMARY KEY (id)
) with (oids = false);

create table camps (
id int ,
name varchar ,
PRIMARY KEY (id)
) with (oids = false);

create table sources (
id int ,
company varchar ,
PRIMARY KEY (id)
) with (oids = false);

create table countries (
id int ,
name varchar ,
PRIMARY KEY (id)
) with (oids = false);

create table oses (
id int ,
name varchar ,
PRIMARY KEY (id)
) with (oids = false);

create table acts (
id int ,
name varchar ,
source_id int REFERENCES sources(id),
camp_id int REFERENCES camps(id),
os_id int REFERENCE oses(id),
country_id int REFERENCES countries(id),
PRIMARY KEY (id)
) with (oids = false);

CREATE TABLE ag (
id bigserial not null PRIMARY KEY,
created timestamp without time zone, -- daily precision at least
adv_id integer REFERENCES advs(id),
act_id integer REFERENCES acts(id),
source_id integer REFERENCES sources(id), -- take from act, if null

evs integer not null default 0
) WITH ( OIDS = FALSE );

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Rene Grün 2013-11-14 09:32:59 BUG #8579: CoreDump of background writer process
Previous Message Jeffrey Walton 2013-11-13 15:52:28 fork_process.c and OpenSSL