BUG #16504: Wrapping query in EXISTS() causes sequential scans of tables

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: postgresql(dot)20(dot)drkshadow(at)spamgourmet(dot)com
Subject: BUG #16504: Wrapping query in EXISTS() causes sequential scans of tables
Date: 2020-06-19 03:31:29
Message-ID: 16504-d8c5d6beb38e8b6e@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16504
Logged by: Drk Shadow
Email address: postgresql(dot)20(dot)drkshadow(at)spamgourmet(dot)com
PostgreSQL version: 12.0
Operating system: Gentoo
Description:

I'm trying to migrate from Postgresql 9.6, but running into performance
issues.

In this case, I have a query that selects from a table, joins a table, joins
that same table, joins the first table, performs a NOT
firsttable.value=lasttable.value to make sure that I'm not joining an item
with itself. There are indexes for the conditions in each join. When I run
this as a select query, the EXPLAIN output uses index scans the whole way
through. When I SELECT EXISTS(thatquery) FROM firsttable WHERE firsttable.id
IN (1,2,3), the EXPLAIN output switches to using sequential scans. This
destroys performance on this 3.8m row table.

The real case, with a view and multiple queries joined together, trying to
determine if one item (or its preview) has a tag anywhere:
https://www.db-fiddle.com/f/efSF4GbPEbAAUQWRnRCkEp/5

The example case, where I compare an EXISTS() to a flat select:
https://www.db-fiddle.com/f/f96LRpJQF3VFGZo9EP8nMm/0

=======
The SQL for the example, in case db-fiddle goes away

CREATE TABLE boxes (
itemid INTEGER PRIMARY KEY,
size INTEGER DEFAULT 0,
md5 CHARACTER(22),
prevoid oid DEFAULT 0,
prevlocal BOOLEAN DEFAULT FALSE
);
CREATE INDEX prevoid ON boxes (prevoid) WHERE prevoid > 0;
CREATE INDEX boxhash ON boxes(size, SUBSTR(md5, 1, 3));

-- your preview is stored in another location..
CREATE TABLE prevs (
id INTEGER PRIMARY KEY,
size INTEGER NOT NULL,
md5 CHARACTER(22) NOT NULL
-- location CHARACTER...
);
CREATE INDEX prevhash ON prevs(size, SUBSTR(md5, 1, 3));

-- ratings
CREATE TABLE box_tags (
size INTEGER NOT NULL,
md5 CHARACTER(22) NOT NULL,
value INTEGER NOT NULL
);
CREATE INDEX taghashes ON box_tags(size, SUBSTR(md5, 1, 3));

CREATE VIEW other_location AS
SELECT bx_a.itemid, bx_a.prevlocal AS local, bx_other_a.itemid AS bitemid,
bt_a.value
from boxes bx_a
JOIN prevs pv_a ON pv_a.id=(bx_a.prevoid)::INTEGER AND bx_a.prevlocal
JOIN prevs pv1_a ON (pv1_a.size,SUBSTR(pv1_a.md5, 1,
3),pv1_a.md5)=(pv_a.size, SUBSTR(pv_a.md5, 1, 3),pv_a.md5)
JOIN boxes bx_other_a ON bx_other_a.prevoid=(pv1_a.id)::oid AND
bx_other_a.prevlocal
LEFT JOIN box_tags bt_a ON (bt_a.size,SUBSTR(bt_a.md5, 1, 3),
bt_a.md5)=(pv1_a.size,SUBSTR(pv1_a.md5, 1, 3), pv1_a.md5)
WHERE bx_other_a.prevoid > 0 AND NOT bx_a.itemid = bx_other_a.itemid
;

-- ----
-- The examples

EXPLAIN
SELECT EXISTS(select 1 FROM other_location WHERE itemid=boxes.itemid)
FROM boxes
WHERE boxes.itemid IN (1,2,3);

----
Not copy-pasting the EXPLAIN output, the first query has two subplans. The
first subplan uses indexes, and the second subplan uses -> Seq Scan on
boxes bx_a_1 (cost=0.00..22.10 rows=605 width=8).

The second query has no second subplan, and there is no sequential table
scan.

It appears there is a bug: EXISTS() subqueries are not handled as part of
the query, or perhaps they're done in parallel (at extreme cost), or perhaps
something else. In this case, there's no indication of a parallel branch,
and `SET max_parallel_workers_per_gather = 0;` does not change the plan.
I've noticed this problem as well with JOIN (SELECT...) subqueries (but I
didn't make an example), so I believe this isn't about EXISTS(), but rather
about _all_ subqueries now being a performance barrier.

As such, I can't migrate anything with subqueries to Postgres 12. Is this a
bug?

EXPLAIN
SELECT other_location.bitemid
FROM boxes
JOIN other_location ON other_location.itemid=boxes.itemid
WHERE boxes.itemid IN (1,2,3);

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Ram Pratap Maurya 2020-06-19 05:20:21 RE: BUG #16497: old and new pg_controldata WAL segment sizes are invalid or do not match
Previous Message Michael Paquier 2020-06-19 00:48:55 Re: BUG #16497: old and new pg_controldata WAL segment sizes are invalid or do not match