Query planner not using indexes with JOIN query and OR clause

From: Ben Hoyt <benhoyt(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Query planner not using indexes with JOIN query and OR clause
Date: 2015-07-13 20:54:29
Message-ID: CAL9jXCEAumhJf6FJUhZnE8vc0r1-EQOMYvPXe9-0-S3xT+zVnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi folks,

I have a fairly simple three-table query (pasted below) with two LEFT JOINs
and an OR in the WHERE clause that for some reason is doing sequential
scans on all three tables (two of them large -- several million rows), even
though I have indexes on the relevant "filename" columns.

Note the two parts of the where clause -- a filter on the image2 table and
a filter on the panoramas table. If I comment out either filter and just
filter on i.filename by itself, or p.filename by itself, the query planner
uses the relevant index and the query takes a few milliseconds. But when I
have both clauses (as shown below) it falls back to sequential scanning all
three tables for some reason, taking several seconds.

What am I missing? There must be some reason PostgreSQL can't use the index
in this case, but I can't see what it is. If I were PostgreSQL I'd be using
the index on i.filename and p.filename to filter to a couple of rows first,
then join, making it super-quick.

In this test I'm running PostgreSQL 9.3.3 on Windows 64-bit, but the same
thing happens on our Linux-based database. We're using autovacuum on both
databases, and I've tried manually VACUUM ANALYZE-ing all three tables just
in case, but that doesn't help. Memory config values are set to sensible
values. Note: this is a new query, so in terms of "history", it's always
been slow.

My query and PostgreSQL version and the explain and a lot of other table
data is pasted below.

QUERY
----------
select ai.position, i.filename as image_filename, p.filename as
panorama_filename
from album_items ai
left join image2 i on i.imageid = ai.image_id
left join panoramas p on p.id = ai.panorama_id
where i.filename in ('pano360--v471.jpg', 'pano360-2--v474.jpg') or
p.filename in ('pano360--v471', 'pano360-2--v474')
----------

POSTGRESQL VERSION
----------
PostgreSQL 9.3.3, compiled by Visual C++ build 1600, 64-bit
----------

EXPLAIN (ANALYZE, BUFFERS)
----------
http://explain.depesz.com/s/qcpQ

Hash Left Join (cost=344184.62..963863.99 rows=376 width=57) (actual
time=3157.104..8838.329 rows=2 loops=1)
Hash Cond: (ai.panorama_id = p.id)
Filter: ((i.filename = ANY
('{pano360--v471.jpg,pano360-2--v474.jpg}'::text[])) OR (p.filename = ANY
('{pano360--v471,pano360-2--v474}'::text[])))
Rows Removed by Filter: 7347790
Buffers: shared hit=8967 read=198827, temp read=76936 written=75908
I/O Timings: read=609.403
-> Hash Left Join (cost=341001.56..781324.85 rows=7346959 width=39)
(actual time=2660.821..7842.202 rows=7347792 loops=1)
Hash Cond: (ai.image_id = i.imageid)
Buffers: shared hit=6936 read=198827, temp read=76662 written=75640
I/O Timings: read=609.403
-> Seq Scan on album_items ai (cost=0.00..156576.59 rows=7346959
width=12) (actual time=0.009..981.074 rows=7347792 loops=1)
Buffers: shared hit=4297 read=78810
I/O Timings: read=251.402
-> Hash (cost=194687.36..194687.36 rows=7203136 width=35) (actual
time=2658.643..2658.643 rows=7200287 loops=1)
Buckets: 2048 Batches: 512 Memory Usage: 976kB
Buffers: shared hit=2639 read=120017, temp written=49961
I/O Timings: read=358.000
-> Seq Scan on image2 i (cost=0.00..194687.36 rows=7203136
width=35) (actual time=0.007..1063.586 rows=7200287 loops=1)
Buffers: shared hit=2639 read=120017
I/O Timings: read=358.000
-> Hash (cost=2423.47..2423.47 rows=39247 width=26) (actual
time=12.100..12.100 rows=39247 loops=1)
Buckets: 2048 Batches: 4 Memory Usage: 575kB
Buffers: shared hit=2031, temp written=170
-> Seq Scan on panoramas p (cost=0.00..2423.47 rows=39247
width=26) (actual time=0.003..5.470 rows=39247 loops=1)
Buffers: shared hit=2031
Total runtime: 8838.701 ms
----------

TABLE METADATA
----------
Number of rows in album_items: 7347792
Number of rows in image2: 7200287
Number of rows in panoramas: 39247
----------

TABLES (AND THEIR INDEXES) REFERENCED IN QUERY
----------
CREATE TABLE content.album_items
(
album_id integer NOT NULL,
image_id integer,
"position" integer,
caption text,
active boolean NOT NULL,
panorama_id integer,
CONSTRAINT album_items_album_id_fkey FOREIGN KEY (album_id) REFERENCES
content.albums (id),
CONSTRAINT album_items_image_id_fkey FOREIGN KEY (image_id) REFERENCES
content.image2 (imageid),
CONSTRAINT album_items_panorama_id_fkey FOREIGN KEY (panorama_id)
REFERENCES content.panoramas (id),
CONSTRAINT album_image_unique UNIQUE (album_id, image_id)
);

CREATE INDEX album_items_album_id_idx ON content.album_items (album_id);
CREATE INDEX album_items_image_id_idx ON content.album_items (image_id);
CREATE INDEX album_items_panorama_id_idx ON content.album_items
(panorama_id);

CREATE TABLE content.image2
(
imageid integer NOT NULL DEFAULT nextval('image2_imageid_seq'::regclass),
hotelid integer,
filename text NOT NULL,
originalfoldername text NOT NULL,
width integer,
height integer,
active boolean NOT NULL DEFAULT false,
importid integer,
timetaken timestamp without time zone,
state integer NOT NULL DEFAULT 1,
has_wide boolean NOT NULL DEFAULT false,
type integer,
document_id integer,
property_id integer,
CONSTRAINT image2_pkey PRIMARY KEY (imageid),
CONSTRAINT fk_image2_hotelid FOREIGN KEY (hotelid) REFERENCES
content.hotel (hotelid),
CONSTRAINT fk_image2_importid FOREIGN KEY (importid) REFERENCES
content.imageimport (importid),
CONSTRAINT image2_document_id_fkey FOREIGN KEY (document_id) REFERENCES
content.documents (id),
CONSTRAINT image2_property_id_fkey FOREIGN KEY (property_id) REFERENCES
content.properties (id),
CONSTRAINT uq_image2_filename UNIQUE (filename)
);

CREATE INDEX fki_image2_property_id_fkey ON content.image2 (property_id);
CREATE INDEX image2_document_id_idx ON content.image2 (document_id);
CREATE INDEX image2_importid_idx ON content.image2 (importid);
CREATE INDEX ix_image2_hotelid ON content.image2 (hotelid);
CREATE INDEX ix_image2_imageid ON content.image2 (imageid);

CREATE TABLE content.panoramas
(
id integer NOT NULL DEFAULT nextval('panoramas_id_seq'::regclass),
hotel_id integer,
filename text NOT NULL,
folder text NOT NULL,
import_id integer NOT NULL,
active boolean NOT NULL DEFAULT false,
state integer NOT NULL,
num_images integer NOT NULL DEFAULT 0,
type integer,
hdr boolean NOT NULL DEFAULT false,
has_preview boolean NOT NULL DEFAULT false,
property_id integer,
data json,
previews_created boolean NOT NULL DEFAULT false,
CONSTRAINT panoramas_pkey PRIMARY KEY (id),
CONSTRAINT fk_panoramas_hotel_id FOREIGN KEY (hotel_id) REFERENCES
content.hotel (hotelid),
CONSTRAINT fk_panoramas_import_id FOREIGN KEY (import_id) REFERENCES
content.imageimport (importid),
CONSTRAINT panoramas_property_id_fkey FOREIGN KEY (property_id)
REFERENCES content.properties (id),
CONSTRAINT panoramas_uq_filename UNIQUE (filename)
);

CREATE INDEX fki_panoramas_property_id_fkey ON content.panoramas
(property_id);
CREATE INDEX panoramas_hotel_id_idx ON content.panoramas (hotel_id);
CREATE INDEX panoramas_import_id_idx ON content.panoramas (import_id);
----------

Thanks in advance,
Ben

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2015-07-13 21:01:35 Re: Query planner not using indexes with JOIN query and OR clause
Previous Message James Cloos 2015-07-11 23:18:21 Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?