Inefficient filter order in query plan

From: Tom Coogan <nocera(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Inefficient filter order in query plan
Date: 2014-02-27 16:57:42
Message-ID: CAN07wvLhJhWApx2uGiyeOeCMtXjLw8W84vNNxrhcNvLxKjhTdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I'd like to understand why PostgreSQL is choosing to filter on the most
inefficient predicate first in the query below.

Version: PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu, compiled by gcc
(SUSE Linux) 4.7.1 20120723 [gcc-4_7-branch revision 189773], 64-bit
Hardware: 24 2.93GHz Xeon cores and 32GB RAM

Table in question:
------------------------------------------------------------
CREATE TABLE audit_trail (
id int PRIMARY KEY NOT NULL,
model varchar(128) NOT NULL,
model_id int NOT NULL,
created bool,
updated bool,
deleted bool,
change_set text,
created_by varchar(64) NOT NULL,
created_date timestamp NOT NULL,
updated_by varchar(64) NOT NULL,
updated_date timestamp NOT NULL
);
CREATE UNIQUE INDEX audit_trail_pkey ON audit_trail(id);

Row count: 5,306,596
Min / Avg / Max character length of change_set column values: 11 / 165 /
12717859
Number of unique values in model column: 196

Query and plan:
------------------------------------------------------------
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM audit_trail WHERE model = 'User' AND model_id = 304 AND
change_set ILIKE '%test%';

Seq Scan on audit_trail (cost=0.00..243427.98 rows=1 width=189) (actual
time=15509.722..17943.896 rows=6 loops=1)
Filter: ((change_set ~~* '%test%'::text) AND ((model)::text =
'User'::text) AND (model_id = 304))
Rows Removed by Filter: 5306590
Buffers: shared hit=10410 read=164384
I/O Timings: read=310.189
Total runtime: 17943.930 ms

Observations:
------------------------------------------------------------
1) Without the change_set predicate, the query runs in 1 second and returns
461 rows.
2) Turning statistics off for the change_set column has no effect on the
plan or execution time:
ALTER TABLE audit_trail ALTER COLUMN change_set SET STATISTICS 0;
ANALYZE audit_trail(change_set);
3) Setting statistics to the max value has no effect on the plan or
execution time:
ALTER TABLE audit_trail ALTER COLUMN change_set SET STATISTICS 10000;
ANALYZE audit_trail(change_set);
4) Adding an index on (model, model_id) changes the plan so that it starts
with an index scan. Query time < 1s.
CREATE INDEX audit_trail_model_idx ON audit_trail (model, model_id);

Aggregate (cost=12.29..12.30 rows=1 width=4) (actual
time=1.455..1.456 rows=1 loops=1)
-> Index Scan using audit_trail_model_idx on audit_trail this_
(cost=0.56..12.29 rows=1 width=4) (actual time=1.446..1.446 rows=0 loops=1)
Index Cond: (((model)::text = 'User'::text) AND (model_id =
304))
Filter: (change_set ~~* '%test%'::text)
Rows Removed by Filter: 461

Although adding the index will fix the performance problem, I'd like to
understand why, in absence of the index, PostgreSQL would choose to filter
on the change_set value first. Since it is not specified first in the
predicate and its column ordinality is higher than model and model_id, the
plan generator must be choosing it first for some particular reason.

Any insight is appreciated.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2014-02-27 17:04:17 Re: Inefficient filter order in query plan
Previous Message Jean-Max Reymond 2014-02-27 14:46:29 Re: not using my GIN index in JOIN expression