BUG #15408: Postgresql bad planning with multicolumn gist and search with empty results

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: acamargo(at)gmail(dot)com
Subject: BUG #15408: Postgresql bad planning with multicolumn gist and search with empty results
Date: 2018-09-27 11:59:23
Message-ID: 15408-18cb801b3a5b21a5@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15408
Logged by: Alex Pires de Camargo
Email address: acamargo(at)gmail(dot)com
PostgreSQL version: 10.5
Operating system: Ubuntu 5.4.0-6ubuntu1~16.04.10
Description:

I'm having an issue with a multicolumn index (integer, geom), where geom are
PostGIS points, and the integer is a point aggregator with business meaning.
I do searches by boxes in the geom column, joining the integer with other
tables. When the box covers an empty region, the planner does an index scan
looking only the secondary column, very inefficiently. I've spent some time
trying to create a scenario with a reproduction of the problem (with less
data), and I was able to do something that sounds the same issue, only with
integers. This setup will need about 2GB of disk.

/* 0<=n1<=10000 */
/* 0<=n2<=1000 || 3000<=n2<=4000 */
/* s1, r1, r2 to increase I/O needs and avoid index only scans */
create table test as select n as n1, (1000*random())::int as n2,
generate_series(1,1000) s1 , random() as r1, random() as r2 from
generate_series(1,10000) as n;
insert into test select n as n1, ((1000*random())::int) + 3000 as n2,
generate_series(1,1000) s1 , random() as r1, random() as r2 from
generate_series(1,10000) as n;
create index ind_test on test using gist (n1,n2);
create table ids as select generate_series(1,5) as n1; -- same problem with
just one row on this table
analyze ids;
alter table test alter column n1 set statistics 10000; --excluding poor
stats
alter table test alter column n2 set statistics 10000; --excluding poor
stats
analyze test;
explain analyze select * from test join ids using (n1) where n2 = 1001 and
n1 = 1; --Q1 Outside n2 range, Index cond n1 AND n2 < 1ms
explain analyze select * from test join ids using (n1) where n2 = 999; --Q2
Inside n2 range, Index cond n1 AND n2 < 1ms
explain analyze select * from test join ids using (n1) where n2 = 1000;
--Q3 Inside n2 range, Index cond n1 AND n2 < 1 ms
explain analyze select * from test join ids using (n1) where n2 = 1001;
--Q4 Outside n2 range, Index cond n2 > 100 ms
explain analyze select * from test join ids using (n1) where n2 = 1002;
--Q5 Outside n2 range, Index cond n2 > 100 ms

Below the result of the explains above:

➤ psql://postgres(at)[local]:5432/postgres

# explain analyze select * from test join ids using (n1) where n2 = 1001
and n1 = 1; --Q1 Outside n2 range, Index cond n1 AND n2 < 1ms
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..9.51 rows=1 width=28) (actual time=0.067..0.067
rows=0 loops=1)
-> Index Scan using ind_test on test (cost=0.42..8.44 rows=1 width=28)
(actual time=0.067..0.067 rows=0 loops=1)
Index Cond: ((n1 = 1) AND (n2 = 1001))
-> Seq Scan on ids (cost=0.00..1.06 rows=1 width=4) (never executed)
Filter: (n1 = 1)
Planning time: 0.404 ms
Execution time: 0.096 ms
(7 rows)

Time: 0.826 ms

➤ psql://postgres(at)[local]:5432/postgres

# explain analyze select * from test join ids using (n1) where n2 = 999;
--Q2 Inside n2 range, Index cond n1 AND n2 < 1ms
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..43.29 rows=5 width=28) (actual time=0.098..0.367
rows=7 loops=1)
-> Seq Scan on ids (cost=0.00..1.05 rows=5 width=4) (actual
time=0.012..0.014 rows=5 loops=1)
-> Index Scan using ind_test on test (cost=0.42..8.44 rows=1 width=28)
(actual time=0.064..0.068 rows=1 loops=5)
Index Cond: ((n1 = ids.n1) AND (n2 = 999))
Planning time: 0.994 ms
Execution time: 0.407 ms
(6 rows)

Time: 1.713 ms

➤ psql://postgres(at)[local]:5432/postgres

# explain analyze select * from test join ids using (n1) where n2 =
1000; --Q3 Inside n2 range, Index cond n1 AND n2 < 1 ms
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..43.29 rows=3 width=28) (actual time=0.157..0.248
rows=3 loops=1)
-> Seq Scan on ids (cost=0.00..1.05 rows=5 width=4) (actual
time=0.010..0.011 rows=5 loops=1)
-> Index Scan using ind_test on test (cost=0.42..8.44 rows=1 width=28)
(actual time=0.044..0.046 rows=1 loops=5)
Index Cond: ((n1 = ids.n1) AND (n2 = 1000))
Planning time: 0.877 ms
Execution time: 0.277 ms
(6 rows)

Time: 1.440 ms

➤ psql://postgres(at)[local]:5432/postgres

# explain analyze select * from test join ids using (n1) where n2 =
1001; --Q4 Outside n2 range, Index cond n2 > 100 ms
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..9.55 rows=1 width=28) (actual time=93.247..93.247
rows=0 loops=1)
Join Filter: (test.n1 = ids.n1)
-> Index Scan using ind_test on test (cost=0.42..8.44 rows=1 width=28)
(actual time=93.246..93.246 rows=0 loops=1)
Index Cond: (n2 = 1001)
-> Seq Scan on ids (cost=0.00..1.05 rows=5 width=4) (never executed)
Planning time: 0.716 ms
Execution time: 93.280 ms
(7 rows)

Time: 94.242 ms

➤ psql://postgres(at)[local]:5432/postgres

# explain analyze select * from test join ids using (n1) where n2 =
1002; --Q5 Outside n2 range, Index cond n2 > 100 ms
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..9.55 rows=1 width=28) (actual time=86.857..86.857
rows=0 loops=1)
Join Filter: (test.n1 = ids.n1)
-> Index Scan using ind_test on test (cost=0.42..8.44 rows=1 width=28)
(actual time=86.856..86.856 rows=0 loops=1)
Index Cond: (n2 = 1002)
-> Seq Scan on ids (cost=0.00..1.05 rows=5 width=4) (never executed)
Planning time: 0.750 ms
Execution time: 86.885 ms
(7 rows)

Time: 87.955 ms

# select version();
version

---------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
(1 row)

Time: 2.372 ms
As seen in the explain results, when I search for non-existent values of n2,
the plan changes on Index condition used, giving poor plans. None of this
plan differences appear if the index is btree. It seems something gist
related, and I need to use gist due to PostGIS functions.

A workaround I'm using is creating an additional index for the problematic
queries, but they are causing I/O and storage overhead, due to the huge
amount of data. I've tried two single column indexes, but the performance
penalty was too high compared to the multi-column.

Before opening this bug I've posted this on dba.stackexchange to get some
thoughts, no comments until now:
https://dba.stackexchange.com/questions/218519/postgresql-bad-planning-with-multicolumn-gist-and-search-with-empty-results

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Dean Rasheed 2018-09-27 13:22:39 Re: BUG #15307: Low numerical precision of (Co-) Variance
Previous Message Michael Paquier 2018-09-27 09:09:38 Re: BUG #15407: [minor] build depends on $MAKELEVEL being 0 at top Makefile