Index Scan Backward Slow

From: David Osborne <david(at)qcode(dot)co(dot)uk>
To: pgsql-performance(at)postgresql(dot)org
Subject: Index Scan Backward Slow
Date: 2015-05-01 10:54:33
Message-ID: CAKmpXCf9eBoP2pW9MUvYgjj+m0wTLiAoGW2LHZkVJp5EOs-Lkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

We have a query which finds the latest row_id for a particular code.

We've found a backwards index scan is much slower than a forward one, to
the extent that disabling indexscan altogether actually improves the query
time.

Can anyone suggest why this might be, and what's best to do to improve the
query time?

dev=> \d table
Table "public.table"
Column | Type | Modifiers
--------------+--------------------------------+-----------
row_id | integer |
code | character(2) |
Indexes:
"table_code_idx" btree (code)
"table_row_idx" btree (row_id)

dev=> select count(*) from table;
count
---------
6090254
(1 row)

dev=> select count(distinct(row_id)) from table;
count
---------
5421022
(1 row)

dev=> select n_distinct from pg_stats where tablename='table' and
attname='row_id';
n_distinct
------------
-0.762951
(1 row)

dev=> show work_mem;
work_mem
-----------
1249105kB
(1 row)

dev=> select version();
version

--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.3
20120306 (Red Hat 4.6.3-2), 64-bit
(1 row)

The query in question:

dev=> explain (analyse,buffers) select row_id as last_row_id from table
where code='XX' order by row_id desc limit 1;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..1.67 rows=1 width=4) (actual time=835.281..835.282
rows=1 loops=1)
Buffers: shared hit=187961
-> Index Scan Backward using table_row_idx on table
(cost=0.43..343741.98 rows=278731 width=4) (actual time=835.278..835.278
rows=1 loops=1)
Filter: (code = 'XX'::bpchar)
Rows Removed by Filter: 4050971
Buffers: shared hit=187961
Total runtime: 835.315 ms
(7 rows)

http://explain.depesz.com/s/uGC

So we can see it's doing a backwards index scan. Out of curiosity I tried a
forward scan and it was MUCH quicker:

dev=> explain (analyse,buffers) select row_id as first_row_id from table
where code='XX' order by row_id asc limit 1;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..1.67 rows=1 width=4) (actual time=19.473..19.474 rows=1
loops=1)
Buffers: shared hit=26730
-> Index Scan using table_row_idx on table (cost=0.43..343741.98
rows=278731 width=4) (actual time=19.470..19.470 rows=1 loops=1)
Filter: (code = 'XX'::bpchar)
Rows Removed by Filter: 62786
Buffers: shared hit=26730
Total runtime: 19.509 ms
(7 rows)

http://explain.depesz.com/s/ASxD

I thought adding a index on row_id desc might be the answer but it has
little effect:

dev=> create index row_id_desc_idx on table(row_id desc);
CREATE INDEX
Time: 5293.812 ms

dev=> explain (analyse,buffers) select row_id as last_row_id from table
where code='XX' order by row_id desc limit 1;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..1.66 rows=1 width=4) (actual time=944.666..944.667
rows=1 loops=1)
Buffers: shared hit=176711 read=11071
-> Index Scan using row_id_desc_idx on table (cost=0.43..342101.98
rows=278731 width=4) (actual time=944.663..944.663 rows=1 loops=1)
Filter: (code = 'XX'::bpchar)
Rows Removed by Filter: 4050971
Buffers: shared hit=176711 read=11071
Total runtime: 944.699 ms
(7 rows)

http://explain.depesz.com/s/JStM

In fact, disabling the index scan completely improves matters considerably:

dev=> drop index row_id_desc_idx;
DROP INDEX
dev=> set enable_indexscan to off;
SET

dev=> explain (analyse,buffers) select row_id as last_row_id from table
where code='XX' order by row_id desc limit 1;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=74006.39..74006.39 rows=1 width=4) (actual
time=183.997..183.998 rows=1 loops=1)
Buffers: shared hit=14723
-> Sort (cost=74006.39..74703.22 rows=278731 width=4) (actual
time=183.995..183.995 rows=1 loops=1)
Sort Key: row_id
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=14723
-> Bitmap Heap Scan on table (cost=5276.60..72612.74 rows=278731
width=4) (actual time=25.533..119.320 rows=275909 loops=1)
Recheck Cond: (code = 'XX'::bpchar)
Buffers: shared hit=14723
-> Bitmap Index Scan on table_code_idx (cost=0.00..5206.91
rows=278731 width=0) (actual time=23.298..23.298 rows=275909 loops=1)
Index Cond: (code = 'XX'::bpchar)
Buffers: shared hit=765
Total runtime: 184.043 ms
(13 rows)

http://explain.depesz.com/s/E9VE

Thanks in advance for any help.

Regards,
--
David Osborne
Qcode Software Limited
http://www.qcode.co.uk

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Evgeniy Shishkin 2015-05-01 10:59:02 Re: Index Scan Backward Slow
Previous Message Tomas Vondra 2015-05-01 01:24:14 Re: PATCH: adaptive ndistinct estimator v4