Large offset optimization and index only scan

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Large offset optimization and index only scan
Date: 2013-11-19 07:03:26
Message-ID: CAK-MWwQpZobHfuTtHj9+9G+5=ck+aX-ANWHtBK_0_D_qHYxWuw@mail.gmail.com
Lists: pgsql-hackers

Hi,

Recently I found a very interesting case where using an IOS could provide
serious benefits, but the database is not able to use it on its own.

I think my ideas could be beneficial in two ways:
1) as a workaround for the current limitations
2) as an idea to improve the planner/executor in the future

Case:
SELECT * FROM very_large_table ORDER BY [some indexed column(s)] OFFSET
[some large value] LIMIT N;

Idea:
Use an IOS (index-only scan) to skip the first OFFSET values, then switch to
a common index scan to fetch the LIMIT N values.
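The idea can be written as a generic query template (a sketch only: t, its
ordering key k, and the :offset and :n values are placeholders, and k is
assumed unique; with a non-unique key, rows tied with the boundary value
would be skipped):

-- The subquery is satisfied by an index-only scan, which skips the first
-- :offset rows cheaply without touching the heap; the outer query then
-- performs a plain index scan to fetch only the :n rows actually needed.
SELECT *
FROM t
WHERE k > (SELECT k
           FROM t
           ORDER BY k
           LIMIT 1 OFFSET :offset)
ORDER BY k
LIMIT :n;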

Usage sample (from real life), fully cached case:

Simple index-scan:
(postgres(at)[local]:5432) [hh_data]=# explain analyze select * from vacancy_invitation order by vacancy_invitation_id limit 10 offset 1000000;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=6997.98..6998.05 rows=10 width=953) (actual time=883.971..883.978 rows=10 loops=1)
   ->  Index Scan using vacancy_invitation_pkey_dblarge on vacancy_invitation  (cost=0.00..287399.74 rows=41068952 width=953) (actual time=0.011..845.592 rows=1000010 loops=1)
 Total runtime: 884.003 ms

Index only scan... 3x speedup:
(postgres(at)[local]:5432) [hh_data]=# explain analyze select vacancy_invitation_id from vacancy_invitation order by vacancy_invitation_id limit 10 offset 1000000;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=5546.48..5546.54 rows=10 width=4) (actual time=283.851..283.853 rows=10 loops=1)
   ->  Index Only Scan using vacancy_invitation_pkey_dblarge on vacancy_invitation  (cost=0.00..227788.13 rows=41068952 width=4) (actual time=0.013..246.477 rows=1000010 loops=1)
         Heap Fetches: 22
 Total runtime: 283.870 ms

Now let's combine the fast IOS (to skip the first 1M records) with a simple
index scan (to fetch the next 10 full rows)... still a 3x speedup over the
initial version:
(postgres(at)[local]:5432) [hh_data]=# explain analyze select * from vacancy_invitation where vacancy_invitation_id > (select vacancy_invitation_id from vacancy_invitation order by vacancy_invitation_id limit 1 offset 1000000) order by vacancy_invitation_id limit 10;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=5546.49..5546.58 rows=10 width=953) (actual time=289.674..289.683 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=5546.48..5546.49 rows=1 width=4) (actual time=289.654..289.654 rows=1 loops=1)
           ->  Index Only Scan using vacancy_invitation_pkey_dblarge on vacancy_invitation  (cost=0.00..227788.13 rows=41068952 width=4) (actual time=0.013..252.358 rows=1000001 loops=1)
                 Heap Fetches: 23
   ->  Index Scan using vacancy_invitation_pkey_dblarge on vacancy_invitation  (cost=0.00..123476.85 rows=13689651 width=953) (actual time=289.673..289.681 rows=10 loops=1)
         Index Cond: (vacancy_invitation_id > $0)
 Total runtime: 289.716 ms

As a result, it is possible to get a 3x speedup on a large-offset query if an
index matching the ORDER BY is present.
In the not-fully-cached case the difference could easily be 100x (if the
index stays in memory but the table data does not).

Maybe it could be justified to implement this approach (using an IOS to skip
the OFFSET records) in the planner/executor, instead of requiring manual
query rewriting.

I hope this post will be useful to someone.

PS: yes, I know that a large OFFSET is a big evil... but the ability to get a
serious speedup in some cases is still useful.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ <http://www.postgresql-consulting.com/>

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
