Query performance

From: "Dirschel, Steve-CW" <Steve(dot)Dirschel(at)bestbuy(dot)com>
To: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Query performance
Date: 2026-04-27 15:52:06
Message-ID: BYAPR06MB447199DE1FCF363EF389EF1A8A362@BYAPR06MB4471.namprd06.prod.outlook.com
Lists: pgsql-performance

Aurora Postgres version 17.4.

Table in question:

\d poslog_publisher_rms_stage
Table "public.poslog_publisher_rms_stage"
Column | Type | Collation | Nullable | Default
-------------------+--------------------------+-----------+----------+---------
stage_id | uuid | | not null |
status | character varying(100) | | |
message_body | text | | not null |
error_code | character varying(100) | | |
error_category | character varying(100) | | |
error_message | text | | |
error_retry_count | integer | | | 0
create_date | timestamp with time zone | | not null | now()
modified_date | timestamp with time zone | | not null | now()
Indexes:
"poslog_publisher_rms_stage_pkey" PRIMARY KEY, btree (stage_id)
"idx_poslog_publisher_stage_create_date_col" btree (create_date)
"idx_poslog_publisher_stage_status_error_retry_count_modi_date_c" btree (status, error_retry_count, modified_date)
Referenced by:
TABLE "poslog_publisher_rms_detail" CONSTRAINT "fk_poslog_publisher_detail_stage_id" FOREIGN KEY (stage_id) REFERENCES poslog_publisher_rms_stage(stage_id)
Publications:
"sashpsrms_publication"

The table is constantly being loaded into. Rows are inserted with a "ready" status, and a different process looks for that status and updates it to "processed" after handling the row. We have multiple tables like this.
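For illustration, the lifecycle looks roughly like this (a sketch only; the 'READY'/'PROCESSED' status strings and the columns supplied by the real loader are placeholders, not the exact values our application uses):

```sql
-- Loader inserts new work (real code supplies the actual message body)
INSERT INTO poslog_publisher_rms_stage (stage_id, status, message_body)
VALUES (gen_random_uuid(), 'READY', '...');

-- Consumer marks a row done after processing it
UPDATE poslog_publisher_rms_stage
SET status = 'PROCESSED', modified_date = now()
WHERE stage_id = $1;
```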

Then every 2 hours a different process runs this query looking for failed or unprocessed rows:

select
ppse.stage_id as stageId,
ppse.status as status,
ppse.message_body as messageBody
from
poslog_publisher_rms_stage ppse
where
ppse.status in ('UNPROCESSED','FAILED')
and ppse.error_retry_count < 3
order by
ppse.create_date
limit 100;

If I run that query with EXPLAIN (ANALYZE, BUFFERS), it does a huge amount of work to find 0 rows. The index it uses seems appropriate for the query.

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10.92..10.93 rows=1 width=1019) (actual time=66566.823..66566.824 rows=0 loops=1)
Buffers: shared hit=1509768 read=2011479
I/O Timings: shared read=79792.017
-> Sort (cost=10.92..10.93 rows=1 width=1019) (actual time=66566.821..66566.821 rows=0 loops=1)
Sort Key: create_date
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1509768 read=2011479
I/O Timings: shared read=79792.017
-> Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse (cost=0.57..10.91 rows=1 width=1019) (actual time=66566.761..66566.761 rows=0 loops=1)
Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
Buffers: shared hit=1509765 read=2011479
I/O Timings: shared read=79792.017
Planning:
Buffers: shared hit=195 read=1
I/O Timings: shared read=1.038
Planning Time: 2.909 ms
Execution Time: 66581.498 ms

The query did 3.5 million block reads while scanning the index, of which 1.5 million were found in memory and 2 million were read from disk. Five seconds later I ran the exact same query again:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10.92..10.93 rows=1 width=1019) (actual time=23.589..23.591 rows=0 loops=1)
Buffers: shared hit=18736
-> Sort (cost=10.92..10.93 rows=1 width=1019) (actual time=23.588..23.589 rows=0 loops=1)
Sort Key: create_date
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=18736
-> Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse (cost=0.57..10.91 rows=1 width=1019) (actual time=23.583..23.584 rows=0 loops=1)
Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
Buffers: shared hit=18736
Planning Time: 0.118 ms
Execution Time: 23.628 ms

Now it only did 18k block reads, all in memory. It used the same index and again returned 0 rows. Between those two runs I looked at pg_stat_user_tables and could see that n_tup_ins, n_tup_del, n_live_tup, and n_dead_tup each increased by 13. n_live_tup was 96 million and n_dead_tup was 8.4 million. 18k logical reads to find 0 rows is still high, but I believe that is most likely caused by the 8.4 million dead tuples.
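For reference, this is roughly the query I used to watch those counters between runs (all columns are standard pg_stat_user_tables columns):

```sql
SELECT n_tup_ins, n_tup_upd, n_tup_del,
       n_live_tup, n_dead_tup,
       last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'poslog_publisher_rms_stage';
```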

15 minutes later I ran the query again:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10.92..10.93 rows=1 width=1019) (actual time=59431.795..59431.796 rows=0 loops=1)
Buffers: shared hit=1286676 read=1734000
I/O Timings: shared read=70153.578
-> Sort (cost=10.92..10.93 rows=1 width=1019) (actual time=59431.794..59431.794 rows=0 loops=1)
Sort Key: create_date
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1286676 read=1734000
I/O Timings: shared read=70153.578
-> Index Scan using idx_poslog_publisher_stage_status_error_retry_count_modi_date_c on poslog_publisher_rms_stage ppse (cost=0.57..10.91 rows=1 width=1019) (actual time=59431.789..59431.790 rows=0 loops=1)
Index Cond: (((status)::text = ANY ('{UNPROCESSED,FAILED}'::text[])) AND (error_retry_count < 3))
Buffers: shared hit=1286676 read=1734000
I/O Timings: shared read=70153.578
Planning Time: 0.114 ms
Execution Time: 59431.839 ms

Total block reads increased from 18k back to 3 million. n_tup_ins, n_tup_del, n_live_tup, and n_dead_tup each increased by 10,320 over that 15-minute period.

What is going on here that this query has to do 3+ million block reads to find 0 rows? And how is it possible that when I run the query twice in a row, the logical reads on the second run come down so dramatically? Is this somehow related to determining whether rows are visible? When I waited 15 minutes between runs, the inserted/updated rows only increased by 10.3k, yet total block reads increased by 3 million.
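One thing I am considering, in case the answer is that the index is full of entries for already-processed or dead rows, is a partial index that only covers the rows this poller cares about, matching the query's predicate and ORDER BY. This is a sketch I have not tested yet (the index name is made up):

```sql
CREATE INDEX CONCURRENTLY idx_poslog_stage_unprocessed_failed
ON poslog_publisher_rms_stage (create_date)
WHERE status IN ('UNPROCESSED', 'FAILED')
  AND error_retry_count < 3;
```

The idea is that such an index should stay tiny if almost all rows end up 'PROCESSED', so the 2-hourly sweep would only touch index pages for the handful of matching rows.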

Thanks
