Why is this query touching 4gb of buffers?

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Why is this query touching 4gb of buffers?
Date: 2025-10-24 11:01:48
Message-ID: aPtcnI2BZkeHjRvF@depesz.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,
I have weird-ish case, that I can't grok, or at least explain in
hand-wavy way.

Very simple query:

SELECT
some_table.communication_channel_id,
some_table.root_account_id
FROM
some_schema.some_table
WHERE
workflow_state = 'pending' AND
send_at <= '2025-10-23 12:01:13';

On complex-ish table:

Table "some_schema.some_table"
Column │ Type │ Collation │ Nullable │ Default
═════════════════════════════════╪═════════════════════════════╪═══════════╪══════════╪═══════════════════════════════════════════════════════════════════
id │ bigint │ │ not null │ nextval('some_schema.some_table_id_seq'::regclass)
notification_id │ bigint │ │ │
notification_policy_id │ bigint │ │ │
context_id │ bigint │ │ │
context_type │ character varying(255) │ │ │
communication_channel_id │ bigint │ │ │
frequency │ character varying(255) │ │ │
workflow_state │ character varying(255) │ │ │
batched_at │ timestamp without time zone │ │ │
created_at │ timestamp without time zone │ │ │
updated_at │ timestamp without time zone │ │ │
send_at │ timestamp without time zone │ │ │
link │ text │ │ │
name_of_topic │ text │ │ │
summary │ text │ │ │
root_account_id │ bigint │ │ │
notification_policy_override_id │ bigint │ │ │
Indexes:
"some_table_pkey" PRIMARY KEY, btree (id), tablespace "data1"

"index_some_table_pending" btree (send_at) WHERE workflow_state::text = 'pending'::text, tablespace "data1"

Running this via explain shows:

Index Scan using index_some_table_pending on some_table (cost=0.43..399992.44 rows=2215063 width=16) (actual time=382.466..382.466 rows=0 loops=1)
Index Cond: (send_at <= '2025-10-23 12:01:13'::timestamp without time zone)
Buffers: shared hit=554347
Planning Time: 0.082 ms
Execution Time: 382.482 ms
(5 rows)

Specifically, I see that it returned 0 rows, and to do it, it had to "touch" 554k buffers - ~ 4GB of data.

Sizes of the relavant things:
oid │ relkind │ pg_relation_size │ pg_table_size │ pg_total_relation_size
══════════════════════════════════════╪═════════╪══════════════════╪═══════════════╪════════════════════════
some_schema.some_table │ r │ 15339020288 │ 15376539648 │ 22347988992
some_schema.index_some_table_pending │ i │ 77127680 │ 77168640 │ 77168640
(2 rows)

I didn't check analyze time, so can't comment on this, but I did reindex of
this index, which shrunk it to 16973824 bytes, and now the query runs, as expected, in < 1ms:

Index Scan using index_some_table_pending on some_table (cost=0.43..392423.37 rows=2215272 width=16) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: (send_at <= '2025-10-23 12:01:13'::timestamp without time zone)
Buffers: shared hit=3
Planning:
Buffers: shared hit=3
Planning Time: 0.787 ms
Execution Time: 0.016 ms
(7 rows)

While talking with others, there have been used certain terms, like
"unbounded range", which I understand, but I still don't see why, with
the same stats, and NO rows returned by index - before index, pg has to
do stuff to 4GB of data?! Anyone could try to explain?

If it matters, this is rather old Pg: PostgreSQL 14.7, and the query was
running on streaming replica, but results on primary were basically the
same.

Best regards,

depesz

Responses

Browse pgsql-general by date

  From Date Subject
Next Message hubert depesz lubaczewski 2025-10-24 11:19:03 Re: Why is this query touching 4gb of buffers?
Previous Message Gregory Smith 2025-10-24 05:34:43 Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication)