PoC: prefetching index leaf pages (for inserts)

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: PoC: prefetching index leaf pages (for inserts)
Date: 2023-10-22 14:46:51
Message-ID: 8081aa62-4032-927c-8411-548e1b87ed5b@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Some time ago I started a thread about prefetching heap pages during
index scans [1]. That however only helps when reading rows, not when
inserting them.

Imagine inserting a row into a table with many indexes - we insert the
row into the table, and then into the indexes one by one, synchronously.
We walk the index, determine the appropriate leaf page, read it from
disk into memory, insert the index tuple, and then do the same thing for
the next index.

If there are many indexes, and there's not much correlation with the
table, this may easily results in I/O happening synchronously with queue
depth 1. Hard to make it even slower ...

This can be a problem even with a modest number of indexes - imagine
bulk-loading data into a table using COPY (say, in 100-row batches).
Inserting the rows into heap happens in a bulk, but the indexes are
still modified in a loop, as if for single-row inserts. Not great.

The with multiple connections the concurrent I/O may be generated that
way, but for low-concurrency workloads (e.g. batch jobs) that may not
really work.

I had an idea what we might do about this - we can walk the index,
almost as if we're inserting the index tuple, but only the "inner"
non-leaf pages. And instead of descending to the leaf page, we just
prefetch it. The non-leaf pages are typically <1% of the index, and hot,
so likely already cached, so not worth prefetching those.

The attached patch does a PoC of this. It adds a new AM function
"amprefetch", with an implementation for btree indexes, mimicking the
index lookup, except that it only prefetches the leaf page as explained
a bit earlier.

In the executor, this is wrapped in ExecInsertPrefetchIndexes() which
gets called in various places right before ExecInsertPrefetchIndexes().
I thought about doing that in ExecInsertPrefetchIndexes() directly, but
that would not work for COPY, where we want to issue the prefetches for
the whole batch, not for individual tuples.

This may need various improvements - the prefetch duplicates a couple
steps that could be expensive (e.g. evaluation of index predicates,
forming index tuples, and so on). Would be nice to improve this, but
good enough for PoC I think.

Another gap is lack of incremental prefetch (ramp-up). We just prefetch
all the indexes, for all tuples. But I think that's OK. We know we'll
need those pages, and the number is fairly limited.

There's a GUC enable_insert_prefetch, that can be used to enable this
insert prefetching.

I did a simple test on two machines - one with SATA SSD RAID, one with
NVMe SSD. In both cases the data (table+indexes) are an order of
magnitude larger than RAM. The indexes are on UUID, so pretty random and
there's no correlation. Then batches of 100, 1000 and 10000 rows are
inserted, with/without the prefetching.

With 5 indexes, the results look like this:

SATA SSD RAID
-------------

rows no prefetch prefetch
100 176.872 ms 70.910 ms
1000 1035.056 ms 590.495 ms
10000 8494.836 ms 3216.206 ms

NVMe
----

rows no prefetch prefetch
100 133.365 ms 72.899 ms
1000 1572.379 ms 829.298 ms
10000 11889.143 ms 3621.981 ms

Not bad, I guess. Cutting the time to ~30% is nice.

The fewer the indexes, the smaller the difference (with 1 index there is
almost no difference), of course.

regards

[1] https://commitfest.postgresql.org/45/4351/

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
index-insert-prefetch-20231022.patch text/x-patch 22.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-10-22 16:04:01 Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
Previous Message Vik Fearing 2023-10-22 11:56:15 Re: Why is hot_standby_feedback off by default?