Reducing planning time on tables with many indexes

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Reducing planning time on tables with many indexes
Date: 2022-07-27 12:37:57
Message-ID: CAPsAnr=zdqgueKx+eMAg1aG5YcST5T9xuPghZ84aVDSX0Nh5CQ@mail.gmail.com
Lists: pgsql-hackers

Hi hackers,

We came across a planning slowdown for queries on tables with many
indexes. In setups with wide tables it is not uncommon to easily have
10-100 indexes on a single table. The slowdown is already visible in serial
workloads with just a handful of indexes, but gets drastically amplified
when queries against tables with more indexes run in parallel at high
throughput.

We measured TPS and planning time when running parallel streams of simple
point look-up queries on a single empty table with 60 columns and 60
indexes. The query used is 'SELECT * FROM synth_table WHERE col5 = 42'. No
rows are returned because the table is empty. We used a machine with 64
physical CPU cores. The schema and sysbench script to reproduce these
numbers are attached. We used the TPS as reported by sysbench and obtained
planning time by running 'EXPLAIN ANALYZE' on the same query in a
separately opened connection. We averaged the planning time of 3 successive
'EXPLAIN ANALYZE' runs. sysbench ran on the same machine with varying
numbers of threads using the following command line:

sysbench repro.lua --db-driver=pgsql --pgsql-host=localhost
--pgsql-db=postgres --pgsql-port=? --pgsql-user=? --pgsql-password=?
--report-interval=1 --threads=64 run

The following table shows the results. It is clearly visible that TPS
already flattens out at 8 parallel streams, while planning time keeps
increasing drastically.

Parallel streams | TPS (before) | Planning time (before)
-----------------|--------------|-----------------------
1 | 5,486 | 0.13 ms
2 | 8,098 | 0.22 ms
4 | 15,013 | 0.19 ms
8 | 27,107 | 0.29 ms
16 | 30,938 | 0.43 ms
32 | 26,330 | 1.68 ms
64 | 24,314 | 2.48 ms

We tracked down the root cause of this slowdown to lock contention in
'get_relation_info()': the index lock of every single index of every single
table used in the query is acquired. We attempted a fix by pre-filtering
out all indexes that cannot be used by the query anyway, without taking
their index locks (credits to Luc Vlaming for idea and implementation). The
patch does so by caching the columns present in every index inside 'struct
Relation', similarly to 'rd_indexlist'. Then, before opening (= locking)
the indexes in 'get_relation_info()', we check whether the index can
actually contribute to the query and discard it right away if not. Caching
the index info saves considerable work for every subsequently run query,
because fewer indexes must be inspected and thereby locked. This way we
also save cycles in any code that later iterates over all of the relation's
indexes.
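
To illustrate the principle, here is a rough sketch (not the code of the
attached patch): 'index_columns' stands for the cached per-index column
sets and 'query_attnums' for the columns referenced by the query's quals;
both, as well as passing them in as parameters rather than reading them
from the relcache entry, are assumptions made for the sketch.

/*
 * Sketch only: filter indexes by their key columns before locking them.
 * "index_columns" is assumed to be a list of Bitmapsets parallel to
 * RelationGetIndexList() (the patch caches this next to rd_indexlist);
 * "query_attnums" is assumed to hold the attribute numbers the query
 * references.
 */
#include "postgres.h"

#include "access/genam.h"		/* index_open(), index_close() */
#include "nodes/bitmapset.h"
#include "nodes/pg_list.h"
#include "storage/lockdefs.h"
#include "utils/relcache.h"

static void
open_useful_indexes_only(Relation relation, List *index_columns,
						 Bitmapset *query_attnums, LOCKMODE lmode)
{
	ListCell   *lc1,
			   *lc2;

	forboth(lc1, RelationGetIndexList(relation), lc2, index_columns)
	{
		Oid			indexoid = lfirst_oid(lc1);
		Bitmapset  *indexattrs = (Bitmapset *) lfirst(lc2);
		Relation	indexRelation;

		/*
		 * If none of the index's key columns is referenced by the query,
		 * the index cannot contribute to the plan, so skip it without
		 * ever taking its lock.
		 */
		if (!bms_overlap(indexattrs, query_attnums))
			continue;

		/* Only now lock and inspect the index, as before. */
		indexRelation = index_open(indexoid, lmode);

		/* ... build the IndexOptInfo for this index as usual ... */

		index_close(indexRelation, NoLock);
	}
}

Because the column sets are cached in 'struct Relation', they only need to
be built once and are then reused by every subsequent query on the table.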

The work-in-progress version of the patch is attached. It is still fairly
rough (e.g. it uses a global variable, selects the best index for scans
without restrictions by column count instead of physical column size, and
is missing some renaming), but it shows the principle.

The following table shows the TPS, planning time, and speed-ups after
applying the patch and rerunning the benchmark described above. Now the
planning time remains roughly constant and TPS roughly doubles every time
the number of parallel streams is doubled. The higher the stream count, the
more severe the lock contention and the more pronounced the resulting
speed-up. Interestingly, even for a single query stream the speed-up in
planning time is already very significant. This also applies to lower index
counts; for example, with just 10 indexes the TPS for a single query stream
goes from 9,159 to 12,558. We can do more measurements if there is interest
in details for lower numbers of indexes.

Parallel streams | TPS (after) | Planning time (after) | Speed-up TPS | Speed-up planning
-----------------|-------------|-----------------------|--------------|------------------
1 | 10,344 | 0.046 ms | 1.9x | 2.8x
2 | 20,140 | 0.045 ms | 2.5x | 4.9x
4 | 40,349 | 0.047 ms | 2.7x | 4.0x
8 | 80,121 | 0.046 ms | 3.0x | 6.3x
16 | 152,632 | 0.051 ms | 4.9x | 8.4x
32 | 301,359 | 0.052 ms | 11.4x | 32.3x
64 | 525,115 | 0.062 ms | 21.6x | 40.0x

We are happy to receive your feedback and polish up the patch.

--
David Geier
(ServiceNow)

Attachment Content-Type Size
repro.lua text/x-lua 205 bytes
schema.sql application/sql 869 bytes
0001-Index-filtering.patch text/x-patch 21.7 KB
