Re: [PoC] Reducing planning time on tables with many indexes

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Reducing planning time on tables with many indexes
Date: 2022-07-27 12:46:24
Message-ID: CAPsAnr=RWdG6UPtJm99QHzSSpwG5T6fkaftVqjESy1_2q2+Omg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry, by accident I sent this one out twice.

--
David Geier
(ServiceNow)

On Wed, Jul 27, 2022 at 2:42 PM David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:

> Hi hackers,
>
> We came across a slowdown in planning, where queries use tables with many
> indexes. In setups with wide tables it is not uncommon to have easily
> 10-100 indexes on a single table. The slowdown is already visible in serial
> workloads with just a handful of indexes, but gets drastically amplified
> when running queries with more indexes in parallel at high throughput.
>
> We measured the TPS and planning time of running parallel streams of
> simple point look-up queries on a single empty table with 60 columns and 60
> indexes. The query used is 'SELECT * FROM synth_table WHERE col5 = 42'. No
> rows are returned because the table is empty. We used a machine with 64
> physical CPU cores. The schema and sysbench script to reproduce these
> numbers are attached. We used the TPS as reported by sysbench and obtained
> planning time by running 'EXPLAIN ANALYZE' on the same query in a
> separately opened connection. We averaged the planning time of 3 successive
> 'EXPLAIN ANALYZE' runs. sysbench ran on the same machine with varying
> numbers of threads using the following command line:
>
> sysbench repro.lua --db-driver=pgsql --pgsql-host=localhost
> --pgsql-db=postgres --pgsql-port=? --pgsql-user=? --pgsql-password=?
> --report-interval=1 --threads=64 run
>
> The following table shows the results. It is clearly visible that the TPS
> flatten out already at 8 parallel streams, while the planning time is
> increasing drastically.
>
> Parallel streams | TPS (before) | Planning time (before)
> -----------------|--------------|-----------------------
> 1 | 5,486 | 0.13 ms
> 2 | 8,098 | 0.22 ms
> 4 | 15,013 | 0.19 ms
> 8 | 27,107 | 0.29 ms
> 16 | 30,938 | 0.43 ms
> 32 | 26,330 | 1.68 ms
> 64 | 24,314 | 2.48 ms
>
> We tracked down the root cause of this slowdown to lock contention in
> 'get_relation_info()'. The index lock of every single index of every single
> table used in that query is acquired. We attempted a fix by pre-filtering
> out all indexes that anyways cannot be used with a certain query, without
> taking the index locks (credits to Luc Vlaming for idea and
> implementation). The patch does so by caching the columns present in every
> index, inside 'struct Relation', similarly to 'rd_indexlist'. Then, before
> opening (= locking) the indexes in 'get_relation_info()', we check if the
> index can actually contribute to the query and if not it is discarded right
> away. Caching the index info saves considerable work for every query run
> subsequently, because less indexes must be inspected and thereby locked.
> This way we also save cycles in any code that later on goes over all
> relation indexes.
>
> The work-in-progress version of the patch is attached. It is still fairly
> rough (e.g. uses a global variable, selects the best index in scans without
> restrictions by column count instead of physical column size, is missing
> some renaming, etc.), but shows the principle.
>
> The following table shows the TPS, planning time and speed-ups after
> applying the patch and rerunning above described benchmark. Now, the
> planning time remains roughly constant and TPS roughly doubles each time
> the number of parallel streams is doubled. The higher the stream count the
> more severe the lock contention is and the more pronounced the gained
> speed-up gets. Interestingly, even for a single query stream the speed-up
> in planning time is already very significant. This applies also for lower
> index counts. For example just with 10 indexes the TPS for a single query
> stream goes from 9,159 to 12,558. We can do more measurements if there is
> interest in details for a lower number of indexes.
>
> Parallel streams | TPS (after) | Planning time (after) | Speed-up TPS |
> Speed-up planning
>
> -----------------|-------------|-----------------------|--------------|------------------
> 1 | 10,344 | 0.046 | 1.9x |
> 2.8x
> 2 | 20,140 | 0.045 ms | 2.5x |
> 4.9x
> 4 | 40,349 | 0.047 ms | 2.7x |
> 4.0x
> 8 | 80,121 | 0.046 ms | 3.0x |
> 6.3x
> 16 | 152,632 | 0.051 ms | 4.9x |
> 8.4x
> 32 | 301,359 | 0.052 ms | 11.4x |
> 32.3x
> 64 | 525,115 | 0.062 ms | 21.6x |
> 40.0x
>
> We are happy to receive your feedback and polish up the patch.
>
> --
> David Geier
> (ServiceNow)
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2022-07-27 12:50:23 Re: Skip partition tuple routing with constant partition key
Previous Message David Geier 2022-07-27 12:42:37 [PoC] Reducing planning time on tables with many indexes