partitioning performance tests after recent patches

From: Floris Van Nee <florisvannee(at)Optiver(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: partitioning performance tests after recent patches
Date: 2019-04-14 19:19:41
Message-ID: 1555269581680.1050@Optiver.com
Lists: pgsql-hackers

Hi all,

After all the great work that was done on partitioning over the past months, I wanted to take a closer look at its performance. I prepared some performance tests for use cases that I often encounter on partitioned tables. My goal was to test the performance and stability of the recent query-planning and run-time pruning changes, comparing native partitioning on HEAD and on PG11 with TimescaleDb partitioning on PG11. I'm posting the results here now. I'm sure many of these results are not new to most people here, but maybe there's some useful information in them. :-) Furthermore, I have a few questions about the results that I obtained.

Several test cases were prepared and run against tables with 16, 64, 256, 1024 and 4096 partitions, using pgbench for 60 seconds each. The exact commands that were run can be found in the attached output file. Note that most of the SELECT queries that are benchmarked are essentially testing planning time, as this is generally the part that becomes much slower when more partitions are added. The queries are written such that only one or two partitions are relevant for a particular query and the rest should be discarded as early as possible. Execution time is very small for these queries, as they just do a simple index scan on the remaining partition.
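
To give an idea of the setup, here is a simplified sketch (the real DDL is in the attached benchmark.sql; the table and column names below are only illustrative):

-- Illustrative setup only; see benchmark.sql for the real definitions.
CREATE TABLE parttbl (
    a          int         NOT NULL,
    updated_at timestamptz NOT NULL
) PARTITION BY RANGE (updated_at);

-- One partition per time slice; repeated until the table has 16 .. 4096 partitions.
CREATE TABLE parttbl_0 PARTITION OF parttbl
    FOR VALUES FROM ('2019-01-01') TO ('2019-01-02');

-- Matches the ORDER BY used in some of the test cases below.
CREATE INDEX ON parttbl (a, updated_at DESC);

Each test case was then driven with something like 'pgbench -n -T 60 -f <case>.sql'.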

The test cases were (see benchmark.sql for the SQL commands for the setup and the test cases; a simplified sketch of a few of them follows the list):
1. Insert batches of 1000 rows per transaction
2. Simple SELECT query pruning on a static timestamp
3. The same SELECT query with static timestamp but with an added 'ORDER BY a, updated_at DESC LIMIT 1', which matches the index defined on the table
4. The same simple SELECT query, but now it's wrapped inside an inlineable SQL function, called with a static timestamp
5. The same simple SELECT query, but now it's wrapped inside a non-inlineable SQL function, called with a static timestamp
6. The same simple SELECT query, but now it's wrapped inside a plpgsql function, called with a static timestamp
7. Simple SELECT query pruning on a timestamp now()
8. The same SELECT query with dynamic timestamp but with an added 'ORDER BY a, updated_at DESC LIMIT 1', which matches the index defined on the table
9. The same simple SELECT query, but now it's wrapped inside an inlineable SQL function, called with a dynamic timestamp
10. The same simple SELECT query, but now it's wrapped inside a non-inlineable SQL function, called with a dynamic timestamp
11. The same simple SELECT query, but now it's wrapped inside a plpgsql function, called with a dynamic timestamp
12. The same query as 2), but wrapped in an inlineable function
13. The same query as 3), but wrapped in an inlineable function
14. A SELECT with a nested loop (10 iterations) with opportunities for run-time pruning: some rows are selected from one table, and the timestamps of those rows are used to join against another, partitioned table
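
To make a few of these more concrete, here is roughly what cases 2, 3, 7, 6/11 and 14 look like, reusing the illustrative parttbl table from the sketch above (the exact statements are in benchmark.sql; 'driver' is just an assumed name for the outer table in case 14):

-- Case 2: static timestamps, so partitions can be pruned at plan time.
SELECT * FROM parttbl
WHERE updated_at >= '2019-04-14 12:00' AND updated_at < '2019-04-14 13:00';

-- Case 3: the same, but ordered to match the (a, updated_at DESC) index.
SELECT * FROM parttbl
WHERE updated_at >= '2019-04-14 12:00' AND updated_at < '2019-04-14 13:00'
ORDER BY a, updated_at DESC LIMIT 1;

-- Case 7: now() is only known at execution time, so pruning happens at run time.
SELECT * FROM parttbl
WHERE updated_at >= now() - interval '1 hour' AND updated_at < now();

-- Cases 6/11: the query wrapped in a plpgsql function, which is never inlined
-- and therefore goes through the plan cache.
CREATE FUNCTION lookup_plpgsql(ts timestamptz) RETURNS bigint AS $$
BEGIN
    RETURN (SELECT count(*) FROM parttbl
            WHERE updated_at >= ts AND updated_at < ts + interval '1 hour');
END;
$$ LANGUAGE plpgsql;

-- Case 14: values from the outer table drive run-time pruning of parttbl on
-- the inner side of a nested loop.
SELECT count(*)
FROM driver d
JOIN parttbl p ON p.updated_at >= d.start_ts
              AND p.updated_at <  d.end_ts
WHERE d.id <= 10;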

The full results can be found in the attached file (results.txt). I also produced graphs of the results, which can be found on TimescaleDb's Github page [1]. Please take a look at these figures for an easy overview of the results. In general, the performance of HEAD looks really good.

While looking at these results, I came across a few questions that I couldn't answer.
1) It seems like the queries inside plpgsql functions (cases 6 and 11) perform relatively well in PG11 compared to the non-inlineable SQL functions (cases 5 and 10) when a table consists of many partitions. As far as I know, both plpgsql and non-inlineable SQL functions are executed with generic plans. What explains this difference? Are non-inlineable SQL function plans not reused between transactions, while plpgsql plans are?
2) Is always running non-inlined SQL functions with a generic plan even the best option? Wouldn't it be better to adopt an approach similar to plpgsql's, which tests whether a generic plan is actually beneficial? The non-inlineable SQL functions suffer a big performance hit with a large number of partitions, because they cannot rely on static planning-time pruning. (A small sketch for comparing the two plan types follows this list.)
3) What could be causing the big performance difference between case 7 (simple SELECT) and case 8 (simple SELECT with ORDER BY <index> LIMIT 1)? For 4096 partitions, the TPS of case 7 is around 5, while adding the ORDER BY <index> LIMIT 1 makes TPS drop well below 1. In theory, run-time pruning of the right chunk should take exactly the same amount of time in both cases, because both prune on timestamp now() over the same number of partitions. The resulting plans are also identical except for the top LIMIT node (in PG11 they differ slightly, as a MergeAppend is chosen for the ORDER BY instead of an Append; in HEAD, with ordered append, this is no longer necessary). Am I missing something here?
4) A more general question about run-time pruning in nested loops, like the one in case 14. I believe I read in one of the previous threads that run-time pruning is only repeated if the value that determines which partitions can be excluded has changed between iterations. How is that determined, exactly? E.g. suppose partitions are 1 day wide and the first iteration of the loop filters the partitioned table on timestamps between 14-04-2019 12:00 and 14-04-2019 20:00 (dynamically determined). Then the second iteration comes along and filters on values between 14-04-2019 12:00 and 14-04-2019 19:00. The partition that should be scanned hasn't changed, because both timestamps fall into the same partition. Is the full run-time pruning process applied again, or is there some kind of shortcut that first checks whether the previous pruning result is still valid even though the value has changed slightly? If not, would this be a possible optimization? I think it's a case that occurs very often. I don't know the run-time pruning code very well though, so it may just be a crazy idea that can't be practically achieved.
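
Regarding questions 1) and 2): the difference between the two plan types can be observed outside of functions with a prepared statement along these lines (just a sketch against the illustrative parttbl table; plan_cache_mode only exists on HEAD):

-- The plan cache tries custom plans for the first five executions and may
-- switch to a generic plan afterwards; EXPLAIN shows which one was used.
PREPARE q(timestamptz) AS
    SELECT * FROM parttbl
    WHERE updated_at >= $1 AND updated_at < $1 + interval '1 hour';

EXPLAIN (COSTS OFF) EXECUTE q(now());

-- On HEAD the choice can be forced in either direction for comparison.
SET plan_cache_mode = force_generic_plan;   -- or force_custom_plan
EXPLAIN (COSTS OFF) EXECUTE q(now());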

There was one other thing I noticed, and I believe it was raised by Tom in a separate thread as well: the setup code itself is really slow. Creating the partitions takes a long time (several minutes for the 4096-partition table).
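
To give an idea of the kind of DDL involved, the partitions can be created with a loop along these lines (a simplified sketch with illustrative names, not a copy of benchmark.sql); every iteration is a separate CREATE TABLE, so the time adds up quickly at 4096 partitions:

DO $$
BEGIN
    FOR i IN 0..4095 LOOP
        EXECUTE format(
            'CREATE TABLE parttbl_%s PARTITION OF parttbl
                 FOR VALUES FROM (%L) TO (%L)',
            i,
            '2019-01-01'::timestamptz + i * interval '1 day',
            '2019-01-01'::timestamptz + (i + 1) * interval '1 day');
    END LOOP;
END $$;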

Thanks again for the great work on partitioning! Almost every case that I tested is way better than the comparable case in PG11.

-Floris

[1] https://github.com/timescale/timescaledb/issues/1154#issuecomment-482347314

Attachment Content-Type Size
benchmark.sql.txt text/plain 5.6 KB
results.txt text/plain 89.4 KB
