Re: Making "COPY partitioned_table FROM" faster

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, khuddleston(at)pivotal(dot)io
Subject: Re: Making "COPY partitioned_table FROM" faster
Date: 2018-07-26 02:30:47
Message-ID: CAKJS1f_qk1AAzLiKtHatzkbD-nE0PYn4y-Nt98h2VjTC9K=oDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 25 July 2018 at 04:37, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I don't see any need here for another GUC, nor even a command option.
> The user has already indicated their use case to us:

I agree.

> We know that the common case for RANGE partitioned tables is to load
> into the one current partition. We also know that the partition might
> change as we load data, when the data loaded crosses the partition
> boundary, so we need this optimization to be adaptive. Not all data
> loads follow that rule, so we also need the adpative algorithm to shut
> off for those cases.
>
> We also know that the common case for HASH partitions is to load into
> all partitions at once, since hash is specifically designed to spread
> data out across partitions. It is almost never true that we would want
> to load one partition at a time, so it seems easy to turn the
> optimization off if we use this type of partitioning. Or better, we
> need work done to improve that case also, but that is outside the
> current scope of this patch.
>
> If we have multi-level partitions, if any of the levels includes a
> Hash, then turn it off.
>
> LIST partitions are less likely to have a clear pattern, so I would
> treat them like HASH and assume the data is not sorted by partition.
>
> So for this patch, just add an "if (RANGE)" test.

I agree RANGE partition is probably the most likely case to benefit
from this optimisation, but I just don't think that HASH could never
benefit and LIST probably sits somewhere in the middle.

HASH partitioning might be useful in cases like partitioning by
"sensor_id". It does not seem that unreasonable that someone might
want to load all the data for an entire sensor at once.

The v3 version of the patch also fixes the very small performance
regression for the (probably quite likely) worst-case situation. New
performance is about 3.5% faster instead of 0.5-1% slower. So likely
there's no longer any need to consider this.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-07-26 02:42:31 Re: PartitionDispatch's partdesc field
Previous Message Amit Kapila 2018-07-26 02:25:44 Re: Explain buffers wrong counter with parallel plans