Re: Making "COPY partitioned_table FROM" faster

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, khuddleston(at)pivotal(dot)io
Subject: Re: Making "COPY partitioned_table FROM" faster
Date: 2018-07-26 17:30:22
Message-ID: CAAKRu_ZvbATWE_22mMdT6ms2cEOQR7018Au12O5Q8C_xKBjigA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 25, 2018 at 7:24 PM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

On patched code line 2564, there is a missing parenthesis. It might be
better to remove the parentheses entirely and, while you're at it, there is
a missing comma.

/*
* For partitioned tables we can't support multi-inserts when there
* are any statement level insert triggers. (It might be possible to
* allow partitioned tables with such triggers in the future, but for
* now CopyFromInsertBatch expects that any before row insert and
* statement level insert triggers are on the same relation.
*/

Should be

/*
* For partitioned tables we can't support multi-inserts when there
* are any statement level insert triggers. It might be possible to
* allow partitioned tables with such triggers in the future, but for
* now, CopyFromInsertBatch expects that any before row insert and
* statement level insert triggers are on the same relation.
*/

Regarding the performance improvement and code diff from v2 to v3:
The rearranging of the code in v3 makes sense and improves the flow of the
code, however, I stared at it for awhile and couldn't quite work out in my
head which part caused the significant improvement from v2.
I would have thought that a speedup from patch v2 to v3 would have come
from doing multi-inserts less often when the target partition is switching
a lot, however, looking at the v2 to v3 diff, I can't see how any of the
changes would have caused a decrease in the number of multi-inserts given
the same data and target table ddl.
So, I thinking I'm missing something. Which of the changes would cause the
performance improvement from patch v2 to v3? My understanding is:

a) Made the avgTuplesPerPartChange >= 1.3 the first condition in the
> test that sets leafpart_use_multi_insert.
>

This would short-circuit checking the other conditions when deciding how to
set leafpart_use_multi_insert

> b) Added an unlikely() when testing of the partition's resultRelInfo
> has set to be initialised. This is only true the first time a
> partition is inserted into during the COPY.
>

The addition of "unlikely" here seems like a good idea because there is a
function call (ExecInitPartitionInfo) inside the if statement. However,
would that cause such a difference in performance from v2 to v3? What would
be the reason? Cache misses? Some kind of pre-evaluation of the expression?

c) Added unlikely() around lastPartitionSampleLineNo test. This will
> only be true 1 in 1000, so pretty unlikely.
>

Even though this makes sense based on the frequency with which this
condition will evaluate to true, I don't understand what the performance
benefit would be for adding a compiler hint here around such a trivial
calculation

d) Moved the nBufferedTuples > 0 test under the insertMethod ==
> CIM_MULTI_CONDITIONAL test. It's never going to be > 0 when we're
> doing CIM_SINGLE.
>

This is a good catch. It makes this part of the code more clear, as well.
However, it doesn't seem like it would affect performance at all.

Let me know what I'm missing.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-07-26 17:35:16 Re: Online enabling of checksums
Previous Message Robert Haas 2018-07-26 17:03:39 Re: Online enabling of checksums