Re: Making "COPY partitioned_table FROM" faster

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, khuddleston(at)pivotal(dot)io
Subject: Re: Making "COPY partitioned_table FROM" faster
Date: 2018-07-30 23:51:15
Message-ID: CAKJS1f_DDk0LPYGAJ-02tE5--J4-XJFV_yrAy-dySdjDbg3xkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31 July 2018 at 06:21, Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 30/07/2018 15:26, David Rowley wrote:
>>> - Add some tests. The if (nBufferedTuples > 0) that flushes the tuples
>>> when the partition changes is not currently exercised.
>>
>> That seems like a good idea. In fact, it uncovered a bug around
>> ConvertPartitionTupleSlot() freeing the previously stored tuple out
>> the slot which resulted in a crash. I didn't notice before because my
>> test had previously not required any tuple conversions.
>
> I think we need to think of a better place to put that temporary file,
> and clean it up properly afterwards. I'm not sure whether we have
> existing uses like that.

Ideally, I could have written the test in copy2.sql, but since I had
to insert over 1000 rows to trigger the multi-insert code, copy from
stdin was not practical in the .sql file. Instead, I just followed the
lead in copy.source and used the same temp file style as the previous
test. It also leaves that file laying around, it just happens to be
smaller. I've updated the test to truncate the table again and
perform another COPY TO to empty the file. I didn't see any way to
remove the file due to lack of standard way of removing files in the
OS. \! rm ... is not going to work on windows, for example.

I also failed to realise how the .source files work previously. I see
there are input and output directories. I'd missed updating the output
one in the v5 delta patch.

> Also, maybe the test should check afterwards that the right count of
> rows ended up in each partition?

Agreed. I've added that.

The attached v6 delta replaces the v5 delta and should be applied on
top of the full v4 patch. I'm using deltas in the hope they're easier
to review the few changes than reviewing the entire patch again.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v6_multiinsert_copy_delta.patch application/octet-stream 7.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-07-31 00:02:45 Re: [HACKERS] Parallel Append implementation
Previous Message Tom Lane 2018-07-30 23:42:00 Re: make installcheck-world in a clean environment