Re: COPY FROM WHEN condition

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adam Berlin <berlin(dot)ab(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: COPY FROM WHEN condition
Date: 2019-01-29 09:51:04
Message-ID: b5fe1ea4-5007-2183-bd11-a8210cca3cd8@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/29/19 8:18 AM, David Rowley wrote:
> On Wed, 23 Jan 2019 at 06:35, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> It turned out to be a tad more complex due to partitioning, because when
>> we find the partitions do not match, the tuple is already allocated in
>> the "current" context (be it per-tuple or batch). So we can't just free
>> the whole context at that point. The old code worked around this by
>> alternating two contexts, but that seems a bit too cumbersome to me, so
>> the patch simply copies the tuple to the new context. That allows us to
>> reset the batch context always, right after emptying the buffer. I need
>> to do some benchmarking to see if the extra copy causes any regression.
>
> I agree that fixing the problem mentioned by separating out tuple and
> batch contexts is a good idea, but I disagree with removing the
> alternating batch context idea. FWIW the reason the alternating
> contexts went in wasn't just for fun, it was on performance grounds.
> There's a lengthy discussion in [1] about not adding any new
> performance regressions to COPY. To be more specific, Peter complains
> about a regression of 5% in [2].
>
> It's pretty disheartening to see the work there being partially undone.
>

Whoops :-(

> Here are my performance tests of with and without your change to the
> memory contexts (I missed where you posted your results).
>
> $ cat bench.pl
> for (my $i=0; $i < 8912891; $i++) {
> print "1\n1\n2\n2\n";
> }
> 36a1281f86c: (with your change)
>
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 26825.142 ms (00:26.825)
> Time: 27202.117 ms (00:27.202)
> Time: 26266.705 ms (00:26.267)
>
> 4be058fe9ec: (before your change)
>
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 25645.460 ms (00:25.645)
> Time: 25698.193 ms (00:25.698)
> Time: 25737.251 ms (00:25.737)
>
> So looks like your change slows this code down 4% for this test case.
> That's about twice as bad as the 2.2% regression that I had to solve
> for the multi-insert partition patch (of which you've removed much of
> the code from)
>
> I'd really like to see the alternating batch context code being put
> back in to fix this. Of course, if you have a better idea, then we can
> look into that, but just removing code that was carefully written and
> benchmarked without any new benchmarks to prove that it's okay to do
> so seems wrong to me.
>

Yeah, that's not very nice. Do you suggest to revert 36a1281f86c0f, or
shall we try to massage it a bit first? ISTM we could simply create two
batch memory contexts and alternate those, just like with the expression
contexts before.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-01-29 10:11:15 Re: speeding up planning with partitions
Previous Message Magnus Hagander 2019-01-29 09:45:34 Re: pg_basebackup, walreceiver and wal_sender_timeout