Re: COPY FROM WHEN condition

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adam Berlin <berlin(dot)ab(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: COPY FROM WHEN condition
Date: 2019-01-29 07:18:46
Message-ID: CAKJS1f91By42p=oQHOF+n-dAyVTYGYGjHK2qg74xL=5=6MC9ig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 23 Jan 2019 at 06:35, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> It turned out to be a tad more complex due to partitioning, because when
> we find the partitions do not match, the tuple is already allocated in
> the "current" context (be it per-tuple or batch). So we can't just free
> the whole context at that point. The old code worked around this by
> alternating two contexts, but that seems a bit too cumbersome to me, so
> the patch simply copies the tuple to the new context. That allows us to
> reset the batch context always, right after emptying the buffer. I need
> to do some benchmarking to see if the extra copy causes any regression.

I agree that fixing the problem mentioned by separating out tuple and
batch contexts is a good idea, but I disagree with removing the
alternating batch context idea. FWIW the reason the alternating
contexts went in wasn't just for fun, it was on performance grounds.
There's a lengthy discussion in [1] about not adding any new
performance regressions to COPY. To be more specific, Peter complains
about a regression of 5% in [2].

It's pretty disheartening to see the work there being partially undone.

Here are my performance tests of with and without your change to the
memory contexts (I missed where you posted your results).

$ cat bench.pl
for (my $i=0; $i < 8912891; $i++) {
print "1\n1\n2\n2\n";
}
36a1281f86c: (with your change)

postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 26825.142 ms (00:26.825)
Time: 27202.117 ms (00:27.202)
Time: 26266.705 ms (00:26.267)

4be058fe9ec: (before your change)

postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 25645.460 ms (00:25.645)
Time: 25698.193 ms (00:25.698)
Time: 25737.251 ms (00:25.737)

So looks like your change slows this code down 4% for this test case.
That's about twice as bad as the 2.2% regression that I had to solve
for the multi-insert partition patch (of which you've removed much of
the code from)

I'd really like to see the alternating batch context code being put
back in to fix this. Of course, if you have a better idea, then we can
look into that, but just removing code that was carefully written and
benchmarked without any new benchmarks to prove that it's okay to do
so seems wrong to me.

[1] https://www.postgresql.org/message-id/flat/CAKJS1f93DeHN%2B9RrD9jYn0iF_o89w2B%2BU8-Ao5V1kd8Cf7oSGQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/2b40468d-6be0-e795-c363-0015bc9fa747%402ndquadrant.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Takashi Menjo 2019-01-29 08:46:07 RE: static global variable openLogOff in xlog.c seems no longer used
Previous Message Amit Langote 2019-01-29 06:42:43 Re: Delay locking partitions during query execution