Re: COPY FROM WHEN condition

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adam Berlin <berlin(dot)ab(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: COPY FROM WHEN condition
Date: 2019-03-28 07:48:47
Message-ID: CAKJS1f8VqJt4obvFyZXCHbAKz0E9xraic-sbtEtORUYG6wWbrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 27 Mar 2019 at 18:49, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Here's a version that applies onto HEAD. It needs a bit more cleanup
> (I'm not sure I like the addition of ri_batchInsertSlots to store slots
> for each partition, there's some duplicated code around slot array and
> slot creation, docs for the multi insert callback).
>
> When measuring inserting into unlogged partitions (to remove the
> overhead of WAL logging, which makes it harder to see the raw
> performance difference), I see a few percent gains of the slotified
> version over the current code. Same with insertions that have fewer
> partition switches.
>
> Sorry for posting this here - David, it'd be cool if you could take a
> look anyway. You've played a lot with this code.

Hi,

I had a look at this and performance has improved again, thanks.
However, I'm not sure if the patch is exactly what we need, let me
explain.

When I was working on 0d5f05cde01, I think my original idea was just
to use the bufferedTuples[] array and always multi insert into
partitioned tables unless something like on insert triggers stopped
that. The reason I ended up with all this CIM_MULTI and
CIM_MULTI_CONDITIONAL stuff is due to Melanie finding a regression
when the target partition changed on each tuple. At the time I
wondered if it might be worth trying to have multiple buffers or to
partition the existing buffer and store tuples belonging to different
partitions, then just flush them when they're full. Looking at the
patch it seems that you have added an array of slots per partition, so
does that not kinda make all that CIM_MULTI / CIM_MULTI_CONDITIONAL
code redundant? ... we could just flush each partition's buffer when
it gets full. There's not much of a need to flush the buffer when the
partition changes since we're using a different buffer for the next
partition anyway.

I also wonder about memory usage here. If we're copying to a
partitioned table with 10k partitions and we get over
MAX_BUFFERED_TUPLES in a row for each partition, we'll end up with
quite a lot of slots.

So, in short. I think the patch either is going to cause possible
memory problems during COPY FROM, or it leaves code that is pretty
much redundant. ... Or I've just misunderstood the patch :)

Does this make sense?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergei Kornilov 2019-03-28 08:07:59 Re: REINDEX CONCURRENTLY 2.0
Previous Message Sergei Kornilov 2019-03-28 07:45:08 Re: Planning counters in pg_stat_statements (using pgss_store)