Re: COPY FROM WHEN condition

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adam Berlin <berlin(dot)ab(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: COPY FROM WHEN condition
Date: 2019-03-28 12:15:51
Message-ID: 20190328121551.hn7r3e2cnazjl2z2@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-03-28 20:48:47 +1300, David Rowley wrote:
> I had a look at this and performance has improved again, thanks.
> However, I'm not sure if the patch is exactly what we need, let me
> explain.

I'm not entirely sure either, I just haven't really seen an alternative
that's convincing.

> When I was working on 0d5f05cde01, I think my original idea was just
> to use the bufferedTuples[] array and always multi insert into
> partitioned tables unless something like on insert triggers stopped
> that. The reason I ended up with all this CIM_MULTI and
> CIM_MULTI_CONDITIONAL stuff is due to Melanie finding a regression
> when the target partition changed on each tuple. At the time I
> wondered if it might be worth trying to have multiple buffers or to
> partition the existing buffer and store tuples belonging to different
> partitions, then just flush them when they're full. Looking at the
> patch it seems that you have added an array of slots per partition, so
> does that not kinda make all that CIM_MULTI / CIM_MULTI_CONDITIONAL
> code redundant? ... we could just flush each partition's buffer when
> it gets full. There's not much of a need to flush the buffer when the
> partition changes since we're using a different buffer for the next
> partition anyway.

Well, there is in a way - that'd lead to pretty significant memory
usage if the tuples are a bit wider. I think there's a separate
optimization, where we keep track of the partitions with unflushed
changes and track the buffer size across all partitions. But that seems
like a followup patch.

> I also wonder about memory usage here. If we're copying to a
> partitioned table with 10k partitions and we get over
> MAX_BUFFERED_TUPLES in a row for each partition, we'll end up with
> quite a lot of slots.

We do - but they're not actually that large. In comparison to the
catcache, relcache, partition routing etc information per partition it's
not a huge amount of memory.

I'd earlier just destroyed the slots on partition change, but that's not
exactly free either...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2019-03-28 12:24:57 Re: jsonpath
Previous Message Adrien NAYRAT 2019-03-28 12:14:39 Re: Log a sample of transactions