From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adam Berlin <berlin(dot)ab(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: COPY FROM WHEN condition |
Date: | 2019-03-28 12:15:51 |
Message-ID: | 20190328121551.hn7r3e2cnazjl2z2@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2019-03-28 20:48:47 +1300, David Rowley wrote:
> I had a look at this and performance has improved again, thanks.
> However, I'm not sure if the patch is exactly what we need, let me
> explain.
I'm not entirely sure either, I just haven't really seen an alternative
that's convincing.
> When I was working on 0d5f05cde01, I think my original idea was just
> to use the bufferedTuples[] array and always multi insert into
> partitioned tables unless something like on insert triggers stopped
> that. The reason I ended up with all this CIM_MULTI and
> CIM_MULTI_CONDITIONAL stuff is due to Melanie finding a regression
> when the target partition changed on each tuple. At the time I
> wondered if it might be worth trying to have multiple buffers or to
> partition the existing buffer and store tuples belonging to different
> partitions, then just flush them when they're full. Looking at the
> patch it seems that you have added an array of slots per partition, so
> does that not kinda make all that CIM_MULTI / CIM_MULTI_CONDITIONAL
> code redundant? ... we could just flush each partition's buffer when
> it gets full. There's not much of a need to flush the buffer when the
> partition changes since we're using a different buffer for the next
> partition anyway.
Well, there is in a way - that'd lead to pretty significant memory
usage if the tuples are a bit wider. I think there's a separate
optimization, where we keep track of the partitions with unflushed
changes and track the buffer size across all partitions. But that seems
like a followup patch.
> I also wonder about memory usage here. If we're copying to a
> partitioned table with 10k partitions and we get over
> MAX_BUFFERED_TUPLES in a row for each partition, we'll end up with
> quite a lot of slots.
We do - but they're not actually that large. In comparison to the
catcache, relcache, partition routing etc information per partition it's
not a huge amount of memory.
I'd earlier just destroyed the slots on partition change, but that's not
exactly free either...
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2019-03-28 12:24:57 | Re: jsonpath |
Previous Message | Adrien NAYRAT | 2019-03-28 12:14:39 | Re: Log a sample of transactions |