|From:||Andres Freund <andres(at)anarazel(dot)de>|
|To:||David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>|
|Cc:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adam Berlin <berlin(dot)ab(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: COPY FROM WHEN condition|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 2019-03-28 20:48:47 +1300, David Rowley wrote:
> I had a look at this and performance has improved again, thanks.
> However, I'm not sure if the patch is exactly what we need, let me
I'm not entirely sure either, I just haven't really seen an alternative
> When I was working on 0d5f05cde01, I think my original idea was just
> to use the bufferedTuples array and always multi insert into
> partitioned tables unless something like on insert triggers stopped
> that. The reason I ended up with all this CIM_MULTI and
> CIM_MULTI_CONDITIONAL stuff is due to Melanie finding a regression
> when the target partition changed on each tuple. At the time I
> wondered if it might be worth trying to have multiple buffers or to
> partition the existing buffer and store tuples belonging to different
> partitions, then just flush them when they're full. Looking at the
> patch it seems that you have added an array of slots per partition, so
> does that not kinda make all that CIM_MULTI / CIM_MULTI_CONDITIONAL
> code redundant? ... we could just flush each partition's buffer when
> it gets full. There's not much of a need to flush the buffer when the
> partition changes since we're using a different buffer for the next
> partition anyway.
Well, there is in a way - that'd lead to pretty significant memory
usage if the tuples are a bit wider. I think there's a separate
optimization, where we keep track of the partitions with unflushed
changes and track the buffer size across all partitions. But that seems
like a followup patch.
> I also wonder about memory usage here. If we're copying to a
> partitioned table with 10k partitions and we get over
> MAX_BUFFERED_TUPLES in a row for each partition, we'll end up with
> quite a lot of slots.
We do - but they're not actually that large. In comparison to the
catcache, relcache, partition routing etc information per partition it's
not a huge amount of memory.
I'd earlier just destroyed the slots on partition change, but that's not
exactly free either...
|Next Message||Andrew Dunstan||2019-03-28 12:24:57||Re: jsonpath|
|Previous Message||Adrien NAYRAT||2019-03-28 12:14:39||Re: Log a sample of transactions|