Re: COPY FROM WHEN condition

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adam Berlin <berlin(dot)ab(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: COPY FROM WHEN condition
Date: 2019-03-31 19:05:18
Message-ID: 20190331190518.saddzcl3memas32x@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-04-01 02:00:26 +1300, David Rowley wrote:
> On Fri, 29 Mar 2019 at 01:15, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2019-03-28 20:48:47 +1300, David Rowley wrote:
> > > I had a look at this and performance has improved again, thanks.
> > > However, I'm not sure if the patch is exactly what we need, let me
> > > explain.
> >
> > I'm not entirely sure either, I just haven't really seen an alternative
> > that's convincing.
>
> I wonder if instead of having the array of slots in ResultRelInfo,
> have a struct that's local to copy.c containing the array and the
> number of tuples stored so far. For partitioned tables, we could
> store this struct in a hashtable by partition Oid. When the partition
> changes check if we've got this partition Oid in the hash table and
> keep adding tuples until the buffer fills. We could keep a global
> count of the number of tuples stored in all the slot arrays and flush
> all of them when it gets full.
>
> The trade-off here would be that instead of flushing on each partition
> change, we'd do a hash table lookup on each partition change and
> possibly create a new array of slots. This would allow us to get rid
> of the code that conditionally switches on/off the batching based on
> how often the partition is changing. The key to it being better would
> hinge on the hash lookup + multi-row inserts being faster than
> single-row inserts.

It's not clear to me why this separate hashtable is useful or
necessary. We're essentially already doing such a lookup for the
partition routing - what's the point of repeating the same mapping? I
don't see any benefit in storing the slots separately from that.
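
To make the duplication concrete, here's a rough sketch of the
copy.c-local hashtable variant (all names are mine, not from any posted
patch; assume MAX_BUFFERED_SLOTS is the flush threshold). Note that the
Oid-keyed lookup re-derives what ExecFindPartition() has already
resolved into a ResultRelInfo:

    #include "postgres.h"
    #include "executor/tuptable.h"
    #include "utils/hsearch.h"

    #define MAX_BUFFERED_SLOTS 1000     /* assumed flush threshold */

    typedef struct CopyPartitionBuffer
    {
        Oid             partoid;    /* hash key */
        int             nused;      /* tuples buffered so far */
        TupleTableSlot **slots;     /* allocated lazily on first use */
    } CopyPartitionBuffer;

    static HTAB *part_buffers;      /* local to copy.c, per COPY */
    static int  total_buffered;     /* global count; a flush of all
                                     * buffers (not shown) would trigger
                                     * when it reaches the threshold */

    static CopyPartitionBuffer *
    lookup_partition_buffer(Oid partoid)
    {
        CopyPartitionBuffer *buf;
        bool        found;

        if (part_buffers == NULL)
        {
            HASHCTL     ctl;

            memset(&ctl, 0, sizeof(ctl));
            ctl.keysize = sizeof(Oid);
            ctl.entrysize = sizeof(CopyPartitionBuffer);
            part_buffers = hash_create("COPY partition buffers", 64, &ctl,
                                       HASH_ELEM | HASH_BLOBS);
        }

        /* second mapping from partition Oid, duplicating the routing */
        buf = hash_search(part_buffers, &partoid, HASH_ENTER, &found);
        if (!found)
        {
            buf->nused = 0;
            buf->slots = NULL;      /* create slots only when needed */
        }
        return buf;
    }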

> I'm just not too sure about how to handle getting rid of the slots
> when we flush all the tuples. Getting rid of them might be a waste,
> but it might also stop the code from creating tens of millions of slots in
> the worst case. Maybe to fix that we could get rid of the slots in
> arrays that didn't get any use at all when we flush the tuples, as
> indicated by a 0 tuple count. This would require a hash seq scan, but
> maybe we could keep that cheap by flushing early if we get too many
> distinct partitions.

I'd suspect a better approach to this might be to just have a linked
list of partitions with slots in them; that'd avoid unnecessarily going
through all the partitions that have been copied into. I'm not convinced
this is necessary, but if I were to implement this, I'd leave the slots
in the ResultRelInfo, but additionally keep track of how many slots have
been created. When creating new slots, once some limit has been reached,
I'd just go to the front of that list and delete those slots (probably
delaying doing so until the point of flushing pending insertions).
That seems like it'd not actually be that much changed code over my
current patch, and would avoid deleting slots in the common case.
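
Roughly, and purely as a sketch (names are hypothetical, and
flush_pending_inserts() / release_slots() stand in for whatever the real
flush/cleanup code would be), the per-partition buffer would hang off
the ResultRelInfo so there's no second lookup, and eviction from the
front of the list only ever touches partitions that actually own slots:

    #include "postgres.h"
    #include "executor/tuptable.h"
    #include "lib/ilist.h"
    #include "nodes/execnodes.h"

    #define MAX_TOTAL_SLOTS 1000    /* assumed global slot budget */

    typedef struct PartitionSlotBuffer
    {
        dlist_node      node;       /* position in buffers_with_slots */
        ResultRelInfo  *rri;        /* owning partition */
        int             nused;      /* pending (unflushed) tuples */
        int             nslots;     /* slots actually created */
        TupleTableSlot *slots[FLEXIBLE_ARRAY_MEMBER];
    } PartitionSlotBuffer;

    /* hypothetical helpers, not shown */
    static void flush_pending_inserts(PartitionSlotBuffer *buf);
    static void release_slots(PartitionSlotBuffer *buf);

    static dlist_head buffers_with_slots =
        DLIST_STATIC_INIT(buffers_with_slots);
    static int  total_slots = 0;

    /* Called when a partition is about to create its first slots. */
    static void
    reserve_slots(PartitionSlotBuffer *buf, int nslots)
    {
        /*
         * Over the budget: evict from the front of the list, flushing
         * the victim's pending insertions before its slots are dropped,
         * per the "delay until flush" idea above.
         */
        while (total_slots + nslots > MAX_TOTAL_SLOTS &&
               !dlist_is_empty(&buffers_with_slots))
        {
            PartitionSlotBuffer *victim =
                dlist_container(PartitionSlotBuffer, node,
                                dlist_pop_head_node(&buffers_with_slots));

            flush_pending_inserts(victim);
            release_slots(victim);
            total_slots -= victim->nslots;
            victim->nslots = 0;
        }

        dlist_push_tail(&buffers_with_slots, &buf->node);
        total_slots += nslots;
        buf->nslots = nslots;
    }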

I'll work on pushing all the other pending tableam patches today -
leaving COPY the last non-pluggable part. You'd written in a private
email that you might try to work on this on Monday, so I think I'll
give this a shot on Tuesday if you've not gotten around to it by then?
I'd like to push this sooner than the very end of the freeze...

> That would save the table from getting bloated if
> there happened to be a point in the copy stream where we saw high
> numbers of distinct partitions with just a few tuples each.
> Multi-inserts won't help much in that case anyway.

I thought your benchmarking showed that you saw benefits even with just
two tuples?

Greetings,

Andres Freund
