Re: Batch insert in CTAS/MatView code

From: Paul Guo <pguo(at)pivotal(dot)io>
To: Asim R P <apraveen(at)pivotal(dot)io>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Taylor Vesely <tvesely(at)pivotal(dot)io>
Subject: Re: Batch insert in CTAS/MatView code
Date: 2019-09-27 04:18:31
Message-ID: CAEET0ZH2o95dcRkB26a-hizFP4CXTmcLkPekxRD1kCno8nvYMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Asim Thanks for the review.

On Wed, Sep 25, 2019 at 6:39 PM Asim R P <apraveen(at)pivotal(dot)io> wrote:

>
>
>
> On Mon, Sep 9, 2019 at 4:02 PM Paul Guo <pguo(at)pivotal(dot)io> wrote:
> >
> > So in theory
> > we should not worry about additional tuple copy overhead now, and then I
> tried the patch without setting
> > multi-insert threshold as attached.
> >
>
> I reviewed your patch today. It looks good overall. My concern is that
> the ExecFetchSlotHeapTuple call does not seem appropriate. In a generic
> place such as createas.c, we should be using generic tableam API only.
> However, I can also see that there is no better alternative. We need to
> compute the size of accumulated tuples so far, in order to decide whether
> to stop accumulating tuples. There is no convenient way to obtain the
> length of the tuple, given a slot. How about making that decision solely
> based on number of tuples, so that we can avoid ExecFetchSlotHeapTuple call
> altogether?
>

For heapam, ExecFetchSlotHeapTuple() will be called again in
heap_multi_insert() to prepare the final multi-insert. if we check
ExecFetchSlotHeapTuple(), we could find that calling it multiple time just
involves very very few overhead for the BufferHeapTuple case. Note for
virtual tuple case the 2nd ExecFetchSlotHeapTuple() call still copies slot
contents, but we've called ExecCopySlot(batchslot, slot); to copy to a
BufferHeap case so no worries for the virtual tuple case (as a source).

Previously (long ago) I probably understood the code incorrectly so had the
concern also. I used sampling to do that (for variable-length tuple), but
now apparently we do not need that.

>
> The multi insert copy code deals with index tuples also, which I don't see
> in the patch. Don't we need to consider populating indexes?
>

create table as/create mat view DDL does not involve index creation for the
table/matview. The code seems to be able to used in RefreshMatView also,
for that we need to consider if we use multi-insert in that code.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Guo 2019-09-27 04:22:50 Re: Batch insert in CTAS/MatView code
Previous Message Amit Kapila 2019-09-27 04:18:07 Re: range test for hash index?