Re: Batch insert in CTAS/MatView code

From: Andres Freund <andres(at)anarazel(dot)de>
To: Paul Guo <pguo(at)pivotal(dot)io>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Asim R P <apraveen(at)pivotal(dot)io>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Taylor Vesely <tvesely(at)pivotal(dot)io>
Subject: Re: Batch insert in CTAS/MatView code
Date: 2019-09-30 07:38:02
Message-ID: 20190930073802.t3idbremgyuklvuf@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-09-30 12:12:31 +0800, Paul Guo wrote:
> > > > However, I can also see that there is no better alternative. We need to
> > > > compute the size of accumulated tuples so far, in order to decide whether
> > > > to stop accumulating tuples. There is no convenient way to obtain the
> > > > length of the tuple, given a slot. How about making that decision solely
> > > > based on number of tuples, so that we can avoid ExecFetchSlotHeapTuple call
> > > > altogether?
> > >
> > > ... maybe we should add a new operation to slots, that returns the
> > > (approximate?) size of a tuple?
> >
> > Hm, I'm not convinced that it's worth adding that as a dedicated
> > operation. It's not that clear what it'd exactly mean anyway - what
> > would it measure? As referenced in the slot? As if it were stored on
> > disk? etc?
> >
> > I wonder if the right answer wouldn't be to just measure the size of a
> > memory context containing the batch slots, or something like that.
> >
> >
> Probably a better way is to move those logic (append slot to slots, judge
> when to flush, flush, clean up slots) into table_multi_insert()?

That does not strike me as a good idea. The upper layer is going to need
to manage some resources (e.g. it's the only bit that knows how to
manage the lifetime of the incoming data), and by exposing this logic to
each AM we're going to duplicate the necessary code too.
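
To illustrate the split being argued for, here is a minimal standalone
sketch, not PostgreSQL code. It assumes the caller owns the buffered
tuples and decides when to flush, using a running byte count as a
stand-in for "the size of a memory context containing the batch slots".
All names (BatchState, batch_add, batch_flush, count_flush) and the
limits are hypothetical; the callback merely stands in for an AM-level
call such as table_multi_insert().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_CAPACITY 1000     /* hypothetical hard cap on buffered tuples */

/* Stand-in for the AM-level bulk insert; the AM only ever sees full batches. */
typedef void (*batch_flush_fn) (char **tuples, const size_t *lens,
                                int ntuples, void *arg);

typedef struct BatchState
{
    char       *tuples[BATCH_CAPACITY]; /* copies owned by the upper layer */
    size_t      lens[BATCH_CAPACITY];
    int         ntuples;
    size_t      bytes_used;     /* plays the role of the context's size */
    int         max_tuples;     /* flush threshold: tuple count */
    size_t      max_bytes;      /* flush threshold: accumulated bytes */
    batch_flush_fn flush;
    void       *flush_arg;
} BatchState;

void
batch_init(BatchState *b, int max_tuples, size_t max_bytes,
           batch_flush_fn flush, void *flush_arg)
{
    assert(max_tuples > 0 && max_tuples <= BATCH_CAPACITY);
    memset(b, 0, sizeof(*b));
    b->max_tuples = max_tuples;
    b->max_bytes = max_bytes;
    b->flush = flush;
    b->flush_arg = flush_arg;
}

/* Hand the batch to the AM stand-in, then release what the caller owns. */
void
batch_flush(BatchState *b)
{
    if (b->ntuples == 0)
        return;
    b->flush(b->tuples, b->lens, b->ntuples, b->flush_arg);
    for (int i = 0; i < b->ntuples; i++)
        free(b->tuples[i]);
    b->ntuples = 0;
    b->bytes_used = 0;
}

/* Copy one incoming tuple into the batch; flush once either limit is hit. */
void
batch_add(BatchState *b, const void *data, size_t len)
{
    char       *copy = malloc(len ? len : 1);

    if (copy == NULL)
        abort();
    memcpy(copy, data, len);
    b->tuples[b->ntuples] = copy;
    b->lens[b->ntuples] = len;
    b->ntuples++;
    b->bytes_used += len;

    if (b->ntuples >= b->max_tuples || b->bytes_used >= b->max_bytes)
        batch_flush(b);
}

/* Example callback: only counts what it was given. */
typedef struct FlushStats
{
    int         calls;
    int         tuples;
    size_t      bytes;
} FlushStats;

void
count_flush(char **tuples, const size_t *lens, int ntuples, void *arg)
{
    FlushStats *st = arg;

    (void) tuples;
    st->calls++;
    st->tuples += ntuples;
    for (int i = 0; i < ntuples; i++)
        st->bytes += lens[i];
}
```

A caller would batch_init() once, batch_add() per incoming row, and
batch_flush() at the end to push out any remainder. The point of the
sketch is only that buffering, lifetime management and the flush
decision live in one place above the AM rather than being reimplemented
in each AM.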

> Generally the final implementation of table_multi_insert() should be
> able to know the sizes easily. One concern is that currently just COPY
> in the repo uses multi insert, so not sure if other callers in the
> future want their own logic (or set up a flag to allow customization
> but seems a bit over-designed?).

That is also a concern: it seems unlikely that we'll get the
interface right.

Greetings,

Andres Freund
