Re: New Table Access Methods for Multi and Single Inserts

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject: Re: New Table Access Methods for Multi and Single Inserts
Date: 2020-12-21 07:47:25
Message-ID: 20201221074725.GF30237@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 18, 2020 at 11:54:39AM -0600, Justin Pryzby wrote:
> On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> > On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > > Are you thinking that TableInsertState would eventually have additional
> > > attributes which would apply to other tableams, but not to heap ? Is
> > > heap_insert_begin() really specific to heap ? It's allocating and populating a
> > > structure based on its arguments, but those same arguments would be passed to
> > > every other AM's insert_begin routine, too. Do you need a more flexible data
> > > structure, something that would also accomodate extensions? I'm thinking of
> > > reloptions as a loose analogy.
> >
> > I could not think of other tableam attributes now. But +1 to have that
> > kind of flexible structure for TableInsertState. So, it can have
> > tableam type and attributes within the union for each type.
>
> Right now you have heap_insert_begin(), and I asked if it was really
> heap-specific. Right now, it populates a struct based on a static list of
> arguments, which are what heap uses.
>
> If you were to implement a burp_insert_begin(), how would it differ from
> heap's? With the current API, they'd (have to) be the same, which means either
> that it should apply to all AMs (or have a "default" implementation that can be
> overridden by an AM), or that this API assumes that other AMs will want to do
> exactly what heap does, and fails to allow other AMs to implement optimizations
> for bulk inserts as claimed.
>
> I don't think using a "union" solves the problem, since it can only accommodate
> core AMs, and not extensions, so I suggested something like reloptions, which
> have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
> toast.autovacuum_enabled).

I think you'd want to handle things like:

- a compressed AM wants to specify a threshold for a tuple's *compressed* size
(maybe in addition to the uncompressed size);
- a "columnar" AM wants to specify a threshold size for a column, rather
than for each tuple;

I'm not proposing to handle those specific parameters, but rather pointing out
that your implementation doesn't allow handling AM-specific considerations,
which I think was the goal.

The TableInsertState structure would need to store those, and then the AM's
multi_insert_v2 routine would need to make use of them.

It feels a bit like we'd introduce the idea of an "AM option", except that it
wouldn't be user-facing (or maybe some of them would be?). Maybe I've
misunderstood though, so other opinions are welcome.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-12-21 07:58:59 Re: Dependency isn't created between extension and schema
Previous Message Bharath Rupireddy 2020-12-21 07:42:07 Re: New Table Access Methods for Multi and Single Inserts