Re: New Table Access Methods for Multi and Single Inserts

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject: Re: New Table Access Methods for Multi and Single Inserts
Date: 2020-12-24 00:18:42
Message-ID: CALj2ACWMnZZCu=G0PJkEeYYicKeuJ-X=SU19i6vQ1+=uXz8u0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 21, 2020 at 1:17 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> On Fri, Dec 18, 2020 at 11:54:39AM -0600, Justin Pryzby wrote:
> > On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> > > On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > > > Are you thinking that TableInsertState would eventually have additional
> > > > attributes which would apply to other tableams, but not to heap ? Is
> > > > heap_insert_begin() really specific to heap ? It's allocating and populating a
> > > > structure based on its arguments, but those same arguments would be passed to
> > > > every other AM's insert_begin routine, too. Do you need a more flexible data
> > > > structure, something that would also accomodate extensions? I'm thinking of
> > > > reloptions as a loose analogy.
> > >
> > > I could not think of other tableam attributes now. But +1 to have that
> > > kind of flexible structure for TableInsertState. So, it can have
> > > tableam type and attributes within the union for each type.
> >
> > Right now you have heap_insert_begin(), and I asked if it was really
> > heap-specific. Right now, it populates a struct based on a static list of
> > arguments, which are what heap uses.
> >
> > If you were to implement a burp_insert_begin(), how would it differ from
> > heap's? With the current API, they'd (have to) be the same, which means either
> > that it should apply to all AMs (or have a "default" implementation that can be
> > overridden by an AM), or that this API assumes that other AMs will want to do
> > exactly what heap does, and fails to allow other AMs to implement optimizations
> > for bulk inserts as claimed.
> >
> > I don't think using a "union" solves the problem, since it can only accommodate
> > core AMs, and not extensions, so I suggested something like reloptions, which
> > have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
> > toast.autovacuum_enabled).
>
> I think you'd want to handle things like:
>
> - a compressed AM wants to specify a threshold for a tuple's *compressed* size
> (maybe in addition to the uncompressed size);
> - a "columnar" AM wants to specify a threshold size for a column, rather
> than for each tuple;
>
> I'm not proposing to handle those specific parameters, but rather pointing out
> that your implementation doesn't allow handling AM-specific considerations,
> which I think was the goal.
>
> The TableInsertState structure would need to store those, and then the AM's
> multi_insert_v2 routine would need to make use of them.
>
> It feels a bit like we'd introduce the idea of an "AM option", except that it
> wouldn't be user-facing (or maybe some of them would be?). Maybe I've
> misunderstood though, so other opinions are welcome.

Attaching a v2 patch for the new table AMs.

This patch has following changes:

1) Made the TableInsertState structure generic by having a void
pointer for multi insert state and defined the heap specific multi
insert state information in heapam.h. This way each AM can have it's
own multi insert state structure and dereference the void pointer
using that structure inside the respective AM implementations.
2) Earlier in the v1 patch, the bulk insert state
allocation/deallocation was moved to AM level, but I see that there's
nothing specific in doing so and I think it should be independent of
AM. So I'm doing that in table_insert_begin() and table_insert_end().
Because of this, I had to move the BulkInsert function declarations
from heapam.h to tableam.h
3) Corrected the typos and tried to adjust indentation of the code.

Note that I have not yet made the multi_insert_v2 API optional as
suggested earlier. I will think more on this and update.

I'm not posting the updated 0002 to 0004 patches, I plan to do so
after a couple of reviews happen on the design of the APIs in 0001.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v2-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch application/x-patch 20.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-12-24 01:07:54 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Previous Message Alvaro Herrera 2020-12-24 00:14:18 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly