Re: Pluggable Storage - Andres's take

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pluggable Storage - Andres's take
Date: 2018-12-15 08:16:25
Message-ID: CAJrrPGcJOKU-OmuXWSgiHq9wArny-4+QW7qW6K=+MZh=Pe2NYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 11, 2018 at 12:47 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> Thanks for these changes. I've merged a good chunk of them.
>

Thanks.

> On 2018-11-16 12:05:26 +1100, Haribabu Kommi wrote:
> > diff --git a/src/backend/access/heap/heapam_handler.c
> b/src/backend/access/heap/heapam_handler.c
> > index c3960dc91f..3254e30a45 100644
> > --- a/src/backend/access/heap/heapam_handler.c
> > +++ b/src/backend/access/heap/heapam_handler.c
> > @@ -1741,7 +1741,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc
> sscan, TransactionId OldestXmin, do
> > {
> > HeapScanDesc scan = (HeapScanDesc) sscan;
> > Page targpage;
> > - OffsetNumber targoffset = scan->rs_cindex;
> > + OffsetNumber targoffset;
> > OffsetNumber maxoffset;
> > BufferHeapTupleTableSlot *hslot;
> >
> > @@ -1751,7 +1751,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc
> sscan, TransactionId OldestXmin, do
> > maxoffset = PageGetMaxOffsetNumber(targpage);
> >
> > /* Inner loop over all tuples on the selected page */
> > - for (targoffset = scan->rs_cindex; targoffset <= maxoffset;
> targoffset++)
> > + for (targoffset = scan->rs_cindex ? scan->rs_cindex :
> FirstOffsetNumber;
> > + targoffset <= maxoffset;
> > + targoffset++)
> > {
> > ItemId itemid;
> > HeapTuple targtuple = &hslot->base.tupdata;
>
> I thought it was better to fix the initialization for rs_cindex - any
> reason you didn't go for that?
>

No specific reason. Thanks for the correction.

>
> > diff --git a/src/backend/access/heap/heapam_visibility.c
> b/src/backend/access/heap/heapam_visibility.c
> > index 8233475aa0..7bad246f55 100644
> > --- a/src/backend/access/heap/heapam_visibility.c
> > +++ b/src/backend/access/heap/heapam_visibility.c
> > @@ -1838,8 +1838,10 @@ HeapTupleSatisfies(HeapTuple stup, Snapshot
> snapshot, Buffer buffer)
> > case NON_VACUUMABLE_VISIBILTY:
> > return HeapTupleSatisfiesNonVacuumable(stup,
> snapshot, buffer);
> > break;
> > - default:
> > + case END_OF_VISIBILITY:
> > Assert(0);
> > break;
> > }
> > +
> > + return false; /* keep compiler quiet */
>
> I don't understand why END_OF_VISIBILITY is good idea? I now removed
> END_OF_VISIBILITY, and the default case.
>

OK.

> > @@ -593,6 +594,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver
> *self)
> > if (myState->rel->rd_rel->relhasoids)
> > slot->tts_tupleOid = InvalidOid;
> >
> > + /* Materialize the slot */
> > + if (!TTS_IS_VIRTUAL(slot))
> > + ExecMaterializeSlot(slot);
> > +
> > table_insert(myState->rel,
> > slot,
> > myState->output_cid,
>
> What's the point of adding materialization here?
>

In earlier testing i observed as the slot that is received is a buffered
slot
and it points to the original tuple, but when it inserts it into the new
table,
the transaction id changes and it leads to invisible tuple, because of that
reason I added the materialize here.

> > @@ -570,6 +563,9 @@ ExecInterpExpr(ExprState *state, ExprContext
> *econtext, bool *isnull)
> > Assert(TTS_IS_HEAPTUPLE(scanslot) ||
> > TTS_IS_BUFFERTUPLE(scanslot));
> >
> > + if (hslot->tuple == NULL)
> > + ExecMaterializeSlot(scanslot);
> > +
> > d = heap_getsysattr(hslot->tuple, attnum,
> >
> scanslot->tts_tupleDescriptor,
> >
> op->resnull);
>
> Same?
>
>
> > diff --git a/src/backend/executor/execMain.c
> b/src/backend/executor/execMain.c
> > index e055c0a7c6..34ef86a5bd 100644
> > --- a/src/backend/executor/execMain.c
> > +++ b/src/backend/executor/execMain.c
> > @@ -2594,7 +2594,7 @@ EvalPlanQual(EState *estate, EPQState *epqstate,
> > * datums that may be present in copyTuple). As with the next
> step, this
> > * is to guard against early re-use of the EPQ query.
> > */
> > - if (!TupIsNull(slot))
> > + if (!TupIsNull(slot) && !TTS_IS_VIRTUAL(slot))
> > ExecMaterializeSlot(slot);
>
>
> Same?
>

Earlier virtual tuple materialize was throwing error, because of that
reason I added
that check.

> index 56880e3d16..36ca07beb2 100644

> > --- a/src/backend/executor/nodeBitmapHeapscan.c
> > +++ b/src/backend/executor/nodeBitmapHeapscan.c
> > @@ -224,6 +224,18 @@ BitmapHeapNext(BitmapHeapScanState *node)
> >
> > BitmapAdjustPrefetchIterator(node, tbmres);
> >
> > + /*
> > + * Ignore any claimed entries past what we think
> is the end of the
> > + * relation. (This is probably not necessary
> given that we got at
> > + * least AccessShareLock on the table before
> performing any of the
> > + * indexscans, but let's be safe.)
> > + */
> > + if (tbmres->blockno >= scan->rs_nblocks)
> > + {
> > + node->tbmres = tbmres = NULL;
> > + continue;
> > + }
> > +
>
> I moved this into the storage engine, there just was a minor bug
> preventing the already existing check from taking effect. I don't think
> we should expose this kind of thing to the outside of the storage
> engine.
>

OK.

>
> > diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> > index 54382aba88..ea48e1d6e8 100644
> > --- a/src/backend/parser/gram.y
> > +++ b/src/backend/parser/gram.y
> > @@ -4037,7 +4037,6 @@ CreateStatsStmt:
> > *
> >
> *****************************************************************************/
> >
> > -// PBORKED: storage option
> > CreateAsStmt:
> > CREATE OptTemp TABLE create_as_target AS SelectStmt
> opt_with_data
> > {
> > @@ -4068,14 +4067,16 @@ CreateAsStmt:
> > ;
> >
> > create_as_target:
> > - qualified_name opt_column_list OptWith
> OnCommitOption OptTableSpace
> > + qualified_name opt_column_list
> table_access_method_clause
> > + OptWith OnCommitOption OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > $$->colNames = $2;
> > - $$->options = $3;
> > - $$->onCommit = $4;
> > - $$->tableSpaceName = $5;
> > + $$->accessMethod = $3;
> > + $$->options = $4;
> > + $$->onCommit = $5;
> > + $$->tableSpaceName = $6;
> > $$->viewQuery = NULL;
> > $$->skipData = false; /*
> might get changed later */
> > }
> > @@ -4125,14 +4126,15 @@ CreateMatViewStmt:
> > ;
> >
> > create_mv_target:
> > - qualified_name opt_column_list opt_reloptions
> OptTableSpace
> > + qualified_name opt_column_list
> table_access_method_clause opt_reloptions OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > $$->colNames = $2;
> > - $$->options = $3;
> > + $$->accessMethod = $3;
> > + $$->options = $4;
> > $$->onCommit = ONCOMMIT_NOOP;
> > - $$->tableSpaceName = $4;
> > + $$->tableSpaceName = $5;
> > $$->viewQuery = NULL; /*
> filled at analysis time */
> > $$->skipData = false; /*
> might get changed later */
> > }
>
> Cool. I wonder if we should also somehow support SELECT INTO w/ USING?
> You've apparently started to do so with?
>

I thought the same, but SELECT INTO is deprecated syntax, is it fine to add
the new syntax?

Regards,
Haribabu Kommi
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2018-12-15 09:59:35 Re: 'infinity'::Interval should be added
Previous Message Amit Kapila 2018-12-15 05:05:23 Re: WIP: Avoid creation of the free space map for small tables