Re: TupleTableSlot abstraction

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: TupleTableSlot abstraction
Date: 2018-10-09 15:16:04
Message-ID: CAJ3gD9eidD6DFeLLpu4QMpbk_XNVOOQKz8KtawwwuuY0DZwrew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 26 Sep 2018 at 05:35, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > +
> > +/*
> > + * This is a function used by all getattr() callbacks which deal with a heap
> > + * tuple or some tuple format which can be represented as a heap tuple e.g. a
> > + * minimal tuple.
> > + *
> > + * heap_getattr considers any attnum beyond the attributes available in the
> > + * tuple as NULL. This function however returns the values of missing
> > + * attributes from the tuple descriptor in that case. Also this function does
> > + * not support extracting system attributes.
> > + *
> > + * If the attribute needs to be fetched from the tuple, the function fills in
> > + * tts_values and tts_isnull arrays upto the required attnum.
> > + */
> > +Datum
> > +tts_heap_getattr_common(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
> > + int attnum, bool
> > *isnull)
>
> I'm still *vehemently* opposed to the introduction of this.

You mean, you want to remove the att_isnull() optimization, right ?
Removed that code now. Directly deforming the tuple regardless of the
null attribute.

>
>
>
> > @@ -2024,7 +2024,18 @@ FormIndexDatum(IndexInfo *indexInfo,
> > Datum iDatum;
> > bool isNull;
> >
> > - if (keycol != 0)
> > + if (keycol < 0)
> > + {
> > + HeapTupleTableSlot *hslot = (HeapTupleTableSlot *)slot;
> > +
> > + /* Only heap tuples have system attributes. */
> > + Assert(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot));
> > +
> > + iDatum = heap_getsysattr(hslot->tuple, keycol,
> > + slot->tts_tupleDescriptor,
> > + &isNull);
> > + }
> > + else if (keycol != 0)
> > {
> > /*
> > * Plain index column; get the value we need directly from the
>
> This now should access the system column via the slot, right? There's
> other places like this IIRC.

Done. In FormIndexDatum() and ExecInterpExpr(), directly calling
slot_getsysattr() now.

In ExecInterpExpr (), I am no longer using ExecEvalSysVar() now. I am
planning to remove this definition since it would be a single line
function just calling slot_getsysattr().

In build_ExecEvalSysVar(), ExecEvalSysVar() is still used, so I
haven't removed the definition yet. I am planning to create a new
LLVMValueRef FuncSlotGetsysattr, and use that instead, in
build_ExecEvalSysVar(), or for that matter, I am thinking to revert
back build_ExecEvalSysVar() and instead have that code inline as in
HEAD.

>
>
>
> > diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
> > index 9d6e25a..1b4e726 100644
> > --- a/src/backend/executor/execExprInterp.c
> > +++ b/src/backend/executor/execExprInterp.c
> > @@ -490,54 +490,21 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
> >
> > EEO_CASE(EEOP_INNER_SYSVAR)
> > {
> > - int attnum = op->d.var.attnum;
> > - Datum d;
> > -
> > - /* these asserts must match defenses in slot_getattr */
> > - Assert(innerslot->tts_tuple != NULL);
> > - Assert(innerslot->tts_tuple != &(innerslot->tts_minhdr));
> > -
> > - /* heap_getsysattr has sufficient defenses against bad attnums */
> > - d = heap_getsysattr(innerslot->tts_tuple, attnum,
> > - innerslot->tts_tupleDescriptor,
> > - op->resnull);
> > - *op->resvalue = d;
> > + ExecEvalSysVar(state, op, econtext, innerslot);
>
> These changes should be in a separate commit.

As mentioned above, now I am not using ExecEvalSysVar(), instead, I am
calling slot_getsysattr(). So no need of separate commit for that.

>
>
> > +const TupleTableSlotOps TTSOpsHeapTuple = {
> > + sizeof(HeapTupleTableSlot),
> > + .init = tts_heap_init,
>
> The first field should also use a named field (same in following cases).

Done.

>
>
> > @@ -185,6 +1020,7 @@ ExecResetTupleTable(List *tupleTable, /* tuple table */
> > {
> > TupleTableSlot *slot = lfirst_node(TupleTableSlot, lc);
> >
> > + slot->tts_cb->release(slot);
> > /* Always release resources and reset the slot to empty */
> > ExecClearTuple(slot);
> > if (slot->tts_tupleDescriptor)
> > @@ -240,6 +1076,7 @@ void
> > ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
> > {
> > /* This should match ExecResetTupleTable's processing of one slot */
> > + slot->tts_cb->release(slot);
> > Assert(IsA(slot, TupleTableSlot));
> > ExecClearTuple(slot);
> > if (slot->tts_tupleDescriptor)
>
> ISTM that release should be called *after* clearing the slot.

I am copying here what I discussed about this in the earlier reply:

I am not sure what was release() designed to do. Currently all of the
implementers of this function are empty. Was it meant for doing
ReleaseTupleDesc(slot->tts_tupleDescriptor) ? Or
ReleaseBuffer(bslot->buffer) ? I think the purpose of keeping this
*before* clearing the tuple might be because the clear() might have
already cleared some handles that release() might need.

>
>
> > @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
> > TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
> > HeapTuple tuple;
> > shm_mq_result result;
> > + bool tuple_copied = false;
> > +
> > + /* Get the tuple out of slot, if necessary converting the slot's contents
> > + * into a heap tuple by copying. In the later case we need to free the copy.
> > + */
> > + if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))
> > + {
> > + tuple = ExecFetchSlotTuple(slot, true);
> > + tuple_copied = false;
> > + }
> > + else
> > + {
> > + tuple = ExecCopySlotTuple(slot);
> > + tuple_copied = true;
> > + }
>
> This seems like a bad idea to me. We shouldn't hardcode slots like
> this. I've previously argued that we should instead allow
> ExecFetchSlotTuple() for all types of tuples, but add a new bool
> *shouldFree argument, which will then allow the caller to free the
> tuple. We gain zilch by having this kind of logic in multiple callers.

Done. See more comments below.

> > > @@ -56,11 +56,28 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
> > > TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
> > > HeapTuple tuple;
> > > shm_mq_result result;
> > > + bool tuple_copied = false;
> > > +
> > > + /* Get the tuple out of slot, if necessary converting the slot's contents
> > > + * into a heap tuple by copying. In the later case we need to free the copy.
> > > + */
> > > + if (TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot))
> > > + {
> > > + tuple = ExecFetchSlotTuple(slot, true);
> > > + tuple_copied = false;
> > > + }
> > > + else
> > > + {
> > > + tuple = ExecCopySlotTuple(slot);
> > > + tuple_copied = true;
> > > + }
> >
> > This seems like a bad idea to me. We shouldn't hardcode slots like
> > this. I've previously argued that we should instead allow
> > ExecFetchSlotTuple() for all types of tuples, but add a new bool
> > *shouldFree argument, which will then allow the caller to free the
> > tuple. We gain zilch by having this kind of logic in multiple callers.
>
> How about having a separate ExecFetchSlotHeapTuple() for many of the
> callers where it is known that the tuple is a heap/buffer tuple ? And
> in rare places such as above where slot type is not known, we can have
> ExecFetchSlotTuple() which would have an extra shouldFree parameter.

I went ahead and did these changes, but for now, I haven't replaced
ExecFetchSlotTuple() with ExecFetchSlotHeapTuple(). Instead, I
retained ExecFetchSlotTuple() to be called for heap tuples, and added
a new ExecFetchGenericSlotTuple() to be used with shouldFree. I don't
like these names, but until we have concluded, I don't want to go
ahead and replace all the numerous ExecFetchSlotTuple() calls with
ExecFetchSlotHeapTuple().

Have used ExecFetchGenericSlotTuple() for ExecBRInsertTriggers() and
ExecIRInsertTriggers() as well, because in CopyFrom(), now with
partition mps, the input slot to these functions can be a virtual
slot.

> > rebasing with latest master, in the regression I got crashes, and I
> > suspect the reason being that I have used Virtual tuple slot for the
> > destination slot of execute_attr_map_slot(). I am analyzing it. I am
> > anyway attaching the patches (v12) to give you an idea of how I have
> > handled the below two items.
>
> It seems to be some corruption here :
>
> @@ -956,17 +978,39 @@ ExecUpdate(ModifyTableState *mtstate,
>
> - tuple = ExecMaterializeSlot(slot);
>
> + if (!(TTS_IS_HEAPTUPLE(slot) || TTS_IS_BUFFERTUPLE(slot)))
> + {
> + TupleTableSlot *es_slot = estate->es_dml_input_tuple_slot;
> +
> + Assert(es_slot && TTS_IS_HEAPTUPLE(es_slot));
> + if (es_slot->tts_tupleDescriptor != slot->tts_tupleDescriptor)
> + ExecSetSlotDescriptor(es_slot, slot->tts_tupleDescriptor);
> + ExecCopySlot(es_slot, slot);
> + slot = es_slot;
> + }
> +
> + tuple = ExecFetchSlotTuple(slot, true);
>
> After the slotification of partition tuple conversion, the input slot
> is a virtual tuple, and the above code seems to result in some
> corruption which I have not finished analyzing. It only happens for
> INSERT ON CONFLICT case with partitions.

In ExecInsert() and ExecUpdate(), when it's a partition with different
column order, the input slot is a slot returned by
execute_attr_map_slot(), i.e. a virtual slot. So, in ExecUpdate() the
input slot->tts_values[] datums may even point to existing contents of
estate->es_dml_input_tuple_slot. This happens when the set clause of
UPDATE uses excluded.* columns :

(From insert_conflict.sql)
insert into parted_conflict_test values (3, 'b')
on conflict (a) do update set b = excluded.b;

And ExecSetSlotDescriptor() is called *before* extracting the tuple
out of the virtual slot, so this function clears the tuple of
es_dml_input_tuple_slot, and so the input slot datums pointing to this
slot tuple become dangling pointers.

Basically, we should extract the tuple out of the input slot as early
as possible.

I now have a new function ExecCopySlotWithTupDesc() which extracts the
tuple using dstslot's memory context, and only then changes the tuple
descriptor. Effectively, the dstlot has a materialized tuple.

Used this function for both ExecUpdate() and ExecInsert().

On Fri, 5 Oct 2018 at 20:28, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> > From 06d31f91831a1da78d56e4217f08d3866c7be6f8 Mon Sep 17 00:00:00 2001
> > From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
> > Date: Fri, 31 Aug 2018 11:00:30 +0530
> > Subject: [PATCH 09/14] Rethink ExecFetchSlotMinimalTuple()
> >
> > Before this work, a TupleTableSlot could "own" a minimal tuple as
> > well. Thus ExecFetchSlotMinimalTuple() returned a minimal tuple after
> > constructing and "owning" it if it was not readily available. When the
> > slot "owned" the minimal tuple, the memory consumed by the tuple was
> > freed when a new tuple was stored in it or the slot was cleared.
> >
> > With this work, not all TupleTableSlot types can "own" a minimal
> > tuple. When fetching a minimal tuple from a slot that can not "own" a
> > tuple, memory is allocated to construct the minimal tuple, which needs
> > to be freed explicitly. Hence ExecFetchSlotMinimalTuple() is modified
> > to flag the caller whether it should free the memory consumed by the
> > returned minimal tuple.
> >
> > Right now only a MinimalTupleTableSlot can own a minimal tuple. But we
> > may change that in future or a user defined TupleTableSlot type (added
> > through an extension) may be able to "own" a minimal tuple in it.
> > Hence instead of relying upon TTS_IS_MINIMAL() macro to tell us
> > whether the TupleTableSlot can "own" a minimal tuple or not, we rely
> > on the set of callbacks. A TupleTableSlot which can hold a minimal
> > tuple implements get_minimal_tuple callback. Other TupleTableSlot
> > types leave the callback NULL.
> >
> > The minimal tuple returned by this function is usually copied into a
> > hash table or a file. But, unlike ExecFetchSlotTuple() it's never
> > written to. Hence the difference between signatures of the two
> > functions.
>
> I'm somewhat inclined to think this should be done in an earlier patch,
> before the main "abstract slot" work.

Yet to do this.

There is still one more regression test failure in polygon.sql which I
am yet to analyze.

Also, there are some more pending points of yours that I haven't addressed yet.

Attached is v13 patch series.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_abstract_tts_patches_v13.tar.gz application/x-gzip 66.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-10-09 15:16:14 Re: executor relation handling
Previous Message Michael Banck 2018-10-09 15:14:50 Re: TAP tests for pg_verify_checksums