From: Andres Freund <andres(at)anarazel(dot)de>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pluggable Storage - Andres's take
Date: 2018-09-04 00:33:08
Message-ID: 20180904003308.r25cxramrfpvasxj@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
Thanks for the patches!
On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
> I found a couple of places where zheap uses some extra logic to check
> whether the AM is zheap, and takes some extra decisions based on that.
> I am analyzing all of that extra code to see whether, and how, any
> callbacks can handle it. I will come back with more details later.
Yea, I think some of them will need to stay (particularly around
integrating undo) and some other ones we'll need to abstract.
> >> And then:
> >> - lotsa cleanups
> >> - rebasing onto a newer version of the abstract slot patchset
> >> - splitting out smaller patches
> >>
> >>
> >> You'd moved the bulk insert into tableam callbacks - I don't quite get
> >> why? There's not really anything AM specific in that code?
> >>
> >
> > The main reason for adding them to the AM is to give the specific AM
> > control over deciding whether it supports bulk insert or not.
> >
> > The current framework doesn't allow AM-specific bulk insert state to
> > be passed from one function to another, and its structure is fixed.
> > It needs to be extended to hold AM-specific private members as well.
> >
>
> Do you want me to work on making the structure generic, so that AM
> methods can extend it?
I think the best thing here would be to *remove* all AM abstraction for
bulk insert, until it's actually needed. The likelihood of us getting
the interface right and useful without an actual user seems low. Also,
this already is a huge patch...
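[Not part of the original mail: a minimal, self-contained sketch of the
point above. All names here (RowBuffer, flush_fn, rowbuffer_add) are
hypothetical, not PostgreSQL APIs. The idea is that batching can stay
generic in the caller, as COPY does today: the caller accumulates rows
and hands the whole batch to one flush function, so no per-AM callback
or AM-private batching state is needed.]

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUFFERED 1000

/* Hypothetical flush callback: receives the whole batch at once. */
typedef void (*flush_fn)(void **rows, int nrows, void *arg);

typedef struct RowBuffer
{
	void	   *rows[MAX_BUFFERED];
	int			nrows;
	flush_fn	flush;
	void	   *arg;
} RowBuffer;

/* Buffer one row; flush automatically when the buffer fills up. */
static void
rowbuffer_add(RowBuffer *buf, void *row)
{
	buf->rows[buf->nrows++] = row;
	if (buf->nrows == MAX_BUFFERED)
	{
		buf->flush(buf->rows, buf->nrows, buf->arg);
		buf->nrows = 0;
	}
}

/* Flush any remaining rows at end of input. */
static void
rowbuffer_finish(RowBuffer *buf)
{
	if (buf->nrows > 0)
	{
		buf->flush(buf->rows, buf->nrows, buf->arg);
		buf->nrows = 0;
	}
}
```

Because the AM only ever sees "here are N rows, insert them", the
batching policy can change in the caller without touching any AM.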
> @@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
> CommandId mycid, int hi_options,
> ResultRelInfo *resultRelInfo,
> BulkInsertState bistate,
> - int nBufferedTuples, TupleTableSlot **bufferedSlots,
> + int nBufferedSlots, TupleTableSlot **bufferedSlots,
> uint64 firstBufferedLineNo);
> static bool CopyReadLine(CopyState cstate);
> static bool CopyReadLineText(CopyState cstate);
> @@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
> void *bistate;
> uint64 processed = 0;
> bool useHeapMultiInsert;
> - int nBufferedTuples = 0;
> + int nBufferedSlots = 0;
> int prev_leaf_part_index = -1;
> -#define MAX_BUFFERED_TUPLES 1000
> +#define MAX_BUFFERED_SLOTS 1000
What's the point of these renames? We're still dealing in tuples. Just
seems to make the patch larger.
> if (useHeapMultiInsert)
> {
> + int tup_size;
> +
> /* Add this tuple to the tuple buffer */
> - if (nBufferedTuples == 0)
> + if (nBufferedSlots == 0)
> + {
> firstBufferedLineNo = cstate->cur_lineno;
> - Assert(bufferedSlots[nBufferedTuples] == myslot);
> - nBufferedTuples++;
> +
> + /*
> + * Find out the Tuple size of the first tuple in a batch and
> + * use it for the rest tuples in a batch. There may be scenarios
> + * where the first tuple is very small and rest can be large, but
> + * that's rare and this should work for majority of the scenarios.
> + */
> + tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
> + myslot->tts_values,
> + myslot->tts_isnull);
> + }
This seems too expensive to me. I think it'd be better if we instead
used the amount of input data consumed for the tuple as a proxy. Does that
sound reasonable?
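[Not part of the original mail: a self-contained sketch of the proxy
suggested above. The names (BatchState, batch_line) and the byte budget
are hypothetical. Instead of computing datum-level tuple sizes with
heap_compute_data_size(), the caller tracks the raw input bytes consumed
per line, which it already knows for free, and flushes the batch when
either the row count or a byte budget is exceeded.]

```c
#include <assert.h>
#include <string.h>

#define MAX_BUFFERED_TUPLES 1000
#define MAX_BUFFERED_BYTES	65536	/* hypothetical memory budget */

typedef struct BatchState
{
	int			ntuples;
	size_t		nbytes;			/* input bytes consumed: a size proxy */
	int			nflushes;		/* for illustration only */
} BatchState;

/*
 * Buffer one input line; flush when either limit is hit.
 * Returns 1 if this call triggered a flush.
 */
static int
batch_line(BatchState *st, const char *line)
{
	st->ntuples++;
	st->nbytes += strlen(line); /* proxy: no per-datum size computation */

	if (st->ntuples >= MAX_BUFFERED_TUPLES ||
		st->nbytes >= MAX_BUFFERED_BYTES)
	{
		/* ... the multi-insert of the buffered batch would go here ... */
		st->nflushes++;
		st->ntuples = 0;
		st->nbytes = 0;
		return 1;
	}
	return 0;
}
```

The proxy overestimates (input text is usually larger than the formed
tuple), which only makes batches flush a bit early; it never lets the
buffered data blow past the budget.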
Greetings,
Andres Freund