Re: [HACKERS] UPDATE of partition key

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: [HACKERS] UPDATE of partition key
Date: 2018-01-09 17:37:15
Message-ID: CA+TgmoahUzX5+kTWPzSAs__+YDnKoA-HAXMuX1Unr_u-i5EjAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 4, 2018 at 1:18 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> ------------------
> 1. ExecUpdate() needs to revert back tcs_map value changed by ExecInsert()
> ------------------
>
>>> It seems like the ON CONFLICT stuff handled that by adding a
>>> second TransitionCaptureState pointer to ModifyTable, thus
>>> mt_transition_capture and mt_oc_transition_capture. By that
>>> precedent, we could add mt_utr_transition_capture or similar, and
>>> maybe that's the way to go. It seems a bit unsatisfying, but so does
>>> what you have now.
>>
>> In case of ON CONFLICT, if there are both INSERT and UPDATE statement
>> triggers referencing transition tables, both of the triggers need to
>> independently populate their own transition tables, and hence the need
>> for two separate transition states : mt_transition_capture and
>> mt_oc_transition_capture. But in case of update-tuple-routing, the
>> INSERT statement trigger won't come into picture. So the same
>> mt_transition_capture can serve the purpose of populating the
>> transition table with OLD and NEW rows. So I think it would be too
>> redundant, if not incorrect, to have a whole new transition state for
>> update tuple routing.
>>
>> I will see if it turns out better to have two tcs_maps in
>> TransitionCaptureState, one for update and one for insert. But this,
>> on first look, does not look good.
>
> Suppose TransitionCaptureState has separate maps, upd_del_tcs_maps and
> insert_tcs_maps for UPDATE/DELETE and INSERT events respectively.

That's not what I suggested. If you look at what I wrote, I floated
the idea of having two TransitionCaptureStates, not two separate maps
within the same TransitionCaptureState.

> ------------------
> 2. mt_childparent_tupconv_maps is indexed by subplan or partition leaf index.
> ------------------
> ------------------
> 3. Renaming of mt_transition_tupconv_maps to mt_childparent_tupconv_maps
> ------------------
>
> We need to change it's name because now this map is not only used for
> transition capture, but also for update-tuple-routing. Does it look ok
> for you if, for readability, we keep the childparent tag ? Or else, we
> can just make it "mt_tupconv_maps", but "mt_childparent_tupconv_maps"
> looks more informative.

I see your point: the array is being renamed because it now has more
than one purpose. But that's also what I'm complaining about with
regard to point #2: the same array is being used for more than one
purpose. That's generally bad style. If you have two loops in a
function, it's best to declare two separate loop variables rather than
reusing the same variable. This lets the compiler detect, for
example, an error where the second loop variable is used before it's
initialized, which would be undetectable if you reused the same
variable in both places. Although that particular benefit doesn't
pertain in this case, I maintain that having a single structure member
that is indexed one of two different ways is a bad idea.

If I understand correctly, the way we got here is that, in earlier
patch versions, you had two arrays of maps, but it wasn't clear why we
needed both of them, and David suggested replacing one of them with an
array of indexes instead, in the hopes of reducing confusion.
However, it looks to me like that didn't really work out. If we
always needed both maps, or even if we always needed the per-leaf map,
it would have been a good idea, but it seems here that we can need
either the per-leaf map or the per-subplan map or both or neither, and
we want to avoid computing all of the per-leaf conversion maps if we
only need per-subplan access.

I think one way to fix this might be to build the per-leaf maps on
demand. Just because we're doing UPDATE tuple routing doesn't
necessarily mean we'll actually need a TupleConversionMap for every
child. So we could allocate an array with one byte per leaf, where 0
means we don't know whether tuple conversion is necessary, 1 means it
is not, and 2 means it is, or something like that. Then we have a
second array with conversion maps. We provide a function
tupconv_map_for_leaf() or similar that checks the array; if it finds
1, it returns NULL; if it finds 2, it returns the conversion map
previously calculated. If it finds 0, it calls convert_tuples_by_name,
caches the result for later, updates the one-byte-per-leaf array with
the appropriate value, and returns the just-computed conversion map.
(The reason I'm suggesting 0/1/2 instead of just true/false is to
reduce cache misses; if we find a 1 in the first array we don't need
to access the second array at all.)

If that doesn't seem like a good idea for some reason, then my second
choice would be to leave mt_transition_tupconv_maps named the way it
is currently and have a separate mt_update_tupconv_maps, with the two
pointing, if both are initialized and as far as possible, to the same
TupleConversionMap objects.

> -------------------
> 4. Explicit signaling for "we are only here for transition tables"
> -------------------
>
> I had given a thought on this earlier. I felt, even the pre-existing
> conditions like "!trigdesc->trig_update_after_row" are all indirect
> ways to determine that this function is called only to capture
> transition tables, and thought that it may have been better to have
> separate parameter transition_table_only.

I see your point. I guess it's not really this patch's job to solve
this problem, although I think this is going to need some refactoring
in the not-too-distant future. So I think the way you did it is
probably OK.

> Instead of adding another parameter to AfterTriggerSaveEvent(), I had
> also considered another approach: Put the transition-tuples-capture
> logic part of AfterTriggerSaveEvent() into a helper function
> CaptureTransitionTables(). In ExecInsert() and ExecDelete(), instead
> of calling ExecARUpdateTriggers(), call this function
> CaptureTransitionTables(). I then dropped this idea and thought rather
> to call ExecARUpdateTriggers() which neatly does the required checks
> and other things like locking the old tuple via GetTupleForTrigger().
> So if we go by CaptureTransitionTables(), we would need to do what
> ExecARUpdateTriggers() does before calling CaptureTransitionTables().
> This is doable. If you think this is worth doing so as to get rid of
> the "(oldtup == NULL) ^ (newtup == NULL)" condition, we can do that.

Duplicating logic elsewhere to avoid this problem here doesn't seem
like a good plan.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-01-09 17:49:11 Re: pgsql: Implement channel binding tls-server-end-point for SCRAM
Previous Message Antonin Houska 2018-01-09 16:48:25 Re: [HACKERS] Secondary index access optimizations