From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-06-05 06:51:19
Message-ID: CAA4eK1JgPjKxfgw9GdJUowDA617KyCEj_VPhZTrCUxdWnwGBwg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 1, 2017 at 3:25 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, May 29, 2017 at 5:26 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> But I think, we can also take step-by-step approach even for v11. If
>>> we agree that it is ok to silently do the updates as long as we
>>> document the behaviour, we can go ahead and do this, and then as a
>>> second step, implement error handling as a separate patch. If that
>>> patch does not materialize, we at least have the current behaviour
>>> documented.
>>
>> I think that is sensible approach if we find the second step involves
>> big or complicated changes.
>
> I think it is definitely a good idea to separate the two patches.
> UPDATE tuple routing without any special handling for the EPQ issue is
> just a partitioning feature. The proposed handling for the EPQ issue
> is an *on-disk format change*. That turns a patch which is subject
> only to routine bugs into one which can eat your data permanently --
> so having the "can eat your data permanently" separated out for both
> review and commit seems only prudent. For me, it's not a matter of
> which patch is big or complicated, but rather a matter of one of them
> being a whole lot riskier than the other. Even UPDATE tuple routing
> could mess things up pretty seriously if we end up with tuples in the
> wrong partition, of course, but the other thing is still worse.
>
> In terms of a development plan, I think we would need to have both
> patches before either could be committed. I believe that everyone
> other than me who has expressed an opinion on this issue has said that
> it's unacceptable to just ignore the issue, so it doesn't sound like
> there will be much appetite for having #1 go into the tree without #2.
> I'm still really concerned about that approach because we do not have
> very much bit space left and WARM wants to use quite a bit of it. I
> think it's quite possible that we'll be sad in the future if we find
> that we can't implement feature XYZ because of the bit-space consumed
> by this feature. However, I don't have the only vote here and I'm not
> going to try to shove this into the tree over multiple objections
> (unless there are a lot more votes the other way, but so far there's
> no sign of that).
>
> Greg/Amit's idea of using the CTID field rather than an infomask bit
> seems like a possibly promising approach. Not everything that needs
> bit-space can use the CTID field, so using it is a little less likely
> to conflict with something else we want to do in the future than using
> a precious infomask bit. However, I'm worried about this:
>
> /* Make sure there is no forward chain link in t_ctid */
> tp.t_data->t_ctid = tp.t_self;
>
> The comment does not say *why* we need to make sure that there is no
> forward chain link, but it implies that some code somewhere in the
> system does or at one time did depend on no forward link existing.
>

I think it is to ensure that the EvalPlanQual mechanism gets invoked in
the right case. The visibility routine returns HeapTupleUpdated both
when the tuple has been deleted and when it has been updated (i.e., a
newer version of the tuple exists), so we use t_ctid to decide whether
we need to follow the tuple chain to a newer version of the tuple.

> Any such code that still exists will need to be updated.
>

Yeah.

> The other potential issue I see here is that I know the WARM code also
> tries to use the bit-space in the CTID field; in particular, it uses
> the CTID field of the last tuple in a HOT chain to point back to the
> root of the chain. That seems like it could conflict with the usage
> proposed here, but I'm not totally sure.
>

The proposed change in the WARM tuple patch uses the ip_posid field of
the CTID, whereas we are planning to use the ip_blkid field. Here are
the relevant text and code from the WARM tuple patch:

"Store the root line pointer of the WARM chain in the t_ctid.ip_posid
field of the last tuple in the chain and mark the tuple header with
HEAP_TUPLE_LATEST flag to record that fact."

+#define HeapTupleHeaderSetHeapLatest(tup, offnum) \
+do { \
+	AssertMacro(OffsetNumberIsValid(offnum)); \
+	(tup)->t_infomask2 |= HEAP_LATEST_TUPLE; \
+	ItemPointerSetOffsetNumber(&(tup)->t_ctid, (offnum)); \
+} while (0)

For further details, refer to patch 0001-Track-root-line-pointer-v23_v26
in the e-mail below:
https://www.postgresql.org/message-id/CABOikdOTstHK2y0rDk%2BY3Wx9HRe%2BbZtj3zuYGU%3DVngneiHo5KQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
