Re: WIP: Covering + unique indexes.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP: Covering + unique indexes.
Date: 2018-04-06 02:00:41
Message-ID: CAH2-Wzmjb8RU9v5cjw0MKTXMbKoJFDBRkt6WR5MsZj4Mf6kTkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 5, 2018 at 7:59 AM, Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>> * btree_xlog_split() still has this code:

> Right, I think there is absolutely no need in this code. It's removed in
> the attached patchset.

I'm now a bit nervous about always logging the high key, since that
could impact performance. I think that there is a good way to only do
it when needed. New plan:

1. Add these new fields to split record's set of xl_info fields (it
should be placed directly after XLOG_BTREE_SPLIT_R):

#define XLOG_BTREE_SPLIT_L_HIGHKEY 0x50 /* as above, include truncated
highkey */
#define XLOG_BTREE_SPLIT_R_HIGHKEY 0x60 /* as above, include truncated
highkey */

2. Within _bt_split(), restore the old "leaf vs. internal" logic, so
that the high key is only logged for internal (!isleaf) pages.
However, only log it when needed for leaf pages -- only when the new
highkey was *actually* truncated (or when its an internal page), since
only then will it actually be different to the first item on the right
page. Also, set XLOG_BTREE_SPLIT_L_HIGHKEY instead of
XLOG_BTREE_SPLIT_L when we must log (or set XLOG_BTREE_SPLIT_R_HIGHKEY
instead of XLOG_BTREE_SPLIT_R), so that recovery actually knows that
it should restore the truncated highkey.

(Sometimes I think it would be nice to be able to do more during
recovery, but that's a much bigger issue.)

3. Restore all the master code within btree_xlog_split(), except
instead of restoring the high key when !isleaf, do so when the record
is XLOG_BTREE_SPLIT_L_HIGHKEY|XLOG_BTREE_SPLIT_R_HIGHKEY.

4. Add an assertion within btree_xlog_split(), that ensures that
internal pages never fail to have their high key logged, since there
is no reason why that should ever not happen with internal pages.

5. Fix this struct xl_btree_split comment, which commit 0c504a80 from
2017 missed when it reclaimed two xl_info status bits:

* Note: the four XLOG_BTREE_SPLIT xl_info codes all use this data record.
* The _L and _R variants indicate whether the inserted tuple went into the
* left or right split page (and thus, whether newitemoff and the new item
* are stored or not). The _ROOT variants indicate that we are splitting
* the root page, and thus that a newroot record rather than an insert or
* split record should follow. Note that a split record never carries a
* metapage update --- we'll do that in the parent-level update.

6. Add your own xl_btree_split comment in its place, noting the new
usage. Basically, the _ROOT sentence with a similar _HIGHKEY sentence.

7. Don't forget about btree_desc().

I'd say that there is a good change that Anastasia is correct to think
that it isn't worth worrying about the extra WAL that her approach
implied, and that it is in fact good enough to simply always log the
left page's high key. However, it seems easier and lower risk all
around to do it this way. It doesn't leave us with ambiguity. In my
experience, *ambiguity* on design questions makes a patch miss a
release much more frequently than bugs or regressions make that
happen.

Sorry that I didn't just say this the first time I brought up
btree_xlog_split(). I didn't see the opportunity to avoid creating
more WAL until now.

> OK, I've added asserting that number of tuple attributes shoud be
> either natts or nkeyatts, because we call _bt_mkscankey() for
> pivot index tuples too.

Makes sense.

> If you're talking about these assertions
>
> Assert(IndexRelationGetNumberOfAttributes(rel) != 0);
> indnkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
> Assert(indnkeyatts != 0);

Actually, I was just talking about the first one,
"Assert(IndexRelationGetNumberOfAttributes(rel) != 0)". I was unclear.
Maybe it isn't worth getting rid of even the first one.

> then I would rather leave both them. If we know that index tuple
> length is either natts or nkeyatts, that doesn't make you sure, that
> both natts and nkeyatts are non-zero.

I suppose.

> I've done so. Tests are passing for me.

Great. I'm glad that worked out. One simple, broad rule.

> Hmm, we have four bit reserved. But I'm not sure whether we would use
> *all* of them for non-pivot tuples. Probably we would use some of them for
> pivot tuples. I don't know that in advance. Thus, I propose to don't
> rename this. But I've added comment that non-pivot tuples might also
> use those bits.

Okay. Good enough.

> Sorry, I didn't get which particular further use of reserved bits do you
> mean?
> Did you mean key normalization?

I was being unclear. I was just reiterating my point about having a
non-pivot bit. It doesn't matter, though.

> Thank you for writing that explanation. Looks good.

I think that once you realize how INCLUDE indexes don't change pivot
tuples, and actually understand what pivot tuples are, the patch seems
a lot less scary.

> This patchset also incorporates docs enhacements by Erik Rijkers and
> sentence which states that indexes with included colums are also called
> "covering indexes".

Cool.

* Use <quote><quote/> here:

> + <para>
> + Indexes with columns listed in the <literal>INCLUDE</literal> clause
> + are also called "covering indexes".
> + </para>

* Use <literal><literal/> here:

> + <para>
> + In <literal>UNIQUE</literal> indexes, uniqueness is only enforced
> + for key columns. Columns listed in the <literal>INCLUDE</literal>
> + clause have no effect on uniqueness enforcement. Other constraints
> + (PRIMARY KEY and EXCLUDE) work the same way.
> + </para>

* Do the regression tests pass with COPY_PARSE_PLAN_TREES?

* Running pgindent would be nice. I see a bit of trailing whitespace,
and things like that.

* Please tweak the indentation here (perhaps a new line):

> @@ -927,6 +963,7 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
> last_off = P_FIRSTKEY;
> }
>
> + pageop = (BTPageOpaque) PageGetSpecialPointer(npage);
> /*

* Does the optimizer README's PathKeys section need a sentence or two
on this patch?

I'm nervous about problems within the optimizer in general, since that
is an area that I am not particularly qualified to review. I hope that
someone with more experience in that area can take a look at it
specifically. I see that there are very few changes in the optimizer,
but in my experience that's often the problem when it comes to the
optimizer -- it lacks subtle things that it actually needs, rather
than having the wrong things.

* Does this existing build_index_pathkeys() comment need to be updated?

* The result is canonical, meaning that redundant pathkeys are removed;
* it may therefore have fewer entries than there are index columns.
*
* Another reason for stopping early is that we may be able to tell that
* an index column's sort order is uninteresting for this query. However,
* that test is just based on the existence of an EquivalenceClass and not
* on position in pathkey lists, so it's not complete. Caller should call
* truncate_useless_pathkeys() to possibly remove more pathkeys.

* I don't think that there is much point in having separate 0003 +
0004 patches. For the next revision, please squash those down into
0002. Actually, maybe there should be only one patch for the next
revision. Up to you.

* Please write commit messages for your patches. I like to make these
part of the review process.

That's all for now.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-04-06 02:03:42 Re: [HACKERS] path toward faster partition pruning
Previous Message Masahiko Sawada 2018-04-06 01:52:58 Re: [HACKERS] GUC for cleanup indexes threshold.