Re: WIP: Covering + unique indexes.

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Subject: Re: WIP: Covering + unique indexes.
Date: 2018-04-06 15:49:28
Message-ID: CAPpHfdvUG9qCrhbEG5MeWHA7Dg+9KaRcoJZEraXe2w2XYvDBug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 6, 2018 at 5:00 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Thu, Apr 5, 2018 at 7:59 AM, Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> >> * btree_xlog_split() still has this code:
>
> > Right, I think there is absolutely no need in this code. It's removed in
> > the attached patchset.
>
> I'm now a bit nervous about always logging the high key, since that
> could impact performance. I think that there is a good way to only do
> it when needed. New plan:
>
> 1. Add these new fields to split record's set of xl_info fields (it
> should be placed directly after XLOG_BTREE_SPLIT_R):
>
> #define XLOG_BTREE_SPLIT_L_HIGHKEY 0x50 /* as above, include truncated
> highkey */
> #define XLOG_BTREE_SPLIT_R_HIGHKEY 0x60 /* as above, include truncated
> highkey */
>
> 2. Within _bt_split(), restore the old "leaf vs. internal" logic, so
> that the high key is only logged for internal (!isleaf) pages.
> However, only log it when needed for leaf pages -- only when the new
> highkey was *actually* truncated (or when its an internal page), since
> only then will it actually be different to the first item on the right
> page. Also, set XLOG_BTREE_SPLIT_L_HIGHKEY instead of
> XLOG_BTREE_SPLIT_L when we must log (or set XLOG_BTREE_SPLIT_R_HIGHKEY
> instead of XLOG_BTREE_SPLIT_R), so that recovery actually knows that
> it should restore the truncated highkey.
>
> (Sometimes I think it would be nice to be able to do more during
> recovery, but that's a much bigger issue.)
>
> 3. Restore all the master code within btree_xlog_split(), except
> instead of restoring the high key when !isleaf, do so when the record
> is XLOG_BTREE_SPLIT_L_HIGHKEY|XLOG_BTREE_SPLIT_R_HIGHKEY.
>
> 4. Add an assertion within btree_xlog_split(), that ensures that
> internal pages never fail to have their high key logged, since there
> is no reason why that should ever not happen with internal pages.
>
> 5. Fix this struct xl_btree_split comment, which commit 0c504a80 from
> 2017 missed when it reclaimed two xl_info status bits:
>
> * Note: the four XLOG_BTREE_SPLIT xl_info codes all use this data record.
> * The _L and _R variants indicate whether the inserted tuple went into the
> * left or right split page (and thus, whether newitemoff and the new item
> * are stored or not). The _ROOT variants indicate that we are splitting
> * the root page, and thus that a newroot record rather than an insert or
> * split record should follow. Note that a split record never carries a
> * metapage update --- we'll do that in the parent-level update.
>
> 6. Add your own xl_btree_split comment in its place, noting the new
> usage. Basically, the _ROOT sentence with a similar _HIGHKEY sentence.
>
> 7. Don't forget about btree_desc().
>
> I'd say that there is a good change that Anastasia is correct to think
> that it isn't worth worrying about the extra WAL that her approach
> implied, and that it is in fact good enough to simply always log the
> left page's high key. However, it seems easier and lower risk all
> around to do it this way. It doesn't leave us with ambiguity. In my
> experience, *ambiguity* on design questions makes a patch miss a
> release much more frequently than bugs or regressions make that
> happen.
>
> Sorry that I didn't just say this the first time I brought up
> btree_xlog_split(). I didn't see the opportunity to avoid creating
> more WAL until now.
>

Done. I would note that this aspect also catched my eye, but
since I didn't read very carefully the whole thread I thought that
it was already decided that small extra WAL is harmless.

> If you're talking about these assertions
> >
> > Assert(IndexRelationGetNumberOfAttributes(rel) != 0);
> > indnkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
> > Assert(indnkeyatts != 0);
>
> Actually, I was just talking about the first one,
> "Assert(IndexRelationGetNumberOfAttributes(rel) != 0)". I was unclear.
> Maybe it isn't worth getting rid of even the first one.
>

OK, "Assert(IndexRelationGetNumberOfAttributes(rel) != 0)" has been
removed.

> * Use <quote><quote/> here:
>
> > + <para>
> > + Indexes with columns listed in the <literal>INCLUDE</literal>
> clause
> > + are also called "covering indexes".
> > + </para>
>
> * Use <literal><literal/> here:
>
> > + <para>
> > + In <literal>UNIQUE</literal> indexes, uniqueness is only
> enforced
> > + for key columns. Columns listed in the
> <literal>INCLUDE</literal>
> > + clause have no effect on uniqueness enforcement. Other
> constraints
> > + (PRIMARY KEY and EXCLUDE) work the same way.
> > + </para>
>

Fixed, thanks.

* Do the regression tests pass with COPY_PARSE_PLAN_TREES?
>

I've checked. "make check-world" do pass with COPY_PARSE_PLAN_TREES.

* Running pgindent would be nice. I see a bit of trailing whitespace,
> and things like that.
>

I've run pgindent and added relevant changes into patch.

> * Please tweak the indentation here (perhaps a new line):
>
> > @@ -927,6 +963,7 @@ _bt_buildadd(BTWriteState *wstate, BTPageState
> *state, IndexTuple itup)
> > last_off = P_FIRSTKEY;
> > }
> >
> > + pageop = (BTPageOpaque) PageGetSpecialPointer(npage);
> > /*
>

Done.

> * Does the optimizer README's PathKeys section need a sentence or two
> on this patch?
>

It definitely needs one. I've added the sentence describing that non-key
attributes don't have corresponding PathKeys.

I'm nervous about problems within the optimizer in general, since that
> is an area that I am not particularly qualified to review. I hope that
> someone with more experience in that area can take a look at it
> specifically. I see that there are very few changes in the optimizer,
> but in my experience that's often the problem when it comes to the
> optimizer -- it lacks subtle things that it actually needs, rather
> than having the wrong things.
>

I would just note, that Postgres Pro has shipped a version of covering
indexes to customers. The optimizer part there is pretty same as now
in the patchset. Assuming that we didn't met optimizer issues in this
part, I expect them to be at least very rare. I also would like to ask my
Postgres Pro colleague Alexander Kuzmenkov to take a look at optimizer
part of this patch. If even after that we will miss something in the
optimizer then I expect it could be fixed after feature freeze.

* Does this existing build_index_pathkeys() comment need to be updated?
>
> * The result is canonical, meaning that redundant pathkeys are removed;
> * it may therefore have fewer entries than there are index columns.
> *
> * Another reason for stopping early is that we may be able to tell that
> * an index column's sort order is uninteresting for this query. However,
> * that test is just based on the existence of an EquivalenceClass and not
> * on position in pathkey lists, so it's not complete. Caller should call
> * truncate_useless_pathkeys() to possibly remove more pathkeys.
>

Yes, I've updated it.

> * I don't think that there is much point in having separate 0003 +
> 0004 patches. For the next revision, please squash those down into
> 0002. Actually, maybe there should be only one patch for the next
> revision. Up to you.
>

Agreed. I've merge all the patches into one.

> * Please write commit messages for your patches. I like to make these
> part of the review process.
>

Commit message is included into the patch.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Covering-v14.patch application/octet-stream 206.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2018-04-06 15:51:56 Re: json(b)_to_tsvector with numeric values
Previous Message Alvaro Herrera 2018-04-06 15:46:26 Re: [HACKERS] path toward faster partition pruning