Re: WIP: Covering + unique indexes.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Subject: Re: WIP: Covering + unique indexes.
Date: 2017-04-02 23:40:08
Message-ID: CAH2-WznLFzi49_mvAQOFjdpTCq3jZCXSTDMTVg58tUUqTps5Wg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 31, 2017 at 4:31 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> That's all I have for now. Maybe I can look again later, or tomorrow.

I took another look, this time at code used during CREATE INDEX. More feedback:

* I see no reason to expose _bt_pgaddtup() (to modify it to not be
static, so it can be called during CREATE INDEX for truncated high
key). You could call PageAddItem() directly, just as _bt_pgaddtup()
itself does, and lose nothing. This is the case because the special
steps within _bt_pgaddtup() are only when inserting the first real
item (and only on an internal page). You're only ever using
_bt_pgaddtup() for the high key offset. Would a raw PageAddItem() call
lose anything?

I think I see why you've done this -- the existing CREATE INDEX
_bt_sortaddtup() routine (which is very similar to _bt_pgaddtup(), a
routine used for *insertion*) doesn't do the correct thing were you to
use it, because it assumes that the page is always right most (i.e.,
always has no high key yet).

The reason _bt_sortaddtup() exists is explained here:

* This is almost like nbtinsert.c's _bt_pgaddtup(), but we can't use
* that because it assumes that P_RIGHTMOST() will return the correct
* answer for the page. Here, we don't know yet if the page will be
* rightmost. Offset P_FIRSTKEY is always the first data key.
*/
static void
_bt_sortaddtup(Page page,
Size itemsize,
IndexTuple itup,
OffsetNumber itup_off)
{
...
}

(...thinks some more...)

So, this difference only matters when you have a non-leaf item, which
is never subject to truncation in your patch. So, in fact, it doesn't
matter at all. I guess you should just use _bt_pgaddtup() after all,
rather than bothering with a raw PageAddItem(), even. But, don't
forget to note why this is okay above _bt_sortaddtup().

* Calling PageIndexTupleDelete() within _bt_buildadd(), which
memmove()s all other items on the leaf page, seems wasteful in the
context of CREATE INDEX. Can we do better?

* I also think that calling PageIndexTupleDelete() has a page space
accounting bug, because the following thing happens a second time for
highkey ItemId when new code does this call:

phdr->pd_lower -= sizeof(ItemIdData);

(The first time this happens is within _bt_buildadd() itself, just
before your patch calls PageIndexTupleDelete().)

* I don't think it's okay to let index_truncate_tuple() leak memory
within _bt_buildadd(). It's probably okay for nbtinsert.c callers to
index_truncate_tuple() to not be too careful, though, since those
calls occur in a per-tuple memory context. The same cannot be said for
_bt_buildadd()/CREATE INDEX calls.

* Speaking of memory management: is this really needed?

> @@ -554,7 +580,11 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
> * Save a copy of the minimum key for the new page. We have to copy
> * it off the old page, not the new one, in case we are not at leaf
> * level.
> + * Despite oitup is already initialized, it's important to get high
> + * key from the page, since we could have replaced it with truncated
> + * copy. See comment above.
> */
> + oitup = (IndexTuple) PageGetItem(opage,PageGetItemId(opage, P_HIKEY));
> state->btps_minkey = CopyIndexTuple(oitup);

You didn't modify/truncate oitup in-place -- you effectively made a
(truncated) copy by calling index_truncate_tuple(). Maybe you can
manage the memory by assigning keytup to state->btps_minkey, in place
of a CopyIndexTuple(), just for the truncation case?

I haven't studied this in enough detail to be sure that that would be
correct, but it seems clear that a better strategy is needed for
managing memory within _bt_buildadd().

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2017-04-02 23:44:40 Re: WIP: Covering + unique indexes.
Previous Message Michael Paquier 2017-04-02 23:38:47 Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?