Re: WIP: Covering + unique indexes.

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: David Steele <david(at)pgmasters(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: WIP: Covering + unique indexes.
Date: 2016-04-05 20:31:50
Message-ID: CAM3SWZRheiMRFWmZ39X=G=w0WgZq5qqiyFAzCMg4jBpMPMsTJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 5, 2016 at 7:56 AM, Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> I would say, this is changed, but it doesn't matter.

Actually, I would now say that it hasn't really changed (see below),
based on my new understanding. The *choice* to go on one page or the
other still exists.

> Performing any search in btree (including choosing suitable page for
> insertion), we use only key attributes.
> We assume that included columns are stored in index unordered.

The patch assumes no ordering for the non-indexed columns in the
index? While I knew that the patch was primarily motivated by enabling
index-only scans, I didn't realize that at all. The patch is much much
less like a general suffix truncation patch than I thought. I may have
been confused in part by the high key issue that you only recently
fixed, but you should have corrected me about suffix truncation
earlier. Obviously, this was a significant misunderstanding; we have
been "talking at cross purposes" this whole time.

There seems to have been significant misunderstanding about this before now:

http://www.postgresql.org/message-id/CAKJS1f9W0aB-g7H6yYgNBq7hJsOKF3UwHU7-Q5jobbaTyK9f4g@mail.gmail.com

My new understanding: The extra "included" columns are stored in the
index, but do not affect its sort order at all. They are no more part
of the key than, say, the heap TID that the key points to. They are
just "payload".

> "just matching on constrained attributes" is the core idea of the whole
> patch. Included columns just provide us possibility to use index-only scan.
> Nothing more. We assume use case where index-only-scan is faster than
> index-scan + heap fetch. For example, in queries like "select data from tbl
> where id = 1;" we have no scan condition on data. Maybe you afraid of long
> linear scan when we have enormous index bloat even on unique index. It will
> happen anyway, whether we have index-only scan on covering index or
> index-scan on unique index + heap fetch. The only difference is that the
> covering index is faster.

My concern about performance when that happens is very much secondary.
I really only mentioned it to help explain my primary concern.

> At the very beginning of the proposal discussion, I suggested to implement
> third kind of columns, which are not constrained, but used in scankey.
> They must have op class to do it, and they are not truncated. But it was
> decided to abandon this feature.

I must have missed that. Obviously, I wasn't paying enough attention
to earlier discussion. Earlier versions of the patch did fail to
recognize that the sort order was not the entire indexed order, but
that isn't the case with V8. That that was ever possible was only a
bug, it turns out.

>> The more I think about it, the more I doubt that it's okay to not
>> ensure downlinks are always distinct with their siblings, by sometimes
>> including non-constrained (truncatable) attributes within internal
>> pages, as needed to *distinguish* downlinks (also, we must
>> occasionally have *all* attributes including truncatable attributes in
>> internal pages -- we must truncate nothing to keep the key space sane
>> in the parent).

> Frankly, I still do not understand what you're worried about.
> If high key is greater than the scan key, we definitely cannot find any more
> tuples, because key attributes are ordered.
> If high key is equal to the scan key, we will continue searching and read
> next page.

I thought, because of the emphasis on unique indexes, that this patch
was mostly to offer a way of getting an index with uniqueness only
enforced on certain columns, but otherwise just the same as having a
non-unique index on those same columns. Plus, some suffix truncation,
because point-lookups involving later attributes are unlikely to be
useful when this is scoped to just unique indexes (which were
emphasized by you), because truncating key columns is not helpful
unless bloat is terrible.

I now understand that it was quite wrong to link this to suffix
truncation at all. The two are really not the same. That does make the
patch seem significantly simpler, at least as far as nbtree goes; a
tool like amcheck is not likely to detect problems in this patch that
a human tester could not catch. That was the kind of problem that I
feared.

> The code is not changed here, it is the same as processing of duplicates
> spreading over several pages. If you do not trust postgresql btree changes
> to the L&Y to make duplicates work, I don't know what to say, but it's
> definitely not related to my patch.

My point about the postgres btree changes to L&Y to make duplicates
work is that I think it makes the patch work, but perhaps not
absolutely reliably. I don't have any specific misgivings about it on
its own. Again, my earlier remarks were based on a misguided
understanding of the patch, so it doesn't matter now.

Communication is hard. There may be a lesson here for both of us about that.

> Of course I do not mind if someone will do more testing.
> I did some tests and didn't find anything special. Besides, don't we have
> special alpha and beta release stages to find tricky bugs?

Our history of committing performance improvements to the B-Tree code
is limited, particularly in the last 5 years. That's definitely a
problem, and one that I have tried to make smaller, but it is the
reality.

BTW, I can see why you used index_reform_tuple(), rather than trying
to modify an existing tuple in place. NULL bitmaps have a storage
overhead in IndexTuples (presumably an alternative approach would make
truncated IndexTuples have NULL attributes to represent truncation),
whereas the cost of index_reform_tuple() only has to be paid when
there is a leaf page split. It's important that truncation is 100%
guaranteed to produce a tuple smaller than the inserted tuple,
otherwise the user could get a non-recoverable "1/3 of page size
exceeded" when they were not the one to insert the big IndexTuple. I
should try to see if this could be possible due to some
index_reform_tuple() edge-case.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-04-05 20:32:00 Re: Yet another small patch - reorderbuffer.c:1099
Previous Message Robbie Harwood 2016-04-05 20:12:51 Re: [PATCH v12] GSSAPI encryption support