Re: WIP: Covering + unique indexes.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Covering + unique indexes.
Date: 2016-02-11 16:46:24
Lists: pgsql-hackers

02.02.2016 15:50, Anastasia Lubennikova:
> 31.01.2016 11:04, David Rowley:
>> On 27 January 2016 at 03:35, Anastasia Lubennikova
>> <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>>> including_columns_3.0 is the latest version of the patch,
>>> and the changes relative to the previous version are attached in a
>>> separate patch, just to ease review and debugging.
>> Hi,
>> I've made another pass over the patch. There's still a couple of
>> things that I think need to be looked at.
> Thank you again.
> I'm just writing to say that I haven't disappeared and I do remember
> about this issue.
> But I'm very busy this week. I'll send an updated patch next week, as
> soon as possible.

As promised, here is the new version of the patch, "including_columns_4.0".
I fixed all issues except some points mentioned below.
Besides, I did some refactoring:
- use the macros IndexRelationGetNumberOfAttributes and
IndexRelationGetNumberOfKeyAttributes where possible, as well as the macro
RelationGetNumberOfAttributes. These changes may be a bit unrelated, but
they'll make development much easier in the future.
- rename the related variables to indnatts and indnkeyatts.
>> I'm also not that keen on index_reform_tuple() in general. I wonder if
>> there's a way we can just keep the Datum/isnull arrays a bit longer,
>> and only form the tuple when needed. I've not looked into this in
>> detail, but it does look like reforming the tuple is not going to be
>> cheap.
> It is used in splits, for example. There is no Datum array there; we just
> move a tuple's key from a child page to a parent page, or something like
> that. And according to the INCLUDING algorithm, we need to truncate the
> non-key attributes.
>> If we do need to keep this function, I think a better name might be
>> index_trim_tuple(), and I don't think you need to pass the original
>> length. It might make sense to Assert() that the trim length is
>> smaller than the tuple size.
> As regards performance, I don't think it's a big problem here.
> Do you suggest doing it the following way: memcpy(oldtup, newtup,
> newtuplength)?

I've tested it some more, and still didn't find any performance issues.

>> in gistrescan() there is some code:
>> for (attno = 1; attno <= natts; attno++)
>> {
>>     TupleDescInitEntry(so->giststate->fetchTupdesc, attno, NULL,
>>                        scan->indexRelation->rd_opcintype[attno - 1],
>>                        -1, 0);
>> }
>> Going by RelationInitIndexAccessInfo() rd_opcintype[] is allocated to
>> be sized by the number of key columns, but this loop goes over the
>> number of attribute columns.
>> Perhaps this is not a big problem since GIST does not support
>> INCLUDING columns, but it does seem wrong still.
> GiST doesn't support the INCLUDING clause, so natts and nkeyatts are
> always equal. I don't see any problem here.
> And I think that fixing it is extra work beyond this patch. Maybe I or
> someone else will add this feature to other access methods later.

Still the same.
>> Which brings me to the fact that I've spent a bit of time trying to
>> look for places where you've forgotten to change natts to nkeyatts. I
>> did find this one, but I don't have much confidence that there's not
>> lots more places that have been forgotten. Apart from this one, how
>> confident are you that you've found all the places? I'm getting
>> towards being happy with the code that I see that's been changed, but
>> I'm hesitant to mark as "Ready for committer" due to not being all
>> that comfortable that all the code that needs to be updated has been
>> updated. I'm not quite sure of a good way to find all these places.
> I found all mentions of natts and other related variables with grep,
> and replaced (or expanded) them with nkeyatts where necessary.
> As mentioned before, I didn't change the other AMs.
> I strongly agree that any changes related to btree require thorough
> inspection, so I'll recheck it again. But I'm almost sure it's okay.
I rechecked everything again and fixed a couple of omissions. Thank you
for being such an exacting reviewer.
I don't know how else to ensure that everything is okay; I have no more
ideas of what else I can do.

>> I'm wondering whether hacking the code so that each btree index
>> created with more than one column puts all but the first column into
>> the INCLUDING columns, then running the regression tests to see if
>> there are any crashes, would help. I'm really not sure how else to
>> increase the confidence levels on this. Do you have ideas?
> Do I understand correctly that you suggest replacing all multicolumn
> indexes with (1 key column) + included columns?
> I don't think it's a good idea. The INCLUDING clause brings some
> disadvantages. For example, included columns must be filtered after
> the search, while key columns can be used in the scan key directly. I
> already mentioned this in a test example:
> explain analyze select c1, c2 from tbl where c1<10000 and c3<20;
> If the columns' opclasses are used, the new query plan uses them in the
> Index Cond: ((c1 < 10000) AND (c3 < 20))
> Otherwise, the new query cannot use the included column in the Index
> Cond and uses a filter instead:
> Index Cond: (c1 < 10000)
> Filter: (c3 < 20)
> Rows Removed by Filter: 9993
> This slows down the query significantly.
> And besides that, we still want to have multicolumn unique indexes:
> CREATE UNIQUE INDEX ON tbl (a, b, c) INCLUDING (d);

I started a new thread about related refactoring, because I think that
it should be a separate patch.

Anastasia Lubennikova
Postgres Professional:
The Russian Postgres Company

Attachment Content-Type Size
including_columns_4.0.patch text/x-patch 79.3 KB
