Re: WIP: Covering + unique indexes.

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Covering + unique indexes.
Date: 2016-01-13 01:27:37
Message-ID: CAKJS1f_GsQRQGYwUfBkL8H5ar4LW0sQ24cyZBdKAhFpG2jEsdw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 13 January 2016 at 05:59, Anastasia Lubennikova <
a(dot)lubennikova(at)postgrespro(dot)ru> wrote:

> 08.01.2016 00:12, David Rowley:
>
> On 7 January 2016 at 06:36, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> On Tue, Jan 5, 2016 at 11:55 PM, David Rowley
>> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> > create table ab (a int,b int);
>> > insert into ab select x,y from generate_series(1,20) x(x),
>> > generate_series(10,1,-1) y(y);
>> > create index on ab (a) including (b);
>> > explain select * from ab order by a,b;
>> > QUERY PLAN
>> > ----------------------------------------------------------
>> > Sort (cost=10.64..11.14 rows=200 width=8)
>> > Sort Key: a, b
>> > -> Seq Scan on ab (cost=0.00..3.00 rows=200 width=8)
>> > (3 rows)
>>
>> If you set enable_sort=off, then you get the index-only scan with no
>> sort. So it believes the index can be used for ordering (correctly, I
>> think), just sometimes it thinks it is not faster to do it that way.
>>
>> I'm not sure why this would be a correctness problem. The covered
>> column does not participate in uniqueness checks, but it still usually
>> participates in index ordering. (That is why dummy op-classes are
>> needed if you want to include non-sortable-type columns as being
>> covered.)
>>
>
> If that's the case, then it appears that I've misunderstood INCLUDING.
> From reading _bt_doinsert() it appeared that it'll ignore the INCLUDING
> columns and just find the insert position based on the key columns. Yet
> that's not the way that it appears to work. I was also a bit confused, as
> from working with another database which has very similar syntax to this,
> that one only includes the columns to allow index only scans, and the
> included columns are not indexed, therefore can't be part of index quals
> and the index only provides a sorted path for the indexed columns, and not
> the included columns.
>
>
> Thank you for properly testing. Order by clause in this case definitely
> doesn't work as expected.
> The problem is fixed by patching a planner function
> "build_index_pathkeys()'. It disables using of index if sorting of included
> columns is required.
> Test example works correctly now - it always performs seq scan and sort.
>
>
Thank you for updating the patch.
That's cleared up my confusion. All the code I read seemed to indicate that
INCLUDING columns were leaf only, it just confused me as to why the indexed
appeared to search and order on all columns, including the including
columns. Thanks for clearing up my confusion and fixing the patch.

> Saying that, I'm now a bit confused to why the following does not produce
> 2 indexes which are the same size:
>
> create table t1 (a int, b text);
> insert into t1 select x,md5(random()::text) from
> generate_series(1,1000000) x(x);
> create index t1_a_inc_b_idx on t1 (a) including (b);
> create index t1_a_b_idx on t1 (a,b);
> select pg_relation_Size('t1_a_b_idx'),pg_relation_size('t1_a_inc_b_idx');
> pg_relation_size | pg_relation_size
> ------------------+------------------
> 59064320 | 58744832
> (1 row)
>
>
> I suppose you've already found that in discussion above. Included columns
> are stored only in leaf index pages. The difference is the size of
> attributes 'b' which are situated in inner pages of index "t1_a_b_idx".
>

Yeah, I saw that from the code too. I just was confused as they appeared to
work like normal indexes.

I've made another pass of the covering_unique_4.0.patch. Again somethings
are nit picky (sorry), but it made sense to write them down as I noticed
them.

- multiple entries with identical keys. An access method that supports
this
+ multiple entries with identical keys. An access method that supports
this

Space removed by mistake.

feature sets <structname>pg_am</>.<structfield>amcanunique</> true.
- (At present, only b-tree supports it.)
+ Columns included with clause INCLUDING aren't used to enforce
uniqueness.
+ (At present, only b-tree supports them.)

Maybe

+ (At present <structfield>amcanunique</> is only supported by b-tree
+ indexes.)

As the extra line you've added confuses what "it" or "them" means, so maybe
best to clarify that.

+ <literal>INCLUDING</literal> aren't used to enforce constraints
(UNIQUE, PRIMARY KEY, etc).

Goes beyond 80 chars.

right_item = CopyIndexTuple(item);
+ right_item = index_reform_tuple(rel, right_item, rel->rd_index->indnatts,
rel->rd_index->indnkeyatts);

Duplicate assignment. Should this perhaps be:

+ if (rel->rd_index->indnatts == rel->rd_index->indnkeyatts)
+ right_item = CopyIndexTuple(item);
+ else
+ right_item = index_reform_tuple(rel, right_item, rel->rd_index->indnatts,
rel->rd_index->indnkeyatts);

?

- natts = RelationGetNumberOfAttributes(rel);
- indoption = rel->rd_indoption;

- skey = (ScanKey) palloc(natts * sizeof(ScanKeyData));
+ Assert(rel->rd_index->indnkeyatts != 0);
+ Assert(rel->rd_index->indnkeyatts <= rel->rd_index->indnatts);

- for (i = 0; i < natts; i++)
+ nkeyatts = rel->rd_index->indnkeyatts;

Since RelationGetNumberOfAttributes() was previously used, maybe you should
do:

+ nkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);

?

Yet I'm not really sure if there is some rule about when
RelationGetNumberOfAttributes(rel) is used and when rel->->rd_rel->relnatts
is used. It seems so mixed up.

accessMethodName = stmt->accessMethod;
+
tuple = SearchSysCache1(AMNAME, PointerGetDatum(accessMethodName));

Unrelated change.

+#define Anum_pg_am_amcaninclude 15

Needs 1 more tab so that "15" lines up with the other numbers.

typedef struct IndexInfo
{
NodeTag type;
- int ii_NumIndexAttrs;
+ int ii_NumIndexAttrs; /* total number of columns in index */
+ int ii_NumIndexKeyAttrs; /* number of key columns in index */

The comment above this struct still needs a comment for "NumIndexKeyAttrs".
I'm not sure exactly why there's comments in both places with that struct,
but it makes sense to follow what's been done already.

+ * Returns the number of key attributes in a relation.

I think "relation" should be "index".

Here's a few things that I'm not too sure on, which maybe Jeff or others
could give their opinion on:

ERROR: duplicate key value violates unique constraint
"covering_index_index"
DETAIL: Key (f1, f2, f3 (included))=(1, 2, BBB) already exists.

Should we only display the key columns here? f3 feels like it does not
belong in any reports about unique violations.

+ if(list_intersection(stmt->indexParams, stmt->indexIncludingParams) !=
NIL)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("included columns must not intersect with key columns")));
+

I wonder if a bit more effort should be spent here to generate a better
message. We do a bit more in cases like:

# create table a (a int, b int, c int, a int, b int);
ERROR: column "a" specified more than once

Perhaps it would be a good idea to also report the first matching intersect
item found. Any thoughts?

# create index on ab using hash (a) including (b);
WARNING: hash indexes are not WAL-logged and their use is discouraged
ERROR: access method "hash" does not support multicolumn indexes

I wonder if it's better to report: errmsg("access method \"%s\" does not
support included columns") before the multicolumn check? It probably does
not mater that much, but if a user thought (a) including (b) was a single
column index on "a", then it's a bit confusing.

I've also done some testing:

create table ab (a int, b int);
insert into ab select a,b from generate_Series(1,10) a(a),
generate_series(1,10000) b(b);
set enable_bitmapscan=off;
set enable_indexscan=off;

select * from ab where a = 1 and b=1;
a | b
---+---
1 | 1
(1 row)

set enable_indexscan = on;
select * from ab where a = 1 and b=1;
a | b
---+---
(0 rows)

This is broken. I've not looked into why yet, but from looking at the
EXPLAIN output I was a bit surprised to see b=1 as an index condition. I'd
have expected a Filter maybe, but I've not looked at the EXPLAIN code to
see how those are determined yet.

I've not looked at the other patch yet.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2016-01-13 01:47:40 Re: WIP: Covering + unique indexes.
Previous Message Dan Langille 2016-01-13 01:23:27 PGCon 2016 CFP - one week left