Re: WIP: Covering + unique indexes.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: WIP: Covering + unique indexes.
Date: 2016-04-06 13:15:42
Message-ID: 57050BFE.7020805@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

06.04.2016 03:05, Peter Geoghegan:
> On Tue, Apr 5, 2016 at 1:31 PM, Peter Geoghegan<pg(at)heroku(dot)com> wrote:
>> My new understanding: The extra "included" columns are stored in the
>> index, but do not affect its sort order at all. They are no more part
>> of the key than, say, the heap TID that the key points to. They are
>> just "payload".

It was really long and complicated discussion. I'm glad that finally we
are in agreement about the patch.
Anyway, I think all mentioned questions will be very helpful for the
future work on b-tree.

> Noticed a few issues following another pass:
>
> * tuplesort.c should handle the CLUSTER case in the same way as the
> btree case. No?
Yes, I just missed that cluster uses index sort. Fixed.

> * Why have a RelationGetNumberOfAttributes(indexRel) call in
> tuplesort_begin_index_btree() at all now?
Fixed.
> * This critical section is unnecessary, because this happens during
> index builds:
>
> + if (indnkeyatts != indnatts && P_ISLEAF(opageop))
> + {
> + /*
> + * It's essential to truncate High key here.
> + * The purpose is not just to save more space
> on this particular page,
> + * but to keep whole b-tree structure
> consistent. Subsequent insertions
> + * assume that hikey is already truncated, and
> so they should not
> + * worry about it, when copying the high key
> into the parent page
> + * as a downlink.
> + * NOTE It is not crutial for reliability in present,
> + * but maybe it will be that in the future.
> + * NOTE this code will be changed by the
> "btree compression" patch,
> + * which is in progress now.
> + */
> + keytup = index_reform_tuple(wstate->index, oitup,
> +
> indnatts, indnkeyatts);
> +
> + /* delete "wrong" high key, insert keytup as
> P_HIKEY. */
> + START_CRIT_SECTION();
> + PageIndexTupleDelete(opage, P_HIKEY);
> +
> + if (!_bt_pgaddtup(opage,
> IndexTupleSize(keytup), keytup, P_HIKEY))
> + elog(ERROR, "failed to rewrite
> compressed item in index \"%s\"",
> + RelationGetRelationName(wstate->index));
> + END_CRIT_SECTION();
> + }
>
> Note that START_CRIT_SECTION() promotes any ERROR to PANIC, which
> isn't useful here, because we have no buffer lock held, and nothing
> must be WAL-logged.
>
> * Think you forgot to update spghandler(). (You did not add a test for
> just that one AM, either)
Fixed.
> * I wonder why this restriction needs to exist:
>
> + else
> + elog(ERROR, "Expressions are not supported in
> included columns.");
>
> What does not supporting it buy us? Was it just that the pg_index
> representation is more complicated, and you wanted to put it off?
>
> An error like this should use ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED ..., btw.
Yes, you get it right. It was a bit complicated to implement and I
decided to delay it to the next patch.
errmsg is fixed.

> * I would like to see index_reform_tuple() assert that the new,
> truncated index tuple is definitely <= the original (I worry about the
> 1/3 page restriction issue). Maybe you should also change the name of
> index_reform_tuple(), per David.
Is it possible that the new tuple, containing less attributes than the
old one, will have a greater size?
Maybe you can give an example?
I think that Assert(indnkeyatts <= indnatts); covers this kind of errors.
I do not mind to rename this function, but what name would be better?
index_truncate_tuple()?

> * There is some stray whitespace within RelationGetIndexAttrBitmap().
> I think you should have updated it with code, though. I don't think
> it's necessary for HOT updates to work, but I think it could be
> necessary so that we don't need to get a row lock that blocks
> non-conflict foreign key locking (see heap_update() callers). I think
> you need to be careful for non-key columns within the loop in
> RelationGetIndexAttrBitmap(), basically, because it seems to still go
> through all columns. UPSERT also must call this code, FWIW.
>
> * I think that a similar omission is also made for the replica
> identity stuff in RelationGetIndexAttrBitmap(). Some thought is needed
> on how this patch interacts with logical decoding, I guess.

Good point. Indexes are everywhere in the code.
I missed that RelationGetIndexAttrBitmap() is used not only for REINDEX.
I'll discuss it with Theodor and send an updated patch tomorrow.

> * Valgrind shows an error with an aggregate statement I tried:
>
> 2016-04-05 17:01:31.129 PDT 12310 LOG: statement: explain analyze
> select count(*) from ab where b > 5 group by a, b;
> ==12310== Invalid read of size 4
> ==12310== at 0x656615: match_clause_to_indexcol (indxpath.c:2226)
> ==12310== by 0x656615: match_clause_to_index (indxpath.c:2144)
> ==12310== by 0x656DBC: match_clauses_to_index (indxpath.c:2115)
> ==12310== by 0x658054: match_restriction_clauses_to_index (indxpath.c:2026)
> ==12310== by 0x658054: create_index_paths (indxpath.c:269)
> ==12310== by 0x64D1DB: set_plain_rel_pathlist (allpaths.c:649)
> ==12310== by 0x64D1DB: set_rel_pathlist (allpaths.c:427)
> ==12310== by 0x64D93B: set_base_rel_pathlists (allpaths.c:299)
> ==12310== by 0x64D93B: make_one_rel (allpaths.c:170)
> ==12310== by 0x66876C: query_planner (planmain.c:246)
> ==12310== by 0x669FBA: grouping_planner (planner.c:1666)
> ==12310== by 0x66D0C9: subquery_planner (planner.c:751)
> ==12310== by 0x66D3DA: standard_planner (planner.c:300)
> ==12310== by 0x66D714: planner (planner.c:170)
> ==12310== by 0x6FD692: pg_plan_query (postgres.c:798)
> ==12310== by 0x59082D: ExplainOneQuery (explain.c:350)
> ==12310== Address 0xbff290c is 2,508 bytes inside a block of size 8,192 alloc'd
> ==12310== at 0x4C2AB80: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==12310== by 0x81B7FA: AllocSetAlloc (aset.c:853)
> ==12310== by 0x81D257: palloc (mcxt.c:907)
> ==12310== by 0x4B6F65: RelationGetIndexScan (genam.c:94)
> ==12310== by 0x4C135D: btbeginscan (nbtree.c:431)
> ==12310== by 0x4B7A5C: index_beginscan_internal (indexam.c:279)
> ==12310== by 0x4B7C5A: index_beginscan (indexam.c:222)
> ==12310== by 0x4B73D1: systable_beginscan (genam.c:379)
> ==12310== by 0x7E8CF9: ScanPgRelation (relcache.c:341)
> ==12310== by 0x7EB3C4: RelationBuildDesc (relcache.c:951)
> ==12310== by 0x7ECD35: RelationIdGetRelation (relcache.c:1800)
> ==12310== by 0x4A4D37: relation_open (heapam.c:1118)
> ==12310==
> {
> <insert_a_suppression_name_here>
> Memcheck:Addr4
> fun:match_clause_to_indexcol
> fun:match_clause_to_index
> fun:match_clauses_to_index
> fun:match_restriction_clauses_to_index
> fun:create_index_paths
> fun:set_plain_rel_pathlist
> fun:set_rel_pathlist
> fun:set_base_rel_pathlists
> fun:make_one_rel
> fun:query_planner
> fun:grouping_planner
> fun:subquery_planner
> fun:standard_planner
> fun:planner
> fun:pg_plan_query
> fun:ExplainOneQuery
> }
>
> Separately, I tried "make installcheck-tests TESTS=index_including"
> from Postgres + Valgrind, with Valgrind's --track-origins option
> enabled (as it was above). I recommend installing Valgrind, and making
> sure that the patch shows no errors. I didn't actually find a Valgrind
> issue from just using your regression tests (nor did I find an issue
> from separately running the regression tests with
> CLOBBER_CACHE_ALWAYS, FWIW).
>
Thank you for advice.
Another miss of index->ncolumns to index->nkeycolumns replacement in
match_clause_to_index. Fixed.
I also updated couple of typos in documentation.

Thank you again for the detailed review.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
including_columns_9.0.patch text/x-patch 132.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-04-06 13:19:29 Re: [HACKERS] pgsql: Avoid archiving XLOG_RUNNING_XACTS on idle server
Previous Message Craig Ringer 2016-04-06 13:15:38 Re: WIP: Failover Slots