Re: WIP: Covering + unique indexes.

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: David Steele <david(at)pgmasters(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: WIP: Covering + unique indexes.
Date: 2016-04-06 00:05:05
Message-ID: CAM3SWZQLLYajNy-5K702RTWMeRkUEEBtYJbb+ifYRQOCHNx6=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 5, 2016 at 1:31 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> My new understanding: The extra "included" columns are stored in the
> index, but do not affect its sort order at all. They are no more part
> of the key than, say, the heap TID that the key points to. They are
> just "payload".

Noticed a few issues following another pass:

* tuplesort.c should handle the CLUSTER case in the same way as the
btree case. No?

* Why have a RelationGetNumberOfAttributes(indexRel) call in
tuplesort_begin_index_btree() at all now?

* This critical section is unnecessary, because this happens during
index builds:

+ if (indnkeyatts != indnatts && P_ISLEAF(opageop))
+ {
+ /*
+ * It's essential to truncate High key here.
+ * The purpose is not just to save more space
on this particular page,
+ * but to keep whole b-tree structure
consistent. Subsequent insertions
+ * assume that hikey is already truncated, and
so they should not
+ * worry about it, when copying the high key
into the parent page
+ * as a downlink.
+ * NOTE It is not crutial for reliability in present,
+ * but maybe it will be that in the future.
+ * NOTE this code will be changed by the
"btree compression" patch,
+ * which is in progress now.
+ */
+ keytup = index_reform_tuple(wstate->index, oitup,
+
indnatts, indnkeyatts);
+
+ /* delete "wrong" high key, insert keytup as
P_HIKEY. */
+ START_CRIT_SECTION();
+ PageIndexTupleDelete(opage, P_HIKEY);
+
+ if (!_bt_pgaddtup(opage,
IndexTupleSize(keytup), keytup, P_HIKEY))
+ elog(ERROR, "failed to rewrite
compressed item in index \"%s\"",
+ RelationGetRelationName(wstate->index));
+ END_CRIT_SECTION();
+ }

Note that START_CRIT_SECTION() promotes any ERROR to PANIC, which
isn't useful here, because we have no buffer lock held, and nothing
must be WAL-logged.

* Think you forgot to update spghandler(). (You did not add a test for
just that one AM, either)

* I wonder why this restriction needs to exist:

+ else
+ elog(ERROR, "Expressions are not supported in
included columns.");

What does not supporting it buy us? Was it just that the pg_index
representation is more complicated, and you wanted to put it off?

An error like this should use ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED ..., btw.

* I would like to see index_reform_tuple() assert that the new,
truncated index tuple is definitely <= the original (I worry about the
1/3 page restriction issue). Maybe you should also change the name of
index_reform_tuple(), per David.

* There is some stray whitespace within RelationGetIndexAttrBitmap().
I think you should have updated it with code, though. I don't think
it's necessary for HOT updates to work, but I think it could be
necessary so that we don't need to get a row lock that blocks
non-conflict foreign key locking (see heap_update() callers). I think
you need to be careful for non-key columns within the loop in
RelationGetIndexAttrBitmap(), basically, because it seems to still go
through all columns. UPSERT also must call this code, FWIW.

* I think that a similar omission is also made for the replica
identity stuff in RelationGetIndexAttrBitmap(). Some thought is needed
on how this patch interacts with logical decoding, I guess.

* Valgrind shows an error with an aggregate statement I tried:

2016-04-05 17:01:31.129 PDT 12310 LOG: statement: explain analyze
select count(*) from ab where b > 5 group by a, b;
==12310== Invalid read of size 4
==12310== at 0x656615: match_clause_to_indexcol (indxpath.c:2226)
==12310== by 0x656615: match_clause_to_index (indxpath.c:2144)
==12310== by 0x656DBC: match_clauses_to_index (indxpath.c:2115)
==12310== by 0x658054: match_restriction_clauses_to_index (indxpath.c:2026)
==12310== by 0x658054: create_index_paths (indxpath.c:269)
==12310== by 0x64D1DB: set_plain_rel_pathlist (allpaths.c:649)
==12310== by 0x64D1DB: set_rel_pathlist (allpaths.c:427)
==12310== by 0x64D93B: set_base_rel_pathlists (allpaths.c:299)
==12310== by 0x64D93B: make_one_rel (allpaths.c:170)
==12310== by 0x66876C: query_planner (planmain.c:246)
==12310== by 0x669FBA: grouping_planner (planner.c:1666)
==12310== by 0x66D0C9: subquery_planner (planner.c:751)
==12310== by 0x66D3DA: standard_planner (planner.c:300)
==12310== by 0x66D714: planner (planner.c:170)
==12310== by 0x6FD692: pg_plan_query (postgres.c:798)
==12310== by 0x59082D: ExplainOneQuery (explain.c:350)
==12310== Address 0xbff290c is 2,508 bytes inside a block of size 8,192 alloc'd
==12310== at 0x4C2AB80: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12310== by 0x81B7FA: AllocSetAlloc (aset.c:853)
==12310== by 0x81D257: palloc (mcxt.c:907)
==12310== by 0x4B6F65: RelationGetIndexScan (genam.c:94)
==12310== by 0x4C135D: btbeginscan (nbtree.c:431)
==12310== by 0x4B7A5C: index_beginscan_internal (indexam.c:279)
==12310== by 0x4B7C5A: index_beginscan (indexam.c:222)
==12310== by 0x4B73D1: systable_beginscan (genam.c:379)
==12310== by 0x7E8CF9: ScanPgRelation (relcache.c:341)
==12310== by 0x7EB3C4: RelationBuildDesc (relcache.c:951)
==12310== by 0x7ECD35: RelationIdGetRelation (relcache.c:1800)
==12310== by 0x4A4D37: relation_open (heapam.c:1118)
==12310==
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:match_clause_to_indexcol
fun:match_clause_to_index
fun:match_clauses_to_index
fun:match_restriction_clauses_to_index
fun:create_index_paths
fun:set_plain_rel_pathlist
fun:set_rel_pathlist
fun:set_base_rel_pathlists
fun:make_one_rel
fun:query_planner
fun:grouping_planner
fun:subquery_planner
fun:standard_planner
fun:planner
fun:pg_plan_query
fun:ExplainOneQuery
}

Separately, I tried "make installcheck-tests TESTS=index_including"
from Postgres + Valgrind, with Valgrind's --track-origins option
enabled (as it was above). I recommend installing Valgrind, and making
sure that the patch shows no errors. I didn't actually find a Valgrind
issue from just using your regression tests (nor did I find an issue
from separately running the regression tests with
CLOBBER_CACHE_ALWAYS, FWIW).

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-04-06 00:12:35 Re: Timeline following for logical slots
Previous Message David Fetter 2016-04-05 23:48:25 Re: Materialized views vs. primary keys