Re: Index Skip Scan

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2019-11-02 18:56:08
Message-ID: CAH2-WzkG0KRHjN_atViTcAC0-Yze5Mv0rfj+mPYhL8oLm0cnfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 25, 2019 at 2:33 AM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> v27-0001-Index-skip-scan.patch

Some random thoughts on this:

* Is _bt_scankey_within_page() really doing the right thing within empty pages?

It looks like you're accidentally using the high key when the leaf
page is empty with forward scans (assuming that the leaf page isn't
rightmost). You'll need to think about empty pages for both forward
and backward direction scans there.

Actually, using the high key in some cases may be desirable, once the
details are worked out -- the high key is actually very helpful with
low cardinality indexes. If you populate an index with retail
insertions (i.e. don't just do a CREATE INDEX after the table is
populated), and use low cardinality data in the indexed columns then
you'll see this effect. You can have a few hundred index entries for
each distinct value, and the page split logic added to Postgres 12 (by
commit fab25024) will still manage to "trap" each set of duplicates on
their own distinct leaf page. Leaf pages will have a high key that
looks like the values that appear on the page to the right. The goal
is for forward index scans to access the minimum number of leaf pages,
especially with low cardinality data and with multi-column indexes.
(See also: commit 29b64d1d)

A good way to see this for yourself is to get the Wisconsin Benchmark
tables (the tenk1 table and related tables from the regression tests)
populated using retail insertions. "CREATE TABLE __tenk1(like tenk1
including indexes); INSERT INTO __tenk1 SELECT * FROM tenk1;" is how I
like to set this up. Then you can see that we only access one leaf
page easily by forcing bitmap scans (i.e. "set enable* ..."), and
using "EXPLAIN (analyze, buffers) SELECT ... FROM __tenk1 WHERE ...",
where the SELECT query is a simple point lookup query (bitmap scans
happen to instrument the index buffer accesses in a way that makes it
absolutely clear how many index page buffers were touched). IIRC the
existing tenk1 indexes have no more than a few hundred duplicates for
each distinct value in all cases, so only one leaf page needs to be
accessed by simple "key = val" queries in all cases.

(I imagine that the "four" index you add in the regression test would
generally need to visit more than one leaf page for simple point
lookup queries, but in any case the high key is a useful way of
detecting a "break" in the values when indexing low cardinality data
-- these breaks are generally "aligned" to leaf page boundaries.)

I also like to visualize the keyspace of indexes when poking around at
that stuff, generally by using some of the queries that you can find
on the Wiki [1].

* The extra scankeys that you are using in most of the new nbtsearch.c
code is an insertion scankey -- not a search style scankey. I think
that you should try to be a bit clearer on that distinction in
comments. It is already confusing now, but at least there is only zero
or one insertion scankeys per scan (for the initial positioning).

* There are two _bt_skip() prototypes in nbtree.h (actually, there is
a btskip() and a _bt_skip()). I understand that the former is a public
wrapper of the latter, but I find the naming a little bit confusing.
Maybe rename _bt_skip() to something that is a little bit more
suggestive of its purpose.

* Suggest running pgindent on the patch.

[1] https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-11-02 19:43:40 Re: v12.0: ERROR: could not find pathkey item to sort
Previous Message Tomas Vondra 2019-11-02 18:15:34 Re: 64 bit transaction id