Re: Index Skip Scan

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: pg(at)bowt(dot)ie
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, bhushan(dot)uparkar(at)gmail(dot)com, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2018-09-10 21:47:06
Message-ID: CA+q6zcV-wgnSMu-YOEmyXVckncQyriiKtUZXm=Gx2s+FV=5MDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Mon, 18 Jun 2018 at 17:26, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> wrote:
>
> I took Thomas Munro's previous patch [2] on the subject, added a GUC, a
> test case, documentation hooks, minor code cleanups, and made the patch
> pass an --enable-cassert make check-world run. So, the overall design is
> the same.

I've looked through the patch more closely, and have a few questions:

* Is there any reason why only copy function for the IndexOnlyScan node
includes an implementation for distinctPrefix? Without read/out functionality
skip doesn't work for parallel scans, so it becomes like that:

=# SET force_parallel_mode TO ON;
=# EXPLAIN (ANALYZE, VERBOSE, BUFFERS ON) SELECT DISTINCT b FROM t1;
QUERY PLAN

-------------------------------------------------------------------------------
Gather (cost=1000.43..1001.60 rows=3 width=4)
(actual time=11.054..17672.010 rows=10000000 loops=1)
Output: b
Workers Planned: 1
Workers Launched: 1
Single Copy: true
Buffers: shared hit=91035 read=167369
-> Index Skip Scan using idx_t1_b on public.t1
(cost=0.43..1.30 rows=3 width=4)
(actual time=1.350..16065.165 rows=10000000 loops=1)
Output: b
Heap Fetches: 10000000
Buffers: shared hit=91035 read=167369
Worker 0: actual time=1.350..16065.165 rows=10000000 loops=1
Buffers: shared hit=91035 read=167369
Planning Time: 0.394 ms
Execution Time: 6037.800 ms

and with this functionality it gets better:

=# SET force_parallel_mode TO ON;
=# EXPLAIN (ANALYZE, VERBOSE, BUFFERS ON) SELECT DISTINCT b FROM t1;
QUERY PLAN

-------------------------------------------------------------------------------
Gather (cost=1000.43..1001.60 rows=3 width=4)
(actual time=3.564..4.427 rows=3 loops=1)
Output: b
Workers Planned: 1
Workers Launched: 1
Single Copy: true
Buffers: shared hit=4 read=10
-> Index Skip Scan using idx_t1_b on public.t1
(cost=0.43..1.30 rows=3 width=4)
(actual time=0.065..0.133 rows=3 loops=1)
Output: b
Heap Fetches: 3
Buffers: shared hit=4 read=10
Worker 0: actual time=0.065..0.133 rows=3 loops=1
Buffers: shared hit=4 read=10
Planning Time: 1.724 ms
Execution Time: 4.522 ms

* What is the purpose of HeapFetches? I don't see any usage of this variable
except assigning 0 to it once.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-09-10 22:12:36 Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Previous Message Alvaro Herrera 2018-09-10 18:37:39 Re: can commitfest application resend a mail?