Re: PATCH: index-only scans with partial indexes

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: index-only scans with partial indexes
Date: 2015-09-13 18:03:06
Message-ID: 1147640721.1321636.1442167386054.JavaMail.yahoo@mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> That means I've been unable to measure any significant overhead
> of the patch.

I've run a lot of benchmarks, and with anything resembling a common
query the differences in planning time are lost in the noise. (I
didn't create a better example than Tomas of where a lot of indexes
cause a minimal increase in planning time.)
The test environment is a "bare iron" machine with:

1 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (4 cores; 8 HW threads)
16GB DDR3 RAM
2 1TB drives in RAID 1
ubuntu 14.04 LTS 64-bit
various checkouts from master, most recently a7212a99
no cassert, default optimizations

As one example, to get a heap bigger than RAM I set up like this:

drop table if exists t;
create table t (a int not null, b int not null, x text not null);
insert into t
select i, i, repeat(md5(i::text), 50)
from generate_series(1,10000000) s(i);
vacuum freeze t;
checkpoint;

I ran one-index tests like this:

create index t_b_partial on t(b) where a > 1000 and a < 2000;
vacuum analyze t;
checkpoint;
explain (analyze, buffers, verbose)
select count(b) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(a) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(*) from t where a > 1000 and a < 2000;

... then two-index tests like this:

create index t_b_a_partial on t(b, a) where a > 1000 and a < 2000;
vacuum analyze t;
checkpoint;
explain (analyze, buffers, verbose)
select count(b) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(a) from t where a > 1000 and a < 2000;
explain (analyze, buffers, verbose)
select count(*) from t where a > 1000 and a < 2000;

All queries were run 5 times and (to minimize stray slowdowns from
other sources on this desktop machine) I took the minimum plan time
and minimum execution time. (My browser and other optional
processes were stopped to also minimize noise, but the results
still had more noise than I would prefer.)

master - single index
---------------------
Planning time: 0.078 ms
Execution time: 0.544 ms

Planning time: 0.079 ms
Execution time: 0.533 ms

Planning time: 0.066 ms
Execution time: 0.491 ms

master - both indexes
---------------------
Planning time: 0.080 ms
Execution time: 0.396 ms

Planning time: 0.076 ms
Execution time: 0.373 ms

Planning time: 0.056 ms
Execution time: 0.275 ms

patched - single index
----------------------
Planning time: 0.032 ms
Execution time: 0.136 ms

Planning time: 0.079 ms
Execution time: 0.537 ms

Planning time: 0.050 ms
Execution time: 0.213 ms

patched - both indexes
----------------------
Planning time: 0.100 ms
Execution time: 0.373 ms

Planning time: 0.067 ms
Execution time: 0.251 ms

Planning time: 0.065 ms
Execution time: 0.240 ms

In my view, the most disappointing thing about the patch is that
when both indexes are present, it doesn't use the narrower one. If
*only* the narrower index is present, it runs the index-only scan
using that index for count(b) and count(*), which is faster. Can
we wrangle this patch into making a better choice among available
index-only scans?

It also seems disappointing that we don't recognize that
count(columnname) could be treated as a synonym for count(*) if
columnname is NOT NULL, but that doesn't seem like material for
this patch.

Benchmarking took so much time I did not get to a close review of
the code changes. :-( Based on just the test results, it appears
to me that the patch as it stands would be a net win for the vast
majority of workloads where it would make any noticeable difference,
and I didn't manage to create any big downside. I would like to
see whether we can't improve the choice of partial index when there
are multiple possibilities -- it seems quite surprising to see
identical estimates for indexes of different column counts and key
widths, and to see the wider index chosen when the narrow one is
clearly (and predictably) faster.

I am changing this to Waiting on Author.

I will be on vacation without Internet access for the next 15 days,
so hopefully someone else can have a look when a new version is
posted. If it's still open I'll have a look when I get back.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Seltenreich 2015-09-13 18:32:30 Re: 9.3.9 and pg_multixact corruption
Previous Message Ildus Kurbangaliev 2015-09-13 16:43:01 Re: [PATCH] Refactoring of LWLock tranches