Re: PATCH: index-only scans with partial indexes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: index-only scans with partial indexes
Date: 2015-09-05 17:17:41
Message-ID: 55EB23B5.8030805@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 09/05/2015 10:53 AM, Simon Riggs wrote:
> On 4 September 2015 at 22:03, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com
> <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>> wrote:
>
> A summary of 100 EXPLAIN timings looks like this:
>
>
> master A B C D E F
> -------------------------------------------------------------------------
> min 0.10 0.10 0.30 0.29 0.66 0.23
> max 1.07 1.00 2.13 1.98 4.52 1.59
> median 0.49 0.52 0.31 0.33 0.68 1.12
> average 0.43 0.35 0.62 0.49 1.01 0.89
>
>
> What are these? Times? in ms?

Yes, those are planning times in milliseconds. I've been thinking about
possible issues in the benchmark, and I ended up with two main suspects:

(a) environment - VM running on a laptop. thus quite noisy and
subject to various sources of overhead, power-management, etc.

(b) time measured using \timing in psql (by running EXPLAIN), so
probably influenced by formatting/transfer

So I reran the benchmark on a different machine (bare metal, pretty much
no noise in the results), and measured the planning time using EXPLAIN
ANALYZE (Planning Time). And I got this (milliseconds):

A B C D E F
-----------------------------------------------------------------
min 0.04 0.04 0.11 0.10 0.37 0.12
max 0.10 0.10 0.92 0.92 1.62 1.23
median 0.04 0.04 0.11 0.11 0.37 0.13
average 0.04 0.04 0.11 0.11 0.38 0.14

A B C D E F
-----------------------------------------------------------------
min 0.04 0.04 0.11 0.11 0.38 0.13
max 0.10 0.10 0.92 0.94 1.64 1.21
median 0.04 0.04 0.11 0.11 0.39 0.13
average 0.04 0.04 0.11 0.12 0.40 0.14

So much lower numbers (better CPU, no virtualization, etc.), but
otherwise exactly the same conclusion - no overhead compared to master.

I think of three ways how to make the checks more expensive:

(a) using more indexes

The current benchmark already uses 40 indexes (and I've tried
with 100), and we've seen no impact at all. Adding more indexes
will eventually show some overhead, but the number of indexes
will be very high - I doubt anyone has a table with hundreds of
partial indexes on a it.

(b) using more complex index predicates

I expect the predicate_implied_by() call to get more expensive
for more complex predicates. I however believe that's quite
uncommon case - vast majority of index predicates that I've seen
use just a single equality clause.

(c) using more complex queries (more WHERE conditions)

Having more complex WHERE clauses seems quite plausible, though,
so I've decided to try it. Instead of the simple query used
before:

select a from t where b >= 100 and b <= 200;

I've used a query with a bunch of other conditions:

select a from t where b >= 100 and b <= 200
and c >= 100 and c <= 200
and d >= 100 and d <= 200
and a >= 100 and a <= 100;

And indeed, this made a (tiny) difference - on the master, the
planning was 0.50 ms on average, while with the patch it was
0.55. But 0.05 ms is just barely above noise, even on this HW.

Of course, this only impacts the case with partial indexes, all
the other cases were exactly the same with and without patch.
>
> However that was the idea of the benchmark, and I got no difference.
>
>
> Please explain what this means and your conclusion, so its clear. That
> way we can either reject the patch or commit it. Thanks

That means I've been unable to measure any significant overhead of the
patch. There certainly are extreme cases where this patch might make the
planning noticeably slower, but I believe those are rather artificial,
and certainly wouldn't expect them in databases where a tiny increase of
planning time would be a problem.

This benchmark however only looked at the planning overhead, but we
should weight that with respect to possible gains. And IOS is a great
optimization - it's not uncommon to see 2-3x improvements on databases
that fit into RAM, and order of magnitude improvements on large
databases (thanks to eliminating the random I/O when accessing the heap).

So my opinion is that we should commit this patch.

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2015-09-05 18:01:52 Re: Counting lines correctly in psql help displays
Previous Message Tom Lane 2015-09-05 16:55:44 Counting lines correctly in psql help displays