Re: Index-only scan performance regression

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only scan performance regression
Date: 2012-01-31 17:35:58
Message-ID: CA+TgmoZYcFO0zyKHKco-MNDaALA=zmBptpxY1HRVZjU0c8ojCw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 29, 2012 at 2:47 AM, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> Given a freshly created table (not vacuumed), and a query that uses an
> index-only scan, for example:
>
> CREATE TABLE foo(a int PRIMARY KEY);
> INSERT INTO foo SELECT * FROM generate_series(1,1000000);
> ANALYSE foo;
>
> EXPLAIN ANALYSE SELECT count(*) FROM foo WHERE a <= 10000;
>
>                                                            QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=322.86..322.87 rows=1 width=0) (actual time=23.646..23.646 rows=1 loops=1)
>    ->  Index Only Scan using foo_pkey on foo  (cost=0.00..300.42 rows=8975 width=0) (actual time=0.027..22.291 rows=10000 loops=1)
>         Index Cond: (a <= 10000)
>         Heap Fetches: 10000
>  Total runtime: 23.673 ms
> (5 rows)
>
>
> the actual performance is much worse than the equivalent index scan,
> as used in 9.1 and earlier:
>
> SET enable_indexonlyscan = off;
> EXPLAIN ANALYSE SELECT count(*) FROM foo WHERE a <= 10000;
>
>                                                         QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=322.86..322.87 rows=1 width=0) (actual time=3.219..3.219 rows=1 loops=1)
>    ->  Index Scan using foo_pkey on foo  (cost=0.00..300.42 rows=8975 width=0) (actual time=0.014..2.302 rows=10000 loops=1)
>         Index Cond: (a <= 10000)
>  Total runtime: 3.240 ms
> (4 rows)
>
>
> Obviously this is the worst case for an index-only scan, since there
> is no visibility map, and it has to fetch each tuple from the heap,
> but ideally this should perform around the same as an ordinary index
> scan, since it's doing pretty much the same work.

Yeah.

> Digging around, it looks like the additional cost is coming from
> visibilitymap_test(), which is calling smgrexists() for each tuple to
> see if the visibility map file has been created. So it's doing a file
> access check for every row, for as long as the visibility map doesn't
> exist.
>
> I'm not really familiar with this code, but a possible fix seems to be
> to send an invalidation message in vm_extend() when it creates or
> extends the visibility map, so that vm_readbuf() only has to re-check
> the visibility map file if smgr_vm_nblocks has been invalidated. With
> this change (attached) the worst-case index-only scan time becomes
> around the same as the index scan time.
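
To spell out where the time goes: the effect you're describing is as
if vm_readbuf()'s size check looked roughly like this (paraphrased
sketch, not the actual source; the helper name is made up, and it
assumes the usual backend headers - "postgres.h", "utils/rel.h",
"storage/smgr.h"):

/*
 * While the visibility map fork doesn't exist, the cached size stays
 * at 0, so a probe for map block 0 falls into the recheck branch -
 * and calls smgrexists() - on every call, i.e. once per tuple in the
 * worst case above.
 */
static bool
vm_page_exists(Relation rel, BlockNumber blkno)
{
    RelationOpenSmgr(rel);      /* make sure rel->rd_smgr is valid */

    if (rel->rd_smgr->smgr_vm_nblocks == InvalidBlockNumber ||
        blkno >= rel->rd_smgr->smgr_vm_nblocks)
    {
        if (smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
            rel->rd_smgr->smgr_vm_nblocks =
                smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
        else
            rel->rd_smgr->smgr_vm_nblocks = 0;  /* rechecked next call */
    }

    return blkno < rel->rd_smgr->smgr_vm_nblocks;
}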

I think the additional sinval message might not do much good, because
other backends won't necessarily see it until the next time they do
AcceptInvalidationMessages(), which might be a while. That's not an
issue for truncation, because truncation requires AccessExclusiveLock,
so we know that any subsequent access to the table will involve taking
a heavyweight lock, which will AcceptInvalidationMessages(). Making
vismap extension require AccessExclusiveLock would be a cure worse
than the disease.

In the case of an index-only scan, it wouldn't be the end of the world
if we failed to notice that the visibility map had been extended. The
worst case would be that we wouldn't read the visibility map buffer
and, as you say, that shouldn't be any worse than a normal index scan.
The one case where we must not fail to notice that the visibility map
has been extended is when we're asked to clear a bit, because failing
to notice that the new page is there - with a set bit that must now be
cleared - could result in queries returning wrong answers. So one
possibility would be to tweak the relcache stuff to have a flag
indicating whether the value is cached, and keep that separate from
the cached value itself. Then, if the value is cached and we're doing
anything other than clearing a bit, we just say, eh, sorry, page isn't
there. If we're clearing a bit, then we go to the trouble of
revalidating.
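
In code, the shape I'm imagining is something like this (just a
sketch, not tested; the struct, field, and function names are made up,
and it assumes the usual backend headers - "postgres.h",
"storage/smgr.h"):

/*
 * Keep "do we have a cached size" separate from the cached size
 * itself.  Ordinary readers trust the cache even if it might be
 * stale; only bit-clearing pays for a fresh smgrexists() /
 * smgrnblocks() check.
 */
typedef struct
{
    bool        vm_nblocks_cached;  /* have we ever checked the fork? */
    BlockNumber vm_nblocks;         /* cached size of the VM fork */
} VMSizeCache;

static void
vm_recheck_size(SMgrRelation smgr, VMSizeCache *cache)
{
    if (smgrexists(smgr, VISIBILITYMAP_FORKNUM))
        cache->vm_nblocks = smgrnblocks(smgr, VISIBILITYMAP_FORKNUM);
    else
        cache->vm_nblocks = 0;
    cache->vm_nblocks_cached = true;
}

static bool
vm_page_present(SMgrRelation smgr, VMSizeCache *cache,
                BlockNumber blkno, bool clearing_bit)
{
    /* First access ever: we have to look at the fork once. */
    if (!cache->vm_nblocks_cached)
        vm_recheck_size(smgr, cache);

    /*
     * If the cached size says the page isn't there, readers just
     * accept that - worst case an unnecessary heap fetch, no worse
     * than a plain index scan.  Clearing a bit must not miss a
     * newly-extended page with a set bit, so it revalidates before
     * giving up.
     */
    if (blkno >= cache->vm_nblocks && clearing_bit)
        vm_recheck_size(smgr, cache);

    return blkno < cache->vm_nblocks;
}

That way smgrexists() gets consulted at most once per cache lifetime
for pure readers, while bit-clearing still can't miss a newly-extended
page.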

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
