Re: index-only scans

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: index-only scans
Date: 2011-08-11 21:39:08
Message-ID: CAF6yO=2MseNGb-B29SZN4+wtuVh5VZLheqpj2HfLndODaB2tXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2011/8/11 Robert Haas <robertmhaas(at)gmail(dot)com>:
> Please find attached a patch implementing a basic version of
> index-only scans.  This patch is the work of my colleague Ibrar Ahmed
> and myself, and also incorporates some code from previous patches
> posted by Heikki Linnakanagas.

Great!.

>
> I'm able to demonstrate a significant performance benefit from this
> patch, even in a situation where it doesn't save any disk I/O, due to
> reduced thrashing of shared_buffers.  Configuration settings:
>
> max_connections = 100
> shared_buffers = 400MB
> maintenance_work_mem = 256MB
> synchronous_commit = off
> checkpoint_segments = 100
> checkpoint_timeout = 30min
> checkpoint_completion_target = 0.9
> checkpoint_warning = 90s
> seq_page_cost = 1.0
> random_page_cost = 1.0
> effective_cache_size = 3GB
>
> Test setup:
>
> pgbench -i -s 50
> create table sample_data as select (random()*5000000)::int as aid,
> repeat('a', 1000) as filler from generate_series(1,100000);
>
> Test queries:
>
> select sum(aid) from sample_data a1 where exists (select * from
> pgbench_accounts a where a.aid = a1.aid and a.aid != 1234567);
> select sum(aid) from sample_data a1 where exists (select * from
> pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567);
>
> On my laptop, the first query executes in about 555 ms, while the
> second one takes about 1125 ms.  Inspection via pg_buffercache reveals
> that the second one thrashes shared_buffers something fierce, while
> the first one does not.  You can actually get the time for the first
> query down to about 450 ms if you can persuade PostgreSQL to cache the
> entire sample_data table - which is difficult, due the
> BufferAccessStrategy stuff - and as soon as you run the second query,
> it blows the table out of cache, so practically speaking you're not
> going to get that faster time very often.  I expect that you could get
> an even larger benefit from this type of query if you could avoid
> actual disk I/O, rather than just buffer cache thrashing, but I
> haven't come up with a suitable test cases for that yet (volunteers?).
>
> There are several things about this patch that probably need further
> thought and work, or at least some discussion.
>
> 1. The way that nodeIndexscan.c builds up the faux heap tuple is
> perhaps susceptible to improvement.  I thought about building a
> virtual tuple, but then what do I do with an OID column, if I have
> one?  Or maybe this should be done some other way altogether.

Can this faux heap tuple be appended by the data from another index
once it has been created ? ( if the query involves those 2 index)

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc Munro 2011-08-11 22:45:59 Re: [HACKERS] Dropping extensions
Previous Message Tom Lane 2011-08-11 21:09:05 Re: error: could not find pg_class tuple for index 2662