Re: Seq scans roadmap

From: "CK Tan" <cktan(at)greenplum(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Simon Riggs" <simon(at)enterprisedb(dot)com>
Subject: Re: Seq scans roadmap
Date: 2007-05-10 03:52:24
Message-ID: 30E8D12C-C5C1-48DA-BF06-08353C398C35@greenplum.com
Lists: pgsql-hackers

Hi,

In reference to the seq scans roadmap, I have just submitted a patch
that addresses some of the concerns.

The patch does the following (a rough sketch of the ring logic appears
after the list of changed files below):

1. For small relations (smaller than 60% of the buffer pool), use the
current logic.
2. For large relations:
- use a ring buffer in the heap scan
- pin the first 12 pages when the scan starts
- after every 4 pages are consumed, read and pin the next 4 pages
- invalidate the pages already consumed by the scan so they do not
force out other useful pages

4 files changed:
bufmgr.c, bufmgr.h, heapam.c, relscan.h
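
For illustration only, here is a small standalone C sketch of the ring
logic above. This is not the patch code itself: the struct and function
names are invented, and the printf "pin"/"release" calls stand in for the
real work done against the shared buffer pool in bufmgr.c and heapam.c.

/*
 * Illustrative sketch only -- not the actual patch code.  Pin the first
 * 12 pages up front, and each time 4 pages have been consumed, release
 * them and pin the next 4, so the scan recycles a small fixed window of
 * buffers instead of flooding the buffer pool.
 */
#include <stdio.h>

#define RING_INIT 12            /* pages pinned when the scan starts */
#define RING_STEP 4             /* pages consumed/refilled per step  */

typedef struct RingScan
{
    int         nblocks;        /* total pages in the relation         */
    int         next_read;      /* next page number to "pin"           */
    int         next_use;       /* next page number handed to the scan */
} RingScan;

static void
ring_start(RingScan *scan, int nblocks)
{
    scan->nblocks = nblocks;
    scan->next_use = 0;
    scan->next_read = 0;
    while (scan->next_read < nblocks && scan->next_read < RING_INIT)
        printf("pin page %d\n", scan->next_read++);
}

static int
ring_next_page(RingScan *scan)
{
    int         page;

    if (scan->next_use >= scan->nblocks)
        return -1;              /* scan finished */

    page = scan->next_use++;

    /* after every RING_STEP consumed pages: drop them, pin the next group */
    if (scan->next_use % RING_STEP == 0)
    {
        int         i;

        printf("release pages %d..%d\n",
               scan->next_use - RING_STEP, scan->next_use - 1);
        for (i = 0; i < RING_STEP && scan->next_read < scan->nblocks; i++)
            printf("pin page %d\n", scan->next_read++);
    }
    return page;
}

int
main(void)
{
    RingScan    scan;
    int         page;

    ring_start(&scan, 20);
    while ((page = ring_next_page(&scan)) >= 0)
        printf("scan page %d\n", page);
    return 0;
}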

If there is interest, I can submit another scan patch that returns
N tuples at a time instead of the current one-at-a-time interface
(a sketch of the idea follows below). This improves code locality and
further improves performance by another 10-20%.
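
To make the idea concrete, here is a hypothetical sketch of what a
batched interface could look like. The names (TupleBatch,
scan_getnext_batch) are invented and are not the actual interface; the
point is only the shape of the API, where the caller loops over a whole
batch between calls instead of paying one function call per tuple.

/*
 * Hypothetical sketch -- invented names, not the follow-up patch's API.
 * A "return N tuples per call" interface keeps the hot loop in the
 * caller and amortises per-call overhead.
 */
#include <stdbool.h>
#include <stdio.h>

#define BATCH_SIZE 64

typedef struct TupleBatch
{
    int         ntuples;                /* tuples returned this call        */
    long        tuples[BATCH_SIZE];     /* stand-ins for HeapTuple pointers */
} TupleBatch;

typedef struct FakeScan
{
    long        next;                   /* next "tuple id" in the fake scan */
    long        total;                  /* total tuples in the fake table   */
} FakeScan;

/* hypothetical batched interface: fill up to BATCH_SIZE tuples per call */
static bool
scan_getnext_batch(FakeScan *scan, TupleBatch *batch)
{
    batch->ntuples = 0;
    while (batch->ntuples < BATCH_SIZE && scan->next < scan->total)
        batch->tuples[batch->ntuples++] = scan->next++;
    return batch->ntuples > 0;
}

int
main(void)
{
    FakeScan    scan = {0, 1000};
    TupleBatch  batch;
    long        count = 0;
    int         i;

    /* one call per batch instead of one call per tuple */
    while (scan_getnext_batch(&scan, &batch))
        for (i = 0; i < batch.ntuples; i++)
            count++;            /* "process" batch.tuples[i] here */

    printf("%ld tuples scanned\n", count);
    return 0;
}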

For TPC-H 1 GB tables, we are seeing more than a 20% improvement in
scans on the same hardware.

-------------------------------------------------------------------------
----- PATCHED VERSION
-------------------------------------------------------------------------
gptest=# select count(*) from lineitem;
count
---------
6001215
(1 row)

Time: 2117.025 ms

-------------------------------------------------------------------------
----- ORIGINAL CVS HEAD VERSION
-------------------------------------------------------------------------
gptest=# select count(*) from lineitem;
count
---------
6001215
(1 row)

Time: 2722.441 ms

Suggestions for improvement are welcome.

Regards,
-cktan
Greenplum, Inc.

On May 8, 2007, at 5:57 AM, Heikki Linnakangas wrote:

> Luke Lonergan wrote:
>>> What do you mean by using readahead inside the heapscan?
>>> Starting an async read request?
>> Nope - just reading N buffers ahead for seqscans. Subsequent
>> calls use
>> previously read pages. The objective is to issue contiguous reads to
>> the OS in sizes greater than the PG page size (which is much smaller
>> than what is needed for fast sequential I/O).
>
> Are you filling multiple buffers in the buffer cache with a single
> read-call? The OS should be doing readahead for us anyway, so I
> don't see how just issuing multiple ReadBuffers one after another
> helps.
>
>> Yes, I think the ring buffer strategy should be used when the
>> table size
>> is > 1 x bufcache and the ring buffer should be of a fixed size
>> smaller
>> than L2 cache (32KB - 128KB seems to work well).
>
> I think we want to let the ring grow larger than that for updating
> transactions and vacuums, though, to avoid the WAL flush problem.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
