Re: Seq scans roadmap

From: "CK Tan" <cktan(at)greenplum(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Simon Riggs" <simon(at)enterprisedb(dot)com>
Subject: Re: Seq scans roadmap
Date: 2007-05-10 03:52:24
Message-ID: 30E8D12C-C5C1-48DA-BF06-08353C398C35@greenplum.com
Lists: pgsql-hackers

Hi,

In reference to the seq scans roadmap, I have just submitted a patch
that addresses some of the concerns.

The patch does the following (a rough sketch of the ring logic appears
after the list of changed files below):

1. For small relations (smaller than 60% of the buffer pool), use the
current logic.
2. For large relations:
- use a ring buffer in the heap scan
- pin the first 12 pages when the scan starts
- after every 4 pages are consumed, read and pin the next 4 pages
- invalidate the pages already consumed by the scan so they do not
force out other useful pages

4 files changed:
bufmgr.c, bufmgr.h, heapam.c, relscan.h
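
For illustration only, here is a small standalone C sketch of the ring
logic above. This is not the patch code itself: the struct and function
names are invented, and the printf "pin"/"release" calls stand in for the
real work done against the shared buffer pool in bufmgr.c and heapam.c.

/*
 * Illustrative sketch only -- not the actual patch code.  Pin the first
 * 12 pages up front, and each time 4 pages have been consumed, release
 * them and pin the next 4, so the scan recycles a small fixed window of
 * buffers instead of flooding the buffer pool.
 */
#include <stdio.h>

#define RING_INIT 12            /* pages pinned when the scan starts */
#define RING_STEP 4             /* pages consumed/refilled per step  */

typedef struct RingScan
{
    int         nblocks;        /* total pages in the relation         */
    int         next_read;      /* next page number to "pin"           */
    int         next_use;       /* next page number handed to the scan */
} RingScan;

static void
ring_start(RingScan *scan, int nblocks)
{
    scan->nblocks = nblocks;
    scan->next_use = 0;
    scan->next_read = 0;
    while (scan->next_read < nblocks && scan->next_read < RING_INIT)
        printf("pin page %d\n", scan->next_read++);
}

static int
ring_next_page(RingScan *scan)
{
    int         page;

    if (scan->next_use >= scan->nblocks)
        return -1;              /* scan finished */

    page = scan->next_use++;

    /* after every RING_STEP consumed pages: drop them, pin the next group */
    if (scan->next_use % RING_STEP == 0)
    {
        int         i;

        printf("release pages %d..%d\n",
               scan->next_use - RING_STEP, scan->next_use - 1);
        for (i = 0; i < RING_STEP && scan->next_read < scan->nblocks; i++)
            printf("pin page %d\n", scan->next_read++);
    }
    return page;
}

int
main(void)
{
    RingScan    scan;
    int         page;

    ring_start(&scan, 20);
    while ((page = ring_next_page(&scan)) >= 0)
        printf("scan page %d\n", page);
    return 0;
}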

If there is interest, I can submit another scan patch that returns
N tuples at a time instead of the current one-at-a-time interface
(a sketch of the idea follows below). This improves code locality and
further improves performance by another 10-20%.
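
To make the idea concrete, here is a hypothetical sketch of what a
batched interface could look like. The names (TupleBatch,
scan_getnext_batch) are invented and are not the actual interface; the
point is only the shape of the API, where the caller loops over a whole
batch between calls instead of paying one function call per tuple.

/*
 * Hypothetical sketch -- invented names, not the follow-up patch's API.
 * A "return N tuples per call" interface keeps the hot loop in the
 * caller and amortises per-call overhead.
 */
#include <stdbool.h>
#include <stdio.h>

#define BATCH_SIZE 64

typedef struct TupleBatch
{
    int         ntuples;                /* tuples returned this call        */
    long        tuples[BATCH_SIZE];     /* stand-ins for HeapTuple pointers */
} TupleBatch;

typedef struct FakeScan
{
    long        next;                   /* next "tuple id" in the fake scan */
    long        total;                  /* total tuples in the fake table   */
} FakeScan;

/* hypothetical batched interface: fill up to BATCH_SIZE tuples per call */
static bool
scan_getnext_batch(FakeScan *scan, TupleBatch *batch)
{
    batch->ntuples = 0;
    while (batch->ntuples < BATCH_SIZE && scan->next < scan->total)
        batch->tuples[batch->ntuples++] = scan->next++;
    return batch->ntuples > 0;
}

int
main(void)
{
    FakeScan    scan = {0, 1000};
    TupleBatch  batch;
    long        count = 0;
    int         i;

    /* one call per batch instead of one call per tuple */
    while (scan_getnext_batch(&scan, &batch))
        for (i = 0; i < batch.ntuples; i++)
            count++;            /* "process" batch.tuples[i] here */

    printf("%ld tuples scanned\n", count);
    return 0;
}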

For TPC-H 1 GB tables, we are seeing more than a 20% improvement in
scans on the same hardware.

-------------------------------------------------------------------------
----- PATCHED VERSION
-------------------------------------------------------------------------
gptest=# select count(*) from lineitem;
count
---------
6001215
(1 row)

Time: 2117.025 ms

-------------------------------------------------------------------------
----- ORIGINAL CVS HEAD VERSION
-------------------------------------------------------------------------
gptest=# select count(*) from lineitem;
count
---------
6001215
(1 row)

Time: 2722.441 ms

Suggestions for improvement are welcome.

Regards,
-cktan
Greenplum, Inc.

On May 8, 2007, at 5:57 AM, Heikki Linnakangas wrote:

> Luke Lonergan wrote:
>>> What do you mean by using readahead inside the heapscan?
>>> Starting an async read request?
>> Nope - just reading N buffers ahead for seqscans. Subsequent
>> calls use
>> previously read pages. The objective is to issue contiguous reads to
>> the OS in sizes greater than the PG page size (which is much smaller
>> than what is needed for fast sequential I/O).
>
> Are you filling multiple buffers in the buffer cache with a single
> read-call? The OS should be doing readahead for us anyway, so I
> don't see how just issuing multiple ReadBuffers one after another
> helps.
>
>> Yes, I think the ring buffer strategy should be used when the
>> table size
>> is > 1 x bufcache and the ring buffer should be of a fixed size
>> smaller
>> than L2 cache (32KB - 128KB seems to work well).
>
> I think we want to let the ring grow larger than that for updating
> transactions and vacuums, though, to avoid the WAL flush problem.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
