Re: Seq scans roadmap

From: "CK Tan" <cktan(at)greenplum(dot)com>
To: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Simon Riggs" <simon(at)enterprisedb(dot)com>
Subject: Re: Seq scans roadmap
Date: 2007-05-10 18:27:00
Message-ID: 833BC7B3-048A-4CFC-89C5-119725FA4773@greenplum.com
Lists: pgsql-hackers

Sorry, a 16 x 8K page ring is indeed too small. We selected 16 because
Greenplum DB runs on a 32K page size, so we are in fact reading 128K at
a time (4 pages per read). The number of pages in the ring should be
made relative to the page size, so that each read covers 128K.
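
For illustration only (this is not code from the patch, and the macro
names are made up), the ring sizing could be derived from the page
size roughly like this:

    /* target 128K per readahead request, regardless of page size */
    #ifndef BLCKSZ
    #define BLCKSZ 8192                 /* assume the default 8K pages */
    #endif
    #define READAHEAD_BYTES (128 * 1024)
    #define READAHEAD_PAGES (READAHEAD_BYTES / BLCKSZ) /* pages per read */
    #define RING_READS      4           /* readahead groups in the ring */
    #define RING_PAGES      (RING_READS * READAHEAD_PAGES)

With 8K pages this gives 16 pages per read and a 64-page (512K) ring;
with our 32K pages it gives 4 pages per read and the 16-page ring we
currently use.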

We also agree that KillAndReadBuffer() could be split into
KillPinDontRead() and ReadThesePinnedPages() functions. However, we
are thinking about AIO and would rather see a ReadNPagesAsync()
function.
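
To sketch what we mean (hypothetical signatures only; none of these
functions exist in the tree today), the split plus an AIO variant
could look like:

    /* recycle and pin npages ring buffers, issuing no I/O yet */
    extern void KillPinDontRead(Relation rel, BlockNumber start,
                                int npages, Buffer *bufs);

    /* fill the already-pinned buffers with one large read */
    extern void ReadThesePinnedPages(Relation rel, BlockNumber start,
                                     int npages, Buffer *bufs);

    /*
     * AIO variant: submit the read and return immediately; the
     * caller later waits on the returned request handle.
     */
    extern int ReadNPagesAsync(Relation rel, BlockNumber start,
                               int npages, Buffer *bufs);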

-cktan
Greenplum, Inc.

On May 10, 2007, at 3:14 AM, Zeugswetter Andreas ADI SD wrote:

>
>> In reference to the seq scans roadmap, I have just submitted
>> a patch that addresses some of the concerns.
>>
>> The patch does this:
>>
>> 1. for small relations (smaller than 60% of the buffer pool), use
>> the current logic
>> 2. for big relations:
>> - use a ring buffer in the heap scan
>> - pin the first 12 pages when the scan starts
>> - on consumption of every 4 pages, read and pin the next 4 pages
>> - invalidate used pages in the scan so they do not
>> force out other useful pages
>
> A few comments regarding the effects:
>
> I do not see how this speedup could be caused by readahead, so what
> are the effects?
> (It should make no difference to do the CPU work for count(*) in
> between reading each block when the pages are not dirtied.)
> Is the improvement solely reduced CPU, because no search for a free
> buffer is needed, and/or L2 cache locality?
>
> What effect does the advance pinning have? Avoiding vacuum?
>
> A 16 x 8k page ring is too small to allow the needed IO blocksize of
> 256k.
> The readahead is done 4 x one page at a time (= 32k).
> What is the reasoning behind 1/4 of the ring for readahead (why not
> 1/2)? Is 3/4 the trail for followers and the bgwriter?
>
> I think, in anticipation of doing a single IO call for more than one
> page, the KillAndReadBuffer function should be split into two parts:
> one that does the killing for n pages, and one that does the reading
> for n pages.
> Killing n before reading n would also have the positive effect of
> grouping any needed writes (not interleaving them with the reads).
>
> I think 60% of NBuffers is a very good starting point. I would only
> introduce a GUC when we see evidence that it is needed (I agree with
> Simon's partitioning comments, but I'd still wait and see).
>
> Andreas
>
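
For concreteness, the consume-4/read-4 scheme from the patch summary
quoted above might be structured like this (the struct and both helper
functions are hypothetical, matching the split sketched earlier):

    typedef struct ScanRing
    {
        BlockNumber next_block;   /* next block to prefetch */
        int         consumed;     /* pages consumed since last refill */
    } ScanRing;

    static void
    ring_advance(ScanRing *ring, Relation rel, Buffer *bufs)
    {
        /* after every 4th page is consumed, refill the ring */
        if (++ring->consumed == 4)
        {
            /* kill first, then read, so any writes can be grouped */
            KillPinDontRead(rel, ring->next_block, 4, bufs);
            ReadThesePinnedPages(rel, ring->next_block, 4, bufs);
            ring->next_block += 4;
            ring->consumed = 0;
        }
    }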
