Re: Seq scans roadmap

From: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Cc: "Simon Riggs" <simon(at)enterprisedb(dot)com>, "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>, "CK(dot)Tan" <cktan(at)greenplum(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>
Subject: Re: Seq scans roadmap
Date: 2007-05-15 09:39:17
Message-ID: C3E62232E3BCF24CBA20D72BFDCB6BF804066A4A@MI8NYCMAIL08.Mi8.com
Lists: pgsql-hackers

Heikki,

32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache
effect.

How about using 256KB/blocksize?
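For reference, here is the arithmetic behind the suggestion. 256KB/blocksize gives 32 buffers at the default 8KB block size, which is the same ring Heikki proposes. At 32KB it gives only 8 buffers, so the ring's footprint stays at 256KB instead of growing to 1MB. This is a minimal sketch that assumes the intended target is 256KB. `RING_TARGET_BYTES` and `ring_size_for_blocksize()` are hypothetical names, not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed target footprint for a scan ring: 256KB, small enough to
 * stay within a typical CPU L2 cache. */
#define RING_TARGET_BYTES (256 * 1024)

/* Hypothetical helper: number of ring buffers for a block size in bytes. */
static int ring_size_for_blocksize(size_t blcksz)
{
    assert(blcksz > 0 && blcksz <= RING_TARGET_BYTES);
    return (int) (RING_TARGET_BYTES / blcksz);
}
```

With this rule the ring's memory footprint is constant, and only the buffer count varies with the block size.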

- Luke

> -----Original Message-----
> From: Heikki Linnakangas [mailto:hlinnaka(at)gmail(dot)com] On
> Behalf Of Heikki Linnakangas
> Sent: Tuesday, May 15, 2007 2:32 AM
> To: PostgreSQL-development
> Cc: Simon Riggs; Zeugswetter Andreas ADI SD; CK.Tan; Luke
> Lonergan; Jeff Davis
> Subject: Re: [HACKERS] Seq scans roadmap
>
> Just to keep you guys informed, I've been busy testing and
> pondering over different buffer ring strategies for vacuum,
> seqscans and copy.
> Here's what I'm going to do:
>
> Use a fixed size ring. Fixed as in doesn't change after the
> ring is initialized, however different kinds of scans use
> differently sized rings.
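A fixed-size ring in the sense described here can be sketched as a small array of buffer ids. The array is allocated once and then reused in round-robin order. Only the size passed at creation differs between scan kinds. This is an illustrative sketch, not the patch: `BufferRing`, `ring_create()`, `ring_next()`, `ring_set_current()` and `ring_free()` are hypothetical. A buffer id of -1 marks a slot that the caller must fill from the shared buffer pool.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
    int  size;      /* fixed once the ring is initialized */
    int  current;   /* slot most recently handed out */
    int *buffers;   /* buffer ids; -1 means the slot is not filled yet */
} BufferRing;

static BufferRing *ring_create(int size)
{
    assert(size > 0);
    BufferRing *ring = malloc(sizeof(BufferRing));
    assert(ring != NULL);
    ring->size = size;
    ring->current = size - 1;   /* so the first ring_next() lands on slot 0 */
    ring->buffers = malloc(sizeof(int) * (size_t) size);
    assert(ring->buffers != NULL);
    for (int i = 0; i < size; i++)
        ring->buffers[i] = -1;
    return ring;
}

/* Advance to the next slot and return the buffer id stored there,
 * or -1 if the caller must obtain a buffer from the shared pool. */
static int ring_next(BufferRing *ring)
{
    ring->current = (ring->current + 1) % ring->size;
    return ring->buffers[ring->current];
}

/* Record the buffer the caller placed in the current slot. */
static void ring_set_current(BufferRing *ring, int buf_id)
{
    ring->buffers[ring->current] = buf_id;
}

static void ring_free(BufferRing *ring)
{
    free(ring->buffers);
    free(ring);
}
```

A scan would call `ring_next()` before each page read. If that returns -1, or returns a buffer that cannot be reused, the scan gets a buffer the normal way and records it with `ring_set_current()`.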
>
> I said earlier that it'd be an invasive change to see if a
> buffer needs a WAL flush and choose another victim if that's
> the case. I looked at it again and found a pretty clean way
> of doing that, so I took that approach for seq scans.
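The "does this buffer need a WAL flush" test follows from the write-ahead rule: a dirty page can be written only after WAL has been flushed up to the page's LSN. The sketch below shows how the victim decision could differ between the two ring types. For vacuum (item 1 below), it flushes WAL and reuses the buffer. For seq scans (item 2), it drops the buffer from the ring. It assumes a simplified 64-bit LSN. All type and function names are hypothetical, and the sketch ignores locking and pin counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified WAL position; assumed 64-bit and monotonically increasing. */
typedef uint64_t Lsn;

typedef struct
{
    bool dirty;
    Lsn  page_lsn;   /* LSN of the last WAL record that modified the page */
} BufferState;

typedef enum { POLICY_FLUSH_AND_REUSE, POLICY_REJECT } RingPolicy;

typedef enum
{
    VICTIM_REUSE,                  /* no WAL flush needed (a dirty page is still written out) */
    VICTIM_REUSE_AFTER_WAL_FLUSH,  /* flush WAL up to page_lsn, then reuse */
    VICTIM_REJECT                  /* drop from the ring; caller picks a new victim */
} VictimAction;

/* Write-ahead rule: a dirty page may be written only after WAL is flushed
 * up to its LSN. A page dirtied without a newer WAL record (for example,
 * only hint bits set) passes this check without a flush. */
static bool needs_wal_flush(const BufferState *buf, Lsn flushed_upto)
{
    return buf->dirty && buf->page_lsn > flushed_upto;
}

static VictimAction choose_victim_action(const BufferState *buf,
                                         Lsn flushed_upto,
                                         RingPolicy policy)
{
    if (!needs_wal_flush(buf, flushed_upto))
        return VICTIM_REUSE;
    return policy == POLICY_FLUSH_AND_REUSE ? VICTIM_REUSE_AFTER_WAL_FLUSH
                                            : VICTIM_REJECT;
}
```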
>
> 1. For VACUUM, use a ring of 32 buffers. 32 buffers is small
> enough to give the L2 cache benefits and keep cache pollution
> low, but at the same time it's large enough that it keeps the
> number of WAL flushes reasonable
> (1/32 of what we do now).
>
> 2. For sequential scans, also use a ring of 32 buffers, but
> whenever a buffer in the ring would need a WAL flush to
> recycle, we throw it out of the buffer ring instead. On
> read-only scans (and scans that only update hint bits) this
> gives the L2 cache benefits and doesn't pollute the buffer
> cache. On bulk updates, it's effectively the current
> behavior. On scans that do some updates, it's something in
> between. In all cases it should be no worse than what we have
> now. 32 buffers should be large enough to leave a "cache
> trail" for Jeff's synchronized scans to work.
>
> 3. For COPY that doesn't write WAL, use the same strategy as
> for sequential scans. This keeps the cache pollution low and
> gives the L2 cache benefits.
>
> 4. For COPY that writes WAL, use a large ring of 2048-4096
> buffers. We want to use a ring that can accommodate 1 WAL
> segment worth of data, to avoid having to do any extra WAL
> flushes, and the WAL segment size is
> 2048 pages in the default configuration.
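The sizes in the list above can be expressed as a single lookup. The COPY-with-WAL case uses the arithmetic from item 4. A default 16MB WAL segment divided by the default 8KB block gives 16 * 1024 * 1024 / 8192 = 2048 pages, the lower end of the proposed 2048-4096 range. This is only a sketch. `ScanKind` and `ring_size_for()` are hypothetical names. The sketch keeps the fixed 32-buffer size for the other cases exactly as proposed, so it does not address the block-size concern raised at the top of this message.

```c
#include <assert.h>

typedef enum { SCAN_VACUUM, SCAN_SEQSCAN, SCAN_COPY_NO_WAL, SCAN_COPY_WAL } ScanKind;

#define WAL_SEGMENT_BYTES (16 * 1024 * 1024)   /* default WAL segment size */

/* Ring size in buffers for each kind of scan, per the proposal. */
static int ring_size_for(ScanKind kind, int blcksz)
{
    assert(blcksz > 0);
    switch (kind)
    {
        case SCAN_VACUUM:
        case SCAN_SEQSCAN:
        case SCAN_COPY_NO_WAL:
            return 32;                           /* items 1-3 */
        case SCAN_COPY_WAL:
            return WAL_SEGMENT_BYTES / blcksz;   /* item 4: one WAL segment */
    }
    return 0;
}
```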
>
> Some alternatives I considered but rejected:
>
> * Instead of throwing away dirtied buffers in seq scans,
> accumulate them in another fixed sized list. When the list
> gets full, do a WAL flush and put them on the shared freelist
> or a backend-private freelist. That would eliminate the cache
> pollution of bulk DELETEs and bulk UPDATEs, and it could be
> used for vacuum as well. I think this would be the optimal
> algorithm but I don't feel like inventing something that
> complicated at this stage anymore. Maybe for 8.4.
>
> * Using a different sized ring for 1st and 2nd vacuum phase.
> I decided it's not worth the trouble; the above is already
> an order of magnitude better than the current behavior.
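The first rejected alternative keeps dirtied buffers in a separate fixed-size list and drains it with a single WAL flush. It could look roughly like this. The sketch rests on several assumptions: `DirtyBatch`, `BatchStats`, `batch_add()`, `batch_flush()` and the list size of 32 are all hypothetical. The actual page writes and freelist insertion are replaced by counters so that the batching logic stands alone.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* simplified WAL position */

#define DIRTY_BATCH_SIZE 32   /* hypothetical list size */

typedef struct
{
    int n;
    int buf_ids[DIRTY_BATCH_SIZE];
    Lsn max_lsn;          /* newest page LSN among the listed buffers */
} DirtyBatch;

/* Stand-in for the real side effects: counts flushes and freed buffers. */
typedef struct
{
    int wal_flushes;
    Lsn flushed_upto;
    int freed;
} BatchStats;

static void batch_flush(DirtyBatch *b, BatchStats *s)
{
    if (b->n == 0)
        return;
    /* One WAL flush up to the newest LSN covers every page in the list. */
    if (b->max_lsn > s->flushed_upto)
    {
        s->flushed_upto = b->max_lsn;
        s->wal_flushes++;
    }
    /* The pages could now be written and put on a freelist. */
    s->freed += b->n;
    b->n = 0;
    b->max_lsn = 0;
}

static void batch_add(DirtyBatch *b, BatchStats *s, int buf_id, Lsn page_lsn)
{
    assert(b->n < DIRTY_BATCH_SIZE);
    b->buf_ids[b->n++] = buf_id;
    if (page_lsn > b->max_lsn)
        b->max_lsn = page_lsn;
    if (b->n == DIRTY_BATCH_SIZE)
        batch_flush(b, s);
}
```

In this model, 64 dirtied pages cost two WAL flushes instead of up to 64, which illustrates why the approach would help bulk UPDATEs and DELETEs.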
>
>
> I'm going to rerun the performance tests I ran earlier with
> the new patch, tidy it up a bit, and submit it in the next few
> days. This turned out to be an even more laborious patch to
> review than I thought. While the patch is short and in the
> end turned out to be very close to Simon's original patch,
> there are many different usage scenarios that need to be
> catered for and tested.
>
> I still need to check the interaction with Jeff's patch. This
> is close enough to Simon's original patch that I believe the
> results of the tests Jeff ran earlier are still valid.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
>
