Re: Seq scans roadmap

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>
Cc: CK Tan <cktan(at)greenplum(dot)com>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)enterprisedb(dot)com>
Subject: Re: Seq scans roadmap
Date: 2007-05-10 13:33:05
Message-ID: 46431F11.9020303@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Zeugswetter Andreas ADI SD wrote:
>> Also, that patch doesn't address the VACUUM issue at all. And
>> using a small fixed size ring with scans that do updates can
>> be devastating. I'm experimenting with different ring sizes
>> for COPY at the moment. Too small ring leads to a lot of WAL
>> flushes, it's basically the same problem we have with VACUUM
>> in CVS HEAD.
>
> My first take on that would be to simply abandon any dirty (and actually
> also any still pinned) buffer from the ring and replace the ring slot
> with a buffer from the freelist.
> If the freelist is empty and LSN allows writing the buffer, write it
> (and maybe try to group these).
> If the LSN does not allow the write, replace the slot with a buffer from
> LRU.

That would effectively disable the ring for COPY and the 2nd phase of
VACUUM.

One problem with looking at the LSN is that you need the content lock to
read it, and I wouldn't want to add any new locking. It could be done
inside FlushBuffer when we hold the lock anyway, but I'm afraid the
changes would be pretty invasive.

I'm struggling to get a grip of what the optimal ring size is under
various circumstances. Some thoughts I have this far:
- a small ring gives better L2 cache behavior
- for read-only queries, and for queries that just hint bits, 1 buffer
is enough
- small ring with query that writes WAL (COPY, mass updates, 2nd phase
of VACUUM) leads to a lot of WAL flushes, which can become bottleneck.

But all these assumptions need to be validated. I'm setting up tests
with different ring sizes and queries to get a clear picture of this:
- VACUUM on a clean table
- VACUUM on a table with 1 dead tuple per page
- read-only scan, large table
- read-only scan, table fits in OS cache
- COPY

In addition, I'm going to run VACUUM in a DBT-2 test to see the affect
on other queries running concurrently.

I think a ring that grows when WAL flushes occur covers all the use
cases reasonably well, but I need to do the testing...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hiroshi Inoue 2007-05-10 13:46:44 Re: Re: [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory
Previous Message Alvaro Herrera 2007-05-10 12:46:14 Re: Re: [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory