
Re: Sorting writes during checkpoint

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Sorting writes during checkpoint
Date: 2008-04-16 04:22:13
Message-ID: 20080416125802.78C9.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers, pgsql-patches
Greg Smith <gsmith(at)gregsmith(dot)com> wrote:

> On Tue, 15 Apr 2008, ITAGAKI Takahiro wrote:
> 
> > 2x Quad core Xeon, 16GB RAM, 4x HDD (RAID-0)
> 
> What is the disk controller in this system?  I'm specifically curious 
> about what write cache was involved, so I can get a better feel for the 
> hardware your results came from.

I used an HP ProLiant DL380 G5 with a Smart Array P400 controller (256MB cache)
(http://h10010.www1.hp.com/wwpc/us/en/sm/WF06a/15351-15351-3328412-241644-241475-1121516.html)
and ext3 on LVM under CentOS 5.1 (Linux version 2.6.18-53.el5).
The dirty region of the database was probably larger than the disk controller's cache.


> buf_to_write = (BufAndTag *) palloc(NBuffers * sizeof(BufAndTag));
> 
> If shared_buffers(=NBuffers) is set to something big, this could give some 
> memory churn.  And I think it's a bad idea to allocate something this 
> large at checkpoint time, because what happens if that fails?  Really not 
> the time you want to discover there's no RAM left.

Hmm, but I think we need to copy buffer tags into bgwriter's local memory
in order to avoid locking the tags many times during the sort. Would it be better to
allocate the sorting buffer once, the first time it is needed, and reuse it from then on?
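
A minimal sketch of the allocate-once idea, written as standalone C rather
than as part of the patch. The BufAndTag layout, the function name
get_sort_array(), and the use of realloc() in place of PostgreSQL's memory
contexts are all assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's BufAndTag: five 4-byte fields. */
typedef struct BufAndTag
{
    int32_t  buf_id;
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
    uint32_t blockNum;
} BufAndTag;

/* Process-local array, kept alive between checkpoints. */
static BufAndTag *sort_array = NULL;
static size_t     sort_array_len = 0;

/*
 * Return an array with room for at least nbuffers entries.
 * The first call allocates; later calls reuse the same memory.
 * Returns NULL on failure, so the caller can fall back to
 * writing buffers unsorted instead of failing the checkpoint.
 */
BufAndTag *
get_sort_array(size_t nbuffers)
{
    BufAndTag *p;

    if (nbuffers == 0 || nbuffers > SIZE_MAX / sizeof(BufAndTag))
        return NULL;

    if (sort_array != NULL && sort_array_len >= nbuffers)
        return sort_array;          /* reuse the earlier allocation */

    p = realloc(sort_array, nbuffers * sizeof(BufAndTag));
    if (p == NULL)
        return NULL;                /* any old array is still valid */

    sort_array = p;
    sort_array_len = nbuffers;
    return sort_array;
}
```

With this shape, an out-of-memory condition would show up at the first
checkpoint (or could be forced at startup by calling the function early),
not at some arbitrary later checkpoint.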


> BufAndTag is a relatively small structure (5 ints).  Let's call it 40 
> bytes; even that's only a 0.5% overhead relative to the shared buffer 
> allocation.  If we can speed checkpoints significantly with that much 
> overhead it sounds like a good tradeoff to me.

I think sizeof(BufAndTag) is 20 bytes, because sizeof(int) is 4 on typical
platforms (and if it is not, I should rewrite the patch so that it always is).
That is 0.25% of shared buffers; when shared_buffers is set to 10GB,
it takes 25MB of process-local memory. If we want to consume less memory
for it, the RelFileNode in BufferTag could be hashed and packed into an integer:
the blockNum order is important for this purpose, but the RelFileNode order is not.
That reduces the overhead to 12 bytes per page (0.15%). Is it worth doing?
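
To make the 12-byte variant concrete, here is a rough standalone sketch.
The struct name PackedBufTag, the helper names, and the FNV-style mixing
function are hypothetical choices for illustration, not code from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packed entry: 3 x 4 bytes = 12 bytes per buffer. */
typedef struct PackedBufTag
{
    uint32_t relHash;   /* hash of (spcNode, dbNode, relNode) */
    uint32_t blockNum;  /* block number within the relation */
    int32_t  buf_id;    /* index of the shared buffer */
} PackedBufTag;

/* FNV-style mixing of the three RelFileNode fields (illustrative only). */
uint32_t
hash_relfilenode(uint32_t spcNode, uint32_t dbNode, uint32_t relNode)
{
    uint32_t h = 2166136261u;

    h = (h ^ spcNode) * 16777619u;
    h = (h ^ dbNode) * 16777619u;
    h = (h ^ relNode) * 16777619u;
    return h;
}

/* Order by relation hash first, then by block number within it. */
static int
packed_cmp(const void *a, const void *b)
{
    const PackedBufTag *x = a;
    const PackedBufTag *y = b;

    if (x->relHash != y->relHash)
        return (x->relHash < y->relHash) ? -1 : 1;
    if (x->blockNum != y->blockNum)
        return (x->blockNum < y->blockNum) ? -1 : 1;
    return 0;
}

void
sort_packed(PackedBufTag *tags, size_t n)
{
    qsort(tags, n, sizeof(PackedBufTag), packed_cmp);
}
```

If two relations happen to hash to the same value, their blocks are
interleaved in the sorted output. That only makes the write order less
sequential for those relations; each entry still carries its own buf_id,
so the set of buffers written is unchanged.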

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


