Re: Sorting writes during checkpoint

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Sorting writes during checkpoint
Date: 2008-04-16 04:22:13
Message-ID: 20080416125802.78C9.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches


Greg Smith <gsmith(at)gregsmith(dot)com> wrote:

> On Tue, 15 Apr 2008, ITAGAKI Takahiro wrote:
>
> > 2x Quad core Xeon, 16GB RAM, 4x HDD (RAID-0)
>
> What is the disk controller in this system? I'm specifically curious
> about what write cache was involved, so I can get a better feel for the
> hardware your results came from.

I used HP ProLiant DL380 G5 with Smart Array P400 with 256MB cache
(http://h10010.www1.hp.com/wwpc/us/en/sm/WF06a/15351-15351-3328412-241644-241475-1121516.html)
and ext3fs on LVM of CentOS 5.1 (Linux version 2.6.18-53.el5).
Dirty region of database was probably larger than disk controller's cache.

> buf_to_write = (BufAndTag *) palloc(NBuffers * sizeof(BufAndTag));
>
> If shared_buffers(=NBuffers) is set to something big, this could give some
> memory churn. And I think it's a bad idea to allocate something this
> large at checkpoint time, because what happens if that fails? Really not
> the time you want to discover there's no RAM left.

Hmm, but I think we need to copy buffer tags into bgwriter's local memory
in order to avoid locking taga many times in the sorting. Is it better to
allocate sorting buffers at the first time and keep and reuse it from then on?

> BufAndTag is a relatively small structure (5 ints). Let's call it 40
> bytes; even that's only a 0.5% overhead relative to the shared buffer
> allocation. If we can speed checkpoints significantly with that much
> overhead it sounds like a good tradeoff to me.

I thinks sizeof(BufAndTag) is 20 bytes because sizeof(int) is 4 on typical
platforms (and if not, I should rewrite the patch to be always so).
It is 0.25% of shared buffers; when shared_buffers is set to 10GB,
it takes 25MB of process local memory. If we want to consume less memory
for it, RelFileNode in BufferTag could be hashed and packed into an integer;
The blockNum order is important for this purpose, but RelFileNode is not.
It makes the overhead to 12 bytes per page (0.15%). Is it worth doing?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-04-16 04:54:31 Re: pg_terminate_backend() issues
Previous Message Bruce Momjian 2008-04-16 04:09:43 Re: pg_terminate_backend() issues

Browse pgsql-patches by date

  From Date Subject
Next Message Brendan Jurd 2008-04-16 06:52:22 Re: [HACKERS] Text <-> C string
Previous Message Andrew Chernow 2008-04-16 01:42:36 Re: libpq object hooks patch