Re: Load distributed checkpoint

From: "Takayuki Tsunakawa" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Load distributed checkpoint
Date: 2006-12-21 08:14:13
Message-ID: 01eb01c724d7$ffd47230$19527c0a@OPERAO
Lists: pgsql-hackers pgsql-patches

From: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
> You were running the test on a very memory-dependent machine.
>> shared_buffers = 4GB / The scaling factor is 50, 800MB of data.
> That would be why the patch did not work. I tested it with DBT-2,
> 10GB of data and 2GB of memory. Storage is always the main part of
> performance here, even outside of checkpoints.

Yes, I used half the size of RAM as the shared buffers, which is
reasonable. And I cached all the data. Isn't the effect of fsync()
the heavier offence? System administrators would say "I have
enough memory. The data hasn't exhausted the DB cache yet. But the
users complain to me about the response. Why? What should I do?
What? Checkpoint?? Why doesn't PostgreSQL take care of frontend
users?"
BTW, is DBT-2 an OLTP benchmark which randomly accesses some parts of
the data, or a batch application which accesses all data? I'm not
familiar with it. I know that IPA makes it available to the public.

> If you use Linux, it has very unpleasant behavior in fsync(); it
> locks all metadata of the file being fsync-ed. We have to wait for
> the completion of fsync when we do read(), write(), and even lseek().
> Almost all of your data is in the accounts table and it was stored
> in a single file. All transactions must wait for fsync on the
> single largest file, so you saw that the bottleneck was in the fsync.

Oh, really, how evil fsync is! Yes, I sometimes saw a backend
waiting for lseek() to complete when it committed. But why does the
backend which is syncing WAL/pg_control have to wait for syncing the
data file? They are, needless to say, different files, and WAL and
data files are stored on separate disks.
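
Whether lseek() really stalls behind fsync() can be checked on a given
kernel and filesystem with a small probe. The following is a minimal
sketch under my own assumptions: max_lseek_stall_us() and its
parameters are hypothetical names, and it only covers the same-file
case, not the WAL versus data file case asked about above. One process
dirties a file and fsync()s it while the other times lseek() on the
same file.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Microseconds elapsed between two CLOCK_MONOTONIC readings. */
static long elapsed_us(const struct timespec *a, const struct timespec *b)
{
    return (long) (b->tv_sec - a->tv_sec) * 1000000L
        + (b->tv_nsec - a->tv_nsec) / 1000L;
}

/*
 * Hypothetical probe (not part of PostgreSQL): a child process writes
 * about dirty_bytes to `path` and fsync()s it, while the parent keeps
 * calling lseek() on the same file and records the slowest call.
 * Returns the worst lseek() latency in microseconds, or -1 on error.
 */
long max_lseek_stall_us(const char *path, size_t dirty_bytes)
{
    assert(path != NULL);

    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
    {
        close(fd);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: dirty the file in 8KB blocks, then force it to disk. */
        char block[8192];
        memset(block, 'x', sizeof block);
        int wfd = open(path, O_WRONLY);
        if (wfd < 0)
            _exit(1);
        for (size_t done = 0; done < dirty_bytes; done += sizeof block)
            if (write(wfd, block, sizeof block) != (ssize_t) sizeof block)
                _exit(1);
        _exit(fsync(wfd) == 0 ? 0 : 1);
    }

    /* Parent: time lseek() repeatedly until the child has exited. */
    long worst = 0;
    int status = 0;
    for (;;)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        (void) lseek(fd, 0, SEEK_END);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        long us = elapsed_us(&t0, &t1);
        if (us > worst)
            worst = us;

        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            break;
        if (r < 0)
        {
            close(fd);
            return -1;
        }
    }
    close(fd);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return worst;
}
```

The number it returns depends entirely on the kernel, the filesystem,
and the storage, so it is only meaningful when compared across runs on
the same machine, for example with and without the fsync() in the
child.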

>> [Conclusion]
>> I believe that the problem cannot be solved in a real sense by
>> avoiding fsync/fdatasync().
>
> I think so, too. However, I assume we can resolve a part of the
> checkpoint spikes with smoothing of write() alone.

First, what's the goal (numerically, if possible)? Have you explained
to community members why the patch would help many people? At least,
I haven't heard anyone say that fsync() can be seriously bad and that
we should close our eyes to what fsync() does.
By the way, what good results did you get with DBT-2? If you don't
mind, can you show us?

> BTW, can we use the same approach for fsync? We call fsync() on all
> modified files without rest in mdsync(), but it's not difficult at
> all to insert sleeps between fsync()s. Do you think it helps us? One
> issue is that we have to sleep per file, which is maybe too coarse a
> granularity.

No, it definitely won't help us. There is no reason why it would help.
It might help in some limited environments, but how can we
characterize such environments? Can we say "our approach helps our
environments, but it won't help you. The kernel VM settings may help
you. Good luck!"?
We have to consider this seriously. I think it's time to face the
problem, and we should follow the approaches of experts like Jim Gray
and DBMS vendors, unless we have a clever new idea like theirs.
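
For concreteness, the sleep-between-fsync()s idea quoted above could
look roughly like the following. This is only a sketch under my own
assumptions, not PostgreSQL's mdsync(): sync_files_with_delay(), its
arguments, and the plain array of paths are hypothetical names made up
for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Sleep for delay_ms milliseconds, resuming if a signal interrupts us. */
static void sleep_ms(long delay_ms)
{
    struct timespec ts;
    ts.tv_sec = delay_ms / 1000;
    ts.tv_nsec = (delay_ms % 1000) * 1000000L;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/*
 * Hypothetical helper (NOT the real mdsync()): fsync() each file in
 * paths[] in turn, sleeping delay_ms between consecutive files.
 * Returns the number of files synced, or -1 on the first failure.
 */
int sync_files_with_delay(const char *const *paths, int nfiles, long delay_ms)
{
    assert(paths != NULL && nfiles >= 0);

    int synced = 0;
    for (int i = 0; i < nfiles; i++)
    {
        int fd = open(paths[i], O_RDWR);
        if (fd < 0)
            return -1;
        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        close(fd);
        synced++;

        /* The pause can only go between whole files, not inside one. */
        if (i + 1 < nfiles)
            sleep_ms(delay_ms);
    }
    return synced;
}
```

Calling it with, say, delay_ms = 100 spaces the fsync()s out, but each
individual fsync() still flushes a whole file at once, which is the
granularity concern raised in the quoted text.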
