Re: [HACKERS] TODO item

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: t-ishii(at)sra(dot)co(dot)jp
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] TODO item
Date: 2000-02-09 08:22:02
Message-ID: 20000209172202B.t-ishii@sra.co.jp
Lists: pgsql-hackers

> BTW, Hiroshi has pointed out to me an excellent point #3:
>
> >Session-1
> >begin;
> >update A ...;
> >
> >Session-2
> >begin;
> >select * from B ..;
> > There's no PostgreSQL shared buffer available.
> > This backend has to force the flush of a free buffer
> > page. Unfortunately the page was dirtied by the
> > above operation of Session-1 and calls pg_fsync()
> > for the table A. However fsync() is postponed until
> > commit of this backend.
> >
> >Session-1
> >commit;
> > There's no dirty buffer page for the table A.
> > So pg_fsync() isn't called for the table A.
>
> Seems there's no easy solution for this. Maybe now is the time to give
> up my idea...

Thinking about it a little more, I have come across yet another
possible solution. It is actually *very* simple. Details follow.

In xact.c:RecordTransactionCommit() there are two FlushBufferPool()
calls: one for relation files and the other for pg_log. I add a
sync() right after each of these FlushBufferPool() calls. It forces any
pending kernel buffers to be physically written to disk, and thus should
guarantee the ACID properties of the transaction (see attached code
fragment).
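
To make the quoted scenario and the effect of the extra sync() concrete, here is a toy model in C. It is only a sketch: the types Backend, Buffer and Kernel and all function names are hypothetical and are not PostgreSQL's real structures. It assumes that a postponed fsync is remembered by the backend that wrote the page, and that sync() flushes everything the kernel is holding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { REL_A, REL_B, NRELS };

/* Hypothetical toy types; not PostgreSQL's real structures. */
typedef struct { bool pending_fsync[NRELS]; } Backend;
typedef struct { int rel; bool dirty; const Backend *dirtied_by; } Buffer;
typedef struct { bool unsynced[NRELS]; } Kernel; /* written, not yet on disk */

/* write() a dirty page; the postponed fsync is remembered by the WRITER. */
static void write_page(Backend *writer, Buffer *buf, Kernel *k)
{
    if (!buf->dirty)
        return;
    k->unsynced[buf->rel] = true;
    writer->pending_fsync[buf->rel] = true;
    buf->dirty = false;
}

/* Commit: write own dirty pages, fsync own pending files, optionally sync(). */
static void commit(Backend *me, Buffer *bufs, size_t n, Kernel *k,
                   bool sync_at_commit)
{
    for (size_t i = 0; i < n; i++)
        if (bufs[i].dirty && bufs[i].dirtied_by == me)
            write_page(me, &bufs[i], k);

    for (int rel = 0; rel < NRELS; rel++)
        if (me->pending_fsync[rel])
        {
            k->unsynced[rel] = false;       /* models fsync() on that file */
            me->pending_fsync[rel] = false;
        }

    if (sync_at_commit)
        for (int rel = 0; rel < NRELS; rel++)
            k->unsynced[rel] = false;       /* models sync(): flush everything */
}

/* Replays the quoted scenario; returns true if table A's page is still
 * sitting only in the kernel cache after Session-1 has committed. */
bool table_a_unsynced_after_commit(bool sync_at_commit)
{
    Backend s1 = {{false}}, s2 = {{false}};
    Kernel k = {{false}};
    Buffer pool[1] = {{REL_A, true, &s1}};  /* Session-1: update A */

    /* Session-2: select from B needs the only buffer, so it evicts A's page. */
    write_page(&s2, &pool[0], &k);
    pool[0].rel = REL_B;
    pool[0].dirtied_by = NULL;

    /* Session-1: commit. It has no dirty page and no pending fsync for A. */
    commit(&s1, pool, 1, &k, sync_at_commit);
    assert(!s1.pending_fsync[REL_A]);
    return k.unsynced[REL_A];
}
```

In this model, table_a_unsynced_after_commit(false) reports that A's page is still unsynced after the commit, while table_a_unsynced_after_commit(true) reports that it is not, because the sync() does not care which backend wrote the page.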

There are two things we should worry about with sync, however.

1. Does sync really wait until the data has been written to
disk?

I looked into the man page of sync(2) on Linux 2.0.36:

According to the standard specification (e.g., SVID),
sync() schedules the writes, but may return before the
actual writing is done. However, since version 1.3.20
Linux does actually wait. (This still does not guarantee
data integrity: modern disks have large caches.)

It seems that sync(2) blocks until data is written. So it would be ok
at least with Linux. I'm not sure about other platforms, though.
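
For readers who want to see the two calls side by side, here is a minimal POSIX sketch. The function name write_and_flush and the file path passed to it are made up for illustration. Note that sync() returns void, so a caller cannot learn from it whether the write succeeded, and whether it blocks until completion is platform dependent, as discussed above.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write buf to path, then ask the kernel to flush it.
 *   use_sync != 0: sync(), which covers all dirty buffers system-wide and
 *                  cannot report an error (it returns void).
 *   use_sync == 0: fsync(fd), which covers this file only and reports errors.
 * Returns 0 on success, -1 on failure.
 */
int write_and_flush(const char *path, const char *buf, int use_sync)
{
    assert(path != NULL && buf != NULL);

    size_t len = strlen(buf);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t) n != len)
    {
        close(fd);
        return -1;
    }

    int rc = 0;
    if (use_sync)
        sync();             /* whole system; may or may not block */
    else
        rc = fsync(fd);     /* this file only; blocks until done */

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

For example, write_and_flush("/tmp/sync_demo.dat", "hello", 1) takes the sync() path and write_and_flush("/tmp/sync_demo.dat", "hello", 0) takes the fsync() path.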

2. Do we suffer any performance penalty from sync?

Since sync forces *all* dirty buffers on the system to be written to
disk, it might be slower than fsync. So I did some testing using
contrib/pgbench. Starting the postmaster with -F on (and with the sync
modification), I ran 32 concurrent clients performing 10 transactions
each, 320 transactions in total. Each transaction consists of an UPDATE
and a SELECT on a table that has 1000k tuples, and an INSERT into
another small table. The results showed that -F + sync was actually
faster than the default mode (no -F, no modifications). The system is
Red Hat 5.2 with 128MB RAM.

                      -F + sync      normal mode
--------------------------------------------------------
transactions/sec        3.46            2.93
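
As a quick check of what these figures mean, the small sketch below computes the relative throughput and the implied wall-clock time for the 320 transactions. The helper names are made up, and it assumes the reported rate is simply total transactions divided by elapsed time. Under that assumption, -F + sync delivers about 18% more transactions per second, roughly 92 seconds versus 109 seconds for the whole run.

```c
#include <assert.h>

/* Ratio of two throughput figures, e.g. 3.46 tps vs 2.93 tps. */
double speedup(double tps_new, double tps_base)
{
    assert(tps_base > 0.0);
    return tps_new / tps_base;
}

/* Wall-clock seconds implied by a throughput figure, assuming
 * tps = total transactions / elapsed time. */
double elapsed_seconds(int ntransactions, double tps)
{
    assert(tps > 0.0);
    return ntransactions / tps;
}
```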

Of course, if there is disk activity other than PostgreSQL's, sync
will suffer from it. However, in most cases the system is dedicated
to PostgreSQL, so I don't think this is a big problem in the real
world.

Note that a large COPY or INSERT was much faster than in normal
mode, because there is no per-page fsync.

Considering all this, I would like to propose that we add a new switch
to postgres to run with -F + sync.

------------------------------------------------------------------------
	/*
	 * If no one shared buffer was changed by this transaction then
	 * we don't flush shared buffers and don't record commit status.
	 */
	if (SharedBufferChanged)
	{
		FlushBufferPool();
		sync();
		if (leak)
			ResetBufferPool();

		/*
		 * have the transaction access methods record the status
		 * of this transaction id in the pg_log relation.
		 */
		TransactionIdCommit(xid);

		/*
		 * Now write the log info to the disk too.
		 */
		leak = BufferPoolCheckLeak();
		FlushBufferPool();
		sync();
	}
