Re: [HACKERS] TODO item

From: Alfred Perlstein <bright(at)wintelcom(dot)net>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] TODO item
Date: 2000-02-09 10:04:48
Message-ID: 20000209020448.P17536@fw.wintelcom.net
Lists: pgsql-hackers

* Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> [000209 00:51] wrote:
> > BTW, Hiroshi has pointed out to me an excellent point #3:
> >
> > >Session-1
> > >begin;
> > >update A ...;
> > >
> > >Session-2
> > >begin;
> > >select * from B ..;
> > > There's no PostgreSQL shared buffer available.
> > > This backend has to force the flush of a free buffer
> > > page. Unfortunately the page was dirtied by the
> > > above operation of Session-1 and calls pg_fsync()
> > > for the table A. However fsync() is postponed until
> > > commit of this backend.
> > >
> > >Session-1
> > >commit;
> > > There's no dirty buffer page for the table A.
> > > So pg_fsync() isn't called for the table A.
> >
> > Seems there's no easy solution for this. Maybe now is the time to give
> > up my idea...
>
> Thinking about it a little bit more, I have come across yet another
> possible solution. It is actually *very* simple. Details as follows.
>
> In xact.c:RecordTransactionCommit() there are two FlushBufferPool
> calls. One is for relation files and the other is for pg_log. I add
> sync() right after these FlushBufferPool calls. It will force any pending
> kernel buffers to be physically written to disk, and thus should guarantee
> the ACID properties of the transaction (see attached code fragment).
>
> There are two things that we should worry about with sync(), however.
>
> 1. Does sync() really wait until the data has been written to
> disk?
>
> I looked into the man page of sync(2) on Linux 2.0.36:
>
> According to the standard specification (e.g., SVID),
> sync() schedules the writes, but may return before the
> actual writing is done. However, since version 1.3.20
> Linux does actually wait. (This still does not guarantee
> data integrity: modern disks have large caches.)
>
> It seems that sync(2) blocks until data is written. So it would be ok
> at least with Linux. I'm not sure about other platforms, though.

It is incorrect to assume that sync() waits until all buffers are
flushed on any platform other than Linux. I didn't think
that Linux even did so, but the kernel sources say yes.

Solaris doesn't do this, and neither does FreeBSD/NetBSD.

I guess if you wanted to implement this for Linux only, it would
work. You ought to then also warn people that a non-dedicated db server
could see different performance with this code, since sync() flushes
every dirty kernel buffer on the system, not just PostgreSQL's.
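
For comparison, here is a rough sketch of the per-file alternative. POSIX
specifies that fsync() on a descriptor does not return until the transfer
for that file is complete or an error is detected, which is the guarantee
sync() lacks outside Linux. The helper name write_durably() is made up for
illustration and is not PostgreSQL code. It also does nothing about the
cross-backend case in point #3, and the disk-cache caveat from the man
page still applies.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the PostgreSQL source): write len bytes
 * to path and return 0 only after fsync() has reported success for that
 * file. Returns -1 on any error.
 */
int
write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    size_t      done = 0;
    int         fd;

    assert(path != NULL && buf != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is handed over */
    while (done < len)
    {
        ssize_t n = write(fd, p + done, len - done);

        if (n <= 0)
        {
            close(fd);
            return -1;
        }
        done += (size_t) n;
    }

    /*
     * Unlike sync(), fsync() covers only this file, reports errors, and
     * is specified to block until the transfer is complete.
     */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

A caller would treat a nonzero return as "not known to be on disk", e.g.
if (write_durably("/tmp/example.dat", "hello", 5) != 0) then report the
failure instead of acknowledging the commit.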

-Alfred
