Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>
Cc: "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, Larry Rosenman <ler(at)lerctr(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)
Date: 2000-11-16 22:05:07
Message-ID: 27875.974412307@sss.pgh.pa.us
Lists: pgsql-hackers

"Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM> writes:
> Long ago you, Bruce, gave me a gift: a book about transaction processing
> (thanks again -:)). This sleeping before fsync in commit is described
> there as a standard technique, and the reason is clear.
> Man, the cost of fsync is very high! { write (64 bytes) + fsync() }
> takes ~ 1/50 sec. Yes, an additional 1/200 sec or so results in worse
> performance when there is only one backend running, but greatly
> increases overall performance for 100 simultaneous backends. I.e., this
> delay is a trade-off to gain better scalability.

> I agree that it must be configurable (smaller, or probably 0, by
> default), use the approximate # of simultaneously running backends for
> guessing (the postmaster could maintain this number in shmem and
> backends could just read it without any locking; an exact number is
> not required), and be well described as a tuning parameter in the
> documentation. Anyway, I object to sleep(0).
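A back-of-the-envelope model of the trade-off quoted above may help. It assumes each write+fsync costs a fixed 1/50 sec no matter how many commit records it flushes, that fsyncs are serialized on one log device, and that the extra 1/200 sec sleep lets some number of concurrent commits share one fsync. The batch sizes are illustrative assumptions, not measurements.

```python
# Hypothetical throughput model using the figures quoted in the message.
FSYNC_COST = 1 / 50   # seconds per write(64 bytes) + fsync(), as quoted (~20 ms)
DELAY = 1 / 200       # extra pre-fsync sleep, as quoted (~5 ms)


def commit_latency(delay: float) -> float:
    """Latency a backend sees for its own commit: optional sleep, then fsync."""
    return delay + FSYNC_COST


def commits_per_second(batch_size: int, delay: float) -> float:
    """Throughput if every (sleep + fsync) cycle makes batch_size commits durable.

    Assumes fsyncs are serialized, so cycles run back to back.
    """
    return batch_size / (delay + FSYNC_COST)


if __name__ == "__main__":
    print("1 backend, no sleep:   ", commits_per_second(1, 0.0))
    print("1 backend, with sleep: ", commits_per_second(1, DELAY))
    print("10 per fsync, w/ sleep:", commits_per_second(10, DELAY))
```

With a single backend the sleep is pure overhead: throughput drops from 50 to 40 commits/sec. If the same sleep lets, say, 10 concurrent commits ride on one fsync (an assumed batch size), the model gives 400 commits/sec, which is the scalability gain being traded for.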

Good points. Another idea that Bruce and I kicked around on the phone
was to make the pre-fsync delay be self-adjusting; that is, it'd
automatically move up and down based on system load. For example,
you could keep track of the time since the last xact commit, and guess
that the time to the next one will be similar. If that's greater than
your intended sleep delay, forget the sleep and just fsync. But the
shorter the time since the last commit, the longer you should be willing
to delay. This'd need some experimentation to get right, but it seems a
lot better than asking the dbadmin to pick a value.
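A minimal sketch of the self-adjusting delay described above, assuming the backend can read the timestamp of the previous commit (in a real backend this would sit in shared memory; here it is a plain attribute). The class name, the `max_delay` cap, and the linear rule `max_delay - gap` are hypothetical choices; the message only says the delay should shrink to zero when commits are far apart and grow as they get closer together.

```python
import time


class AdaptiveCommitDelay:
    """Hypothetical pre-fsync delay that adapts to the observed commit rate."""

    def __init__(self, max_delay: float = 0.005, clock=time.monotonic):
        self.max_delay = max_delay  # longest we are ever willing to wait (assumed 5 ms)
        self.clock = clock          # injectable clock, useful for testing
        self.last_commit = None     # time of the previous commit, None if none yet

    def delay_before_fsync(self) -> float:
        """Return how long to sleep before fsync, and record this commit's time."""
        now = self.clock()
        prev, self.last_commit = self.last_commit, now
        if prev is None:
            return 0.0              # no history yet: just fsync
        gap = now - prev            # guess: the next commit will arrive about this soon
        if gap >= self.max_delay:
            return 0.0              # next commit unlikely within our window: forget the sleep
        # The shorter the gap, the likelier a passenger, so the longer we wait.
        return self.max_delay - gap
```

Other monotone rules would fit the description equally well; this one is just the simplest that goes to zero exactly when the expected gap reaches the cap.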

Another thing that should happen is that once someone fsyncs, all the
other backends waiting should be awoken immediately, instead of waiting
for their delays to time out. Not sure how doable this is --- there's
no wait-for-semaphore-with-timeout in SysV IPC, is there? Perhaps we
can distinguish the first waiter (the guy who will ultimately do the
fsync, he's just hoping for some passengers) from the rest, who see
that someone's already waiting for fsync and just wait for him to do it.
Those other guys don't do a time wait, they sleep on a semaphore that
the first waiter will release once he's done the fsync.
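A minimal sketch of the first-waiter/passenger scheme, written with Python's `threading.Condition` instead of SysV semaphores. This sidesteps the missing timed semaphore wait: only the leader sleeps (a plain timed sleep), while passengers block on the condition with no timeout until the leader's `notify_all()` wakes them all at once. The class `GroupCommit`, the `fsync` callback, the integer log positions (`lsn`), and the default delay are hypothetical stand-ins, not PostgreSQL code.

```python
import threading
import time


class GroupCommit:
    """Hypothetical leader/follower group commit."""

    def __init__(self, fsync, delay: float = 0.005):
        self.fsync = fsync          # callable that makes the log durable up to a position
        self.delay = delay          # leader's pre-fsync sleep, hoping for passengers (assumed)
        self.cond = threading.Condition()
        self.written = 0            # highest log position written, maybe not yet flushed
        self.flushed = 0            # highest log position known to be on disk
        self.leader_active = False  # True while some backend has promised to fsync

    def commit(self, lsn: int) -> None:
        """Block until the commit record ending at lsn is durable."""
        with self.cond:
            self.written = max(self.written, lsn)
            while self.flushed < lsn:
                if not self.leader_active:
                    self.leader_active = True
                    break               # no one is waiting to fsync: we become the leader
                self.cond.wait()        # passenger: sleep until a leader has flushed
            else:
                return                  # an earlier fsync already covered our record
        # Leader path: lock released so passengers can register during the sleep.
        time.sleep(self.delay)
        with self.cond:
            target = self.written       # flush everything written so far, ours included
        ok = False
        try:
            self.fsync(target)
            ok = True
        finally:
            with self.cond:
                if ok:
                    self.flushed = max(self.flushed, target)
                self.leader_active = False
                self.cond.notify_all()  # wake every passenger immediately
```

A passenger whose record was written after the leader picked its flush target simply loops on wake-up, finds no active leader, and becomes the next leader, so no commit is left waiting.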

regards, tom lane
