Re: Re: We have got a serious problem with pg_clog/WAL synchronization

From: Kenneth Marshall <ktm(at)it(dot)is(dot)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, xu(at)cs(dot)wisc(dot)edu
Subject: Re: Re: We have got a serious problem with pg_clog/WAL synchronization
Date: 2004-08-12 13:21:17
Message-ID: 20040812132117.GB16756@it.is.rice.edu
Lists: pgsql-general pgsql-hackers

> "Min Xu (Hsu)" <xu(at)cs(dot)wisc(dot)edu> writes:
> > It seems to me this is an interesting phenomenon of interactions between
> > frequent events of transaction commits and infrequent events of system
> > checkpoints. A potential alternative solution to adding a new shared
> > lock to the frequent commit operation is to let the infrequent
> > checkpoint operation take more overhead. I suppose acquiring/releasing
> > an extra lock for each commit would incur extra performance overhead,
> > even when the lock is not contended. On the other hand, letting the
> > checkpoint operation acquire some existing locks (exclusively) to
> > effectively disallow committing transactions from interfering with the
> > checkpoint process might be a better solution, since it incurs higher
> > overhead only when necessary.
>
> Unfortunately, there isn't any pre-existing lock that will serve.
> A transaction that is between XLogInsert'ing its COMMIT record and
> updating the shared pg_clog data area does not hold any lock that
> could be used to prevent a checkpoint from starting. (Or it didn't
> until yesterday's patch, anyway.)
>
> I looked briefly at reorganizing the existing code so that we'd do the
> COMMIT XLogInsert while we're holding lock on the shared pg_clog data,
> which would solve the problem without adding any new lock acquisition.
> But this seemed extremely messy to do. Also it would be optimizing
> transaction commit at the cost of pessimizing other uses of pg_clog,
> which might have to wait longer to get at the shared data. Adding the
> new lock has the advantage that we can be sure it's not blocking
> anything we don't want it to block.
>
> Thanks for thinking about the problem though ...
>
> regards, tom lane
>

One problem with a high-traffic LWLock is that acquiring it requires a write
to shared memory for both shared and exclusive locks. On the increasingly
prevalent SMP machines, each such write invalidates the cache line containing
the lock in every other CPU's cache, and those CPUs must then reload it, with
the delay that entails. Would it be possible to use a latch plus a version
number in this case, so that every process except the checkpoint only reads
the latch? That should eliminate the cache-line shenanigans on SMP machines.
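One possible reading of the "latch plus version number" idea is a seqlock-style scheme: the checkpoint is the only writer and bumps a version counter to odd while it works and back to even when done, while every other process only loads the counter, so the cache line can stay shared among CPUs instead of bouncing between them. The minimal C11 sketch below assumes a single writer; version_latch, checkpoint_publish, reader_snapshot, and the redo_ptr field are hypothetical names, not PostgreSQL code, and the sketch shows only how readers detect and retry around a concurrent writer.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical latch: even version = idle, odd = checkpoint is writing. */
typedef struct {
    _Atomic uint64_t version;
    _Atomic uint64_t redo_ptr;  /* hypothetical datum the checkpoint publishes */
} version_latch;

/* Writer side (the checkpoint only; assumes a single writer). */
void checkpoint_publish(version_latch *l, uint64_t redo) {
    uint64_t v = atomic_load_explicit(&l->version, memory_order_relaxed);
    atomic_store_explicit(&l->version, v + 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release); /* order odd bump before data */
    atomic_store_explicit(&l->redo_ptr, redo, memory_order_relaxed);
    atomic_store_explicit(&l->version, v + 2, memory_order_release); /* even */
}

/* Reader side: loads only, never writes to the latch's cache line.
 * Retries until it sees the same even version before and after the read. */
uint64_t reader_snapshot(version_latch *l, uint64_t *version_out) {
    for (;;) {
        uint64_t v1 = atomic_load_explicit(&l->version, memory_order_acquire);
        if (v1 & 1)
            continue; /* writer in progress: spin */
        uint64_t redo = atomic_load_explicit(&l->redo_ptr, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire); /* order data read before recheck */
        uint64_t v2 = atomic_load_explicit(&l->version, memory_order_relaxed);
        if (v1 == v2) {
            if (version_out)
                *version_out = v1;
            return redo;
        }
    }
}
```

The fence placement follows the commonly described C11 seqlock pattern (release fence after the odd bump on the writer, acquire fence before the recheck on the reader), which keeps the data accesses race-free while leaving readers with plain loads.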

Ken Marshall
