Re: [PATCHES] Cleaning up unreferenced table files

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Cleaning up unreferenced table files
Date: 2005-05-08 16:35:22
Message-ID: 12069.1115570122@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> Consider the variant with extra marker files. In that case, backend B
> doesn't have to know about the .notcommitted status to flush the buffers.

[ shrug ] It's still broken, and the reason is that there's no
equivalent of fsync for directory operations. Consider

A creates 1234 and 1234.notcommitted

A commits

B performs a checkpoint

crash

all before A manages to delete 1234.notcommitted, or at least before
that deletion has made its way to disk. Upon restart, only WAL
events after the checkpoint will be replayed, so 1234.notcommitted
doesn't go away, and then you've got a problem.
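A toy simulation may make the ordering problem concrete. It is a sketch under stated assumptions, not the real backend or checkpointer: the `Machine` class and `run_scenario` helper are hypothetical. The model keeps a kernel-side directory cache separate from the on-disk directory and has recovery replay only WAL records logged after the last checkpoint. The redo rule that "replaying a commit removes the file's .notcommitted marker" is also an assumption about how the marker-file variant would work.

```python
class Machine:
    """Toy machine: a kernel directory cache, the on-disk directory, and a WAL."""

    def __init__(self):
        self.cached_dir = set()   # directory as the kernel currently sees it
        self.durable_dir = set()  # directory entries that have reached disk
        self.wal = []             # WAL records; assumed durable once appended
        self.redo_start = 0       # recovery replays self.wal[self.redo_start:]

    def create(self, name):
        self.cached_dir.add(name)

    def unlink(self, name):
        self.cached_dir.discard(name)

    def flush_directory(self):
        # Happens whenever the kernel decides; per the argument above there is
        # no portable way for the checkpointer to force or await it.
        self.durable_dir = set(self.cached_dir)

    def log_commit(self, relfile):
        self.wal.append(("commit", relfile))

    def checkpoint(self):
        self.redo_start = len(self.wal)

    def crash_and_recover(self):
        directory = set(self.durable_dir)  # unflushed directory changes are lost
        for kind, relfile in self.wal[self.redo_start:]:
            if kind == "commit":
                # Assumed redo rule: replaying a commit removes the file's marker.
                directory.discard(relfile + ".notcommitted")
        return directory


def run_scenario(unlink_reaches_disk):
    m = Machine()
    m.create("1234")
    m.create("1234.notcommitted")
    m.flush_directory()            # both new entries are on disk
    m.log_commit("1234")           # A commits
    m.checkpoint()                 # B checkpoints; A's commit is now before redo start
    m.unlink("1234.notcommitted")  # A deletes the marker (kernel cache only)
    if unlink_reaches_disk:
        m.flush_directory()
    return m.crash_and_recover()   # crash, then replay WAL after the checkpoint


print(sorted(run_scenario(unlink_reaches_disk=False)))  # ['1234', '1234.notcommitted']
print(sorted(run_scenario(unlink_reaches_disk=True)))   # ['1234']
```

In the first run the commit record precedes the checkpoint, so it is never replayed, and the unflushed unlink is lost in the crash. The committed file therefore comes back still carrying its .notcommitted marker.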

To fix this there would need to be a way (1) for B to be aware of the
pending file deletion and (2) for B to delay committing the checkpoint
until the directory update is surely down on disk. Your proposal
doesn't provide for (1), and even if we fixed that, I know of no
portable kernel API for (2). fsync isn't applicable.

While your original patch is buggy, it's at least fixable and has
localized, limited impact. I don't think these schemes are safe
at all --- they put a great deal more weight on the semantics of
the filesystem than I care to do.

regards, tom lane
