Re: [PATCH] Lazy xid assingment V2

From:	"Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, August Zajonc <augustz(at)augustz(dot)com>, Postgresql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [PATCH] Lazy xid assingment V2
Date:	2007-09-01 19:52:55
Message-ID:	46D9C317.70807@phlo.org
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> I had an idea this morning that might be useful: back off the strength
>> of what we try to guarantee. Specifically, does it matter if we leak a
>> file on crash, as long as it isn't occupying a lot of disk space?
>> (I suppose if you had enough crashes to accumulate many thousands of
>> leaked files, the directory entries would start to be a performance drag,
>> but if your DB crashes that much you have other problems.) This leads
>> to the idea that we don't really need to protect the open(O_CREAT) per
>> se. Rather, we can emit a WAL entry *after* successful creation of a
>> file, while it's still empty. This eliminates all the issues about
>> logging an action that might fail. The WAL entry would need to include
>> the relfilenode and the creating XID. Crash recovery would track these
>> until it saw the commit or abort or prepare record for the XID, and if
>> it didn't find any, would remove the file.
>
> That idea, like all other approaches based on tracking WAL records, fail
> if there's a checkpoint after the WAL record (and that's quite likely to
> happen if the file is large). WAL replay wouldn't see the file creation
> WAL entry, and wouldn't know to track the xid. We'd need a way to carry
> the information over checkpoints.

Yes, checkpoints would need to include a list of created-but-not-yet-committed
files. I think the hardest part is figuring out a way to get that information
to the backend doing the checkpoint - my idea was to track them in shared
memory, but that would impose a hard limit on the number of concurrent
file creations. Not nice :-(
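
To make the checkpoint problem concrete, here is a minimal sketch in Python. It is
a toy model, not PostgreSQL code: the record tuples, the Checkpoint class and the
replay() function are all hypothetical names invented for illustration. It assumes
replay starts at the last checkpoint and only sees WAL records written after it.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    # Hypothetical payload: relfilenode -> creating xid, for files created by
    # transactions that had not yet committed or aborted at checkpoint time.
    pending: dict = field(default_factory=dict)


def replay(checkpoint, wal_records):
    """Return the relfilenodes crash recovery would remove after replay.

    wal_records holds only the records written after the checkpoint, as
    (kind, xid, relfilenode) tuples; relfilenode is None for non-create records.
    """
    pending = dict(checkpoint.pending)
    for kind, xid, relfilenode in wal_records:
        if kind == "create":
            pending[relfilenode] = xid
        elif kind in ("commit", "abort", "prepare"):
            # The transaction's fate is known, so stop tracking its files.
            pending = {f: x for f, x in pending.items() if x != xid}
    # Whatever is still tracked belongs to a transaction that never finished.
    return sorted(pending)


# xid 5 created file 1001 *before* the checkpoint and then the server crashed.
wal_after_checkpoint = [("create", 6, 1002), ("commit", 6, None)]

leaked = replay(Checkpoint(), wal_after_checkpoint)
cleaned = replay(Checkpoint(pending={1001: 5}), wal_after_checkpoint)
```

With an empty checkpoint, replay never learns about file 1001 and it leaks; with
the pending list carried in the checkpoint, the file is found and removed.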

But wait... I just had an idea.
We already have such a central list of created-but-uncommitted
files - pg_class itself. There is a small window between file creation
and inserting the name into pg_class - but as Tom says, if we leak it then,
it won't use up much space anyway.

So maybe we should just scan pg_class on VACUUM, and obtain a list of files
that are referenced only from DEAD tuples. Those files we can then safely
delete, no?
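
The scan could look roughly like the following Python sketch. It is a toy model
under stated assumptions: pg_class tuple versions are reduced to hypothetical
(relfilenode, state) pairs, and the state labels "live", "dead" and "in_progress"
stand in for the real tuple visibility checks.

```python
def files_only_referenced_by_dead(pg_class_tuples, files_on_disk):
    """Return relfilenodes on disk that are referenced only from dead tuples.

    pg_class_tuples: iterable of (relfilenode, state) pairs, one per tuple
    version, where state is "live", "dead" or "in_progress" (hypothetical labels).
    """
    dead = set()
    not_dead = set()
    for relfilenode, state in pg_class_tuples:
        if state == "dead":
            dead.add(relfilenode)
        else:
            not_dead.add(relfilenode)
    # A file is deletable only if no live or in-progress tuple still points at
    # it, and only if it actually exists on disk.
    return sorted((dead - not_dead) & set(files_on_disk))


tuples = [
    (1001, "dead"),         # creating transaction aborted or crashed
    (1002, "live"),
    (1003, "dead"),         # old version of an updated row ...
    (1003, "live"),         # ... whose new version still uses the file
    (1004, "in_progress"),  # creating transaction is still running
]
on_disk = {1001, 1002, 1003, 1004, 1005}

deletable = files_only_referenced_by_dead(tuples, on_disk)
```

Here only 1001 is deletable. File 1005 has no pg_class entry at all (it was leaked
in the small window before the insert), so this scan does not find it.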

If we *do* want a strict no-leakage guarantee, then we'd have to update pg_class
before creating the file, and flush the WAL. If we take Alvaro's idea of storing
temporary relations in a separate directory, we could skip the flush for those,
because we can just clean out that directory after recovery. Having to flush
the WAL when creating non-temporary relations doesn't sound too bad - those
operations won't occur very often, I'd say.
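
The ordering for the strict variant can be sketched in Python as follows. This is
only an illustration of the sequence of steps: the Wal class, create_relation()
and the two sets standing in for the regular and temporary directories are
hypothetical, not PostgreSQL APIs.

```python
class Wal:
    """Toy write-ahead log: records become durable only after flush()."""

    def __init__(self):
        self.records = []
        self.flushed = 0  # number of records durably on disk

    def append(self, record):
        self.records.append(record)

    def flush(self):
        self.flushed = len(self.records)


def create_relation(wal, files, temp_files, relfilenode, xid, temporary=False):
    # Step 1: log the pg_class insert before the file exists.
    wal.append(("pg_class_insert", xid, relfilenode))
    if temporary:
        # Temporary relations live in their own directory, which is simply
        # emptied after crash recovery, so no flush is needed here.
        temp_files.add(relfilenode)
    else:
        # Step 2: make the pg_class entry durable, then create the file.
        wal.flush()
        files.add(relfilenode)


wal, files, temp_files = Wal(), set(), set()
create_relation(wal, files, temp_files, 2001, xid=9)
flushed_after_regular = wal.flushed
create_relation(wal, files, temp_files, 2002, xid=9, temporary=True)
flushed_after_temp = wal.flushed
```

The regular relation forces a flush before its file appears, while the temporary
one leaves the flush position unchanged.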

greetings, Florian Pflug

In response to

Re: [PATCH] Lazy xid assingment V2 at 2007-09-01 18:12:30 from Heikki Linnakangas

Responses

Re: [PATCH] Lazy xid assingment V2 at 2007-09-01 21:13:08 from Florian G. Pflug
