Re: DROP DATABASE vs patch to not remove files right away

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: DROP DATABASE vs patch to not remove files right away
Date: 2008-04-16 08:02:10
Message-ID: 4805B282.3090203@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> ISTM that we must fix the bgwriter so that ForgetDatabaseFsyncRequests
> causes PendingUnlinkEntrys for the doomed DB to be thrown away too.
> This should prevent the unlink-live-data scenario, I think.
> Even then, concurrent deletion attempts are probably possible (since
> ForgetDatabaseFsyncRequests is asynchronous) and rmtree() is being far
> too fragile about dealing with them. I think that it should be coded
> to ignore ENOENT the same as the bgwriter does, and that it should press
> on and keep trying to delete things even if it gets a failure.

Yep. I can write a patch for that, unless you're onto it already?
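
To make the rmtree() behaviour Tom describes concrete, here is a minimal
sketch of a recursive delete that treats ENOENT as "someone else already
removed it" and keeps going after other failures. It is not the actual
PostgreSQL rmtree(); the name rmtree_lenient and the fixed-size path buffer
are assumptions made for illustration only.

```c
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL's rmtree): remove "path" recursively.
 * ENOENT counts as success, since a concurrent deleter may have got there
 * first. Any other failure is reported and remembered, but the walk
 * continues. Returns true only if nothing failed.
 */
static bool
rmtree_lenient(const char *path)
{
    struct stat st;
    bool        ok = true;

    if (lstat(path, &st) != 0)
    {
        if (errno == ENOENT)
            return true;        /* already gone */
        fprintf(stderr, "could not stat \"%s\": %s\n", path, strerror(errno));
        return false;
    }

    if (S_ISDIR(st.st_mode))
    {
        DIR        *dir = opendir(path);
        struct dirent *de;

        if (dir == NULL)
        {
            if (errno == ENOENT)
                return true;    /* vanished between lstat and opendir */
            fprintf(stderr, "could not open \"%s\": %s\n", path, strerror(errno));
            return false;
        }

        while ((de = readdir(dir)) != NULL)
        {
            char        child[4096];

            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            if (snprintf(child, sizeof(child), "%s/%s", path, de->d_name)
                >= (int) sizeof(child))
            {
                fprintf(stderr, "path too long under \"%s\"\n", path);
                ok = false;
                continue;
            }
            if (!rmtree_lenient(child))
                ok = false;     /* remember the failure, but press on */
        }
        closedir(dir);

        if (rmdir(path) != 0 && errno != ENOENT)
        {
            fprintf(stderr, "could not remove directory \"%s\": %s\n",
                    path, strerror(errno));
            ok = false;
        }
    }
    else if (unlink(path) != 0 && errno != ENOENT)
    {
        fprintf(stderr, "could not remove file \"%s\": %s\n",
                path, strerror(errno));
        ok = false;
    }

    return ok;
}
```

Calling it twice on the same directory should succeed both times, because
the second call simply finds nothing left to remove.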

However, this makes me reconsider Florian's suggestion to just make
relfilenode larger and avoid reusing them altogether. It would simplify
the code quite a bit, and make it more robust. That is good because even
if we fix these problems per your suggestion, I'm left wondering if
we've missed some even weirder corner-cases.

Florian suggested a scheme where the xid and epoch are embedded in the
filename, but that's unnecessarily complex. We could just make
relfilenode a 64-bit integer. 2^64 should be enough for everyone.
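
As a rough back-of-the-envelope check on that claim, the sketch below
estimates how long a never-reused counter would last. The function name and
the allocation rate of one million relfilenodes per second are purely
hypothetical assumptions, chosen to be far beyond any realistic workload.

```c
/*
 * Hypothetical estimate: years until a counter of "bits" bits is exhausted
 * when values are handed out at "per_second" allocations per second and
 * never reused.
 */
static double
years_until_exhausted(int bits, double per_second)
{
    const double seconds_per_year = 365.25 * 86400.0;
    double      values = 1.0;

    for (int i = 0; i < bits; i++)
        values *= 2.0;          /* 2^bits, exact in a double for bits <= 64 */

    return values / per_second / seconds_per_year;
}
```

At that assumed rate a 32-bit counter would be used up in a little over an
hour, while a 64-bit counter would last on the order of 580,000 years.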

You listed these problems with Florian's suggestion back then:

> 1. Zero chance of ever backpatching. (I know I said I wasn't excited
> about that, but it's still a strike against a proposed fix.)

Still true. We would need to do what you suggested for 8.3, but
simplifying the code would be a good thing in the long run.

> 2. Adds new fields to RelFileNode, which will be a major code change,
> and possibly a noticeable performance hit (bigger hashtable keys).

We talked about this wrt. map forks, and concluded that it's not an
issue. If we add the map forks as well, BufferTag struct would grow from
16 bytes to 24 bytes. It's worth doing some more micro-benchmarking
with that, but it's probably acceptable. Or we could allocate a few bits
of the 64-bit relfilenode field in RelFileNode to indicate the map fork.
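
The size arithmetic and the bit-stealing alternative can be sketched as
follows. The struct layouts are simplified stand-ins for RelFileNode and
BufferTag, and the names RelFileNode64, BufferTag64, forkNum, FORK_BITS and
the helper functions are hypothetical. The sizes asserted here hold on
common ABIs but depend on the compiler's padding rules.

```c
#include <stdint.h>

/* Simplified current layout: three 32-bit OIDs plus a block number. */
typedef struct { uint32_t spcNode, dbNode, relNode; } RelFileNode32;
typedef struct { RelFileNode32 rnode; uint32_t blockNum; } BufferTag32;

/* Hypothetical layout: 64-bit relfilenode plus a separate fork number. */
typedef struct { uint32_t spcNode, dbNode; uint64_t relNode; } RelFileNode64;
typedef struct { RelFileNode64 rnode; uint32_t forkNum; uint32_t blockNum; } BufferTag64;

_Static_assert(sizeof(BufferTag32) == 16, "4 x 32-bit fields");
_Static_assert(sizeof(BufferTag64) == 24, "16-byte rnode + fork + block");

/*
 * Alternative: keep the fork number in the top FORK_BITS bits of the 64-bit
 * relfilenode instead of adding a field. FORK_BITS = 2 is an arbitrary
 * choice for this sketch.
 */
#define FORK_BITS    2
#define FORK_SHIFT   (64 - FORK_BITS)
#define FORK_MASK    ((1u << FORK_BITS) - 1)
#define RELNODE_MASK ((UINT64_C(1) << FORK_SHIFT) - 1)

/* Combine a relfilenode number and a fork number into one 64-bit value. */
static inline uint64_t
pack_relnode(uint64_t relnode, unsigned fork)
{
    return (relnode & RELNODE_MASK) |
           ((uint64_t) (fork & FORK_MASK) << FORK_SHIFT);
}

/* Extract the fork number from a packed value. */
static inline unsigned
relnode_fork(uint64_t packed)
{
    return (unsigned) (packed >> FORK_SHIFT);
}

/* Extract the relfilenode number from a packed value. */
static inline uint64_t
relnode_number(uint64_t packed)
{
    return packed & RELNODE_MASK;
}
```

Stealing two bits still leaves 2^62 distinct relfilenode values, so the
no-reuse property is not meaningfully weakened.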

> 3. Adds new columns to pg_class, which is a real PITA ...

We would only have to change relfilenode from oid to int64.

> 4. Breaks oid2name and all similar code that knows about relfilenode.

True, but they're not hard to fix.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
