| From: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> | 
|---|---|
| To: | "Florian G(dot) Pflug" <fgp(at)phlo(dot)org> | 
| Cc: | pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jacky Leng <lengjianquan(at)163(dot)com> | 
| Subject: | Re: Why copy_relation_data only use wal when WALarchiving is enabled | 
| Date: | 2007-10-17 14:07:46 | 
| Message-ID: | 47161732.7090601@enterprisedb.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers pgsql-patches | 
Florian G. Pflug wrote:
> Heikki Linnakangas wrote:
>> I wrote:
>>> Unfortunately I don't see any easy way to fix it. One approach would be
>>> to avoid reusing the relfilenodes until next checkpoint, but I don't see
>>> any nice place to keep track of OIDs that have been dropped since last
>>> checkpoint.
>>
>> Ok, here's one idea:
>>
>> Instead of deleting the file immediately on commit of DROP TABLE, the
>> file is truncated to release the space, but not unlink()ed, to avoid
>> reusing that relfilenode. The truncated file can be deleted after next
>> checkpoint.
>>
>> Now, how does checkpoint know what to delete? We can use the fsync
>> request mechanism for that. When a file is truncated, a new kind of
>> fsync request, a "deletion request", is sent to the bgwriter, which
>> collects all such requests to a list. Before checkpoint calculates new
>> RedoRecPtr, the list is swapped with an empty one, and after writing the
>> new checkpoint record, all the files that were in the list are deleted.
>>
>> We would leak empty files on crashes, but we leak files on crashes
>> anyway, so that shouldn't be an issue. This scheme wouldn't require
>> catalog changes, so it would be suitable for backpatching.
>>
>> Any better ideas?
> Couldn't we fix this by forcing a checkpoint before we commit the
> transaction that created the new pg_class entry for the clustered table?
> Or rather, more generally, before committing a transaction that created
> a new non-temporary relfilenode but didn't WAL-log any subsequent inserts.
Yes, that would work. As a small optimization, you could set a flag in
shared mem whenever you delete a rel file, and skip the checkpoint when
that flag isn't set.
> Thats of course a rather sledgehammer-like approach to this problem -
> but at least for the backbranched the fix would be less intrusive...
Too much of a sledgehammer IMHO.
BTW, CREATE INDEX is also vulnerable. And in 8.3, COPY to a table
created/truncated in the same transaction.
-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tommy Gildseth | 2007-10-17 14:11:27 | Re: DBLink's default user: postgres | 
| Previous Message | Heikki Linnakangas | 2007-10-17 14:02:17 | Re: Why copy_relation_data only use wal whenWALarchiving is enabled | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2007-10-17 15:47:34 | Re: BUG #3681: fillers are NULL in pgbench | 
| Previous Message | Volkan YAZICI | 2007-10-17 14:02:44 | Configurable Penalty Costs for Levenshtein |