Re: [PERFORM] DELETE vs TRUNCATE explanation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Harold A(dot) Giménez <harold(dot)gimenez(at)gmail(dot)com>
Subject: Re: [PERFORM] DELETE vs TRUNCATE explanation
Date: 2012-07-19 14:09:26
Message-ID: 20408.1342706966@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-performance

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Seems a bit complex, but it might be worth it. Keep in mind that I
> eventually want to be able to make an unlogged table logged or vice
> versa, which will probably entail unlinking just the init fork (for
> the logged -> unlogged direction).

Well, as far as that goes, I don't see a reason why you couldn't unlink
the init fork immediately on commit. The checkpointer should not have
to be involved at all --- there's no reason to send it a FORGET FSYNC
request either, because there shouldn't be any outstanding writes
against an init fork, no?

But having said that, this does serve as an example that we might
someday want the flexibility to kill individual forks. I was
intending to kill smgrdounlinkfork altogether, but I'll refrain.

> I think this is just over-engineered. The originally complained-of
> problem was all about the inefficiency of manipulating the
> checkpointer's backend-private data structures, right? I don't see
> any particular need to mess with the shared memory data structures at
> all. If you wanted to add some de-duping logic to retail fsync
> requests, you could probably accomplish that more cheaply by having
> each such request look at the last half-dozen or so items in the queue
> and skip inserting the new request if any of them match the new
> request. But I think that'd probably be a net loss, because it would
> mean holding the lock for longer.

What about checking just the immediately previous entry? This would
at least fix the problem for bulk-load situations, and the cost ought
to be about negligible compared to acquiring the LWLock.
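
To make that concrete, here is a rough sketch of the tail-only check. It is
not the actual request-queue code: FsyncRequest, RequestQueue,
QUEUE_CAPACITY and enqueue_fsync_request are names made up for illustration,
and the caller is assumed to already hold whatever lock protects the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 1024     /* hypothetical size, for illustration only */

/* Hypothetical request tag: one segment of one fork of one relation. */
typedef struct FsyncRequest
{
    unsigned    rel_id;
    int         fork;
    unsigned    segno;
} FsyncRequest;

/* Hypothetical stand-in for the shared request queue. */
typedef struct RequestQueue
{
    size_t       count;
    FsyncRequest items[QUEUE_CAPACITY];
} RequestQueue;

static bool
request_equal(const FsyncRequest *a, const FsyncRequest *b)
{
    return a->rel_id == b->rel_id &&
           a->fork == b->fork &&
           a->segno == b->segno;
}

/*
 * Queue a request unless it matches the entry queued immediately before it.
 * Returns true if the request is covered (newly queued, or already present
 * at the tail), false if the queue is full.  Assumes the caller holds the
 * lock protecting the queue.
 */
static bool
enqueue_fsync_request(RequestQueue *q, FsyncRequest req)
{
    assert(q != NULL);

    /* One comparison against the tail: cheap next to taking the lock. */
    if (q->count > 0 && request_equal(&q->items[q->count - 1], &req))
        return true;

    if (q->count >= QUEUE_CAPACITY)
        return false;

    q->items[q->count++] = req;
    return true;
}
```

A bulk load that keeps writing the same segment collapses to one queue
entry this way, while requests that alternate between two segments are not
de-duplicated at all, since only the tail is examined.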

I have also been wondering about de-duping on the backend side, but
the problem is that if a backend remembers its last few requests,
it doesn't know when that cache has to be cleared because of a new
checkpoint cycle starting. We could advertise the current cycle
number in shared memory, but you'd still need to take a lock to
read it. (If we had memory fence primitives it could be a bit
cheaper, but I dunno how much.)
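
For illustration only, the backend-side idea might look something like the
following if fence primitives were available. The sketch uses C11
<stdatomic.h> as a stand-in for them, which is an assumption and not
something the tree provides; advertised_cycle, BackendCache,
checkpoint_cycle_begin and backend_should_forward are all hypothetical
names. It also ignores the window in which a new cycle starts just after
the backend has read the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RECENT_SLOTS 4          /* "last few requests"; arbitrary choice */

/* Hypothetical request tag: one segment of one fork of one relation. */
typedef struct FsyncTag
{
    unsigned    rel_id;
    int         fork;
    unsigned    segno;
} FsyncTag;

/* Stand-in for a cycle counter that would live in shared memory. */
static atomic_uint advertised_cycle = 0;

/* Backend-private cache of requests already sent during seen_cycle. */
typedef struct BackendCache
{
    unsigned    seen_cycle;
    size_t      used;
    size_t      next;
    FsyncTag    recent[RECENT_SLOTS];
} BackendCache;

/* Checkpointer side: advertise that a new cycle has begun. */
static void
checkpoint_cycle_begin(void)
{
    atomic_fetch_add_explicit(&advertised_cycle, 1, memory_order_release);
}

/*
 * Backend side: returns true if the request must be forwarded to the
 * checkpointer, false if this backend already sent it in the current cycle.
 */
static bool
backend_should_forward(BackendCache *c, FsyncTag tag)
{
    unsigned    cycle;
    size_t      i;

    assert(c != NULL);

    /* Read the advertised cycle without taking a lock. */
    cycle = atomic_load_explicit(&advertised_cycle, memory_order_acquire);
    if (cycle != c->seen_cycle)
    {
        /* New checkpoint cycle: everything remembered so far is stale. */
        c->seen_cycle = cycle;
        c->used = 0;
        c->next = 0;
    }

    for (i = 0; i < c->used; i++)
    {
        if (c->recent[i].rel_id == tag.rel_id &&
            c->recent[i].fork == tag.fork &&
            c->recent[i].segno == tag.segno)
            return false;
    }

    /* Remember the request, overwriting the oldest slot once full. */
    c->recent[c->next] = tag;
    c->next = (c->next + 1) % RECENT_SLOTS;
    if (c->used < RECENT_SLOTS)
        c->used++;
    return true;
}
```

Whether the acquire load is actually cheaper than taking the lock in
practice is exactly the open question above; the sketch only shows where
the cache reset would go.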

regards, tom lane
