Re: [PERFORM] DELETE vs TRUNCATE explanation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Harold A(dot) Giménez <harold(dot)gimenez(at)gmail(dot)com>
Subject: Re: [PERFORM] DELETE vs TRUNCATE explanation
Date: 2012-07-19 16:17:12
Message-ID: CA+TgmoagwPYG8QO3ykccp4_dpYy_Y_KKzF4QQBt_TPaFKXV9pg@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Thu, Jul 19, 2012 at 10:09 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Seems a bit complex, but it might be worth it. Keep in mind that I
>> eventually want to be able to make an unlogged table logged or vice
>> versa, which will probably entail unlinking just the init fork (for
>> the logged -> unlogged direction).
>
> Well, as far as that goes, I don't see a reason why you couldn't unlink
> the init fork immediately on commit. The checkpointer should not have
> to be involved at all --- there's no reason to send it a FORGET FSYNC
> request either, because there shouldn't be any outstanding writes
> against an init fork, no?

Well, it gets written when it gets created. Some of those writes go
through shared_buffers.

> But having said that, this does serve as an example that we might
> someday want the flexibility to kill individual forks. I was
> intending to kill smgrdounlinkfork altogether, but I'll refrain.

If you want to remove it, it's OK with me. We can always put it back
later if it's needed. We have an SCM that allows us to revert
patches. :-)

> What about checking just the immediately previous entry? This would
> at least fix the problem for bulk-load situations, and the cost ought
> to be about negligible compared to acquiring the LWLock.

Well, two things:

1. If a single bulk load is the ONLY activity on the system, or more
generally if only one segment in the system is being heavily written,
then that would reduce the number of entries that get added to the
queue, but if you're doing two bulk loads on different tables at the
same time, then it might not do much. From Greg Smith's previous
comments on this topic, I understand that having two or three entries
alternating in the queue is a fairly common pattern.

2. You say "fix the problem" but I'm not exactly clear what problem
you think this fixes. It's true that the compaction code is a lot
slower than an ordinary queue insertion, but I think it generally
doesn't happen enough to matter, and when it does happen the system is
generally I/O bound anyway, so who cares? One possible argument in
favor of doing something along these lines is that it would reduce the
amount of data that the checkpointer would have to copy while holding
the lock, thus causing less disruption for other processes trying to
insert into the request queue. But I don't know whether that effect
is significant enough to matter.
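To make point 1 concrete, here is a minimal sketch of the "check just the immediately previous entry" idea. It is not the actual request-queue code: RequestQueue, FsyncRequest, QUEUE_CAP and enqueue_check_last are hypothetical names made up for illustration, and the real queue lives in shared memory behind an LWLock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8             /* hypothetical capacity, for illustration only */

/* Hypothetical stand-in for one fsync request: which segment of which relation. */
typedef struct { unsigned rel; unsigned seg; } FsyncRequest;

typedef struct {
    FsyncRequest items[QUEUE_CAP];
    size_t       count;
} RequestQueue;

static bool same_request(const FsyncRequest *a, const FsyncRequest *b)
{
    return a->rel == b->rel && a->seg == b->seg;
}

/*
 * Append a request unless it duplicates the entry added just before it.
 * Returns false only when the queue is full and the request could not be
 * absorbed. Locking is omitted; assume the caller already holds the lock.
 */
bool enqueue_check_last(RequestQueue *q, FsyncRequest r)
{
    assert(q->count <= QUEUE_CAP);

    if (q->count > 0 && same_request(&q->items[q->count - 1], &r))
        return true;            /* duplicate of the previous entry: drop it */

    if (q->count == QUEUE_CAP)
        return false;           /* full: a real system would compact or fsync */

    q->items[q->count++] = r;
    return true;
}
```

With a single writer, six identical requests collapse to one entry. With two writers alternating A, B, A, B, A, B, no request ever matches the one before it, so all six are queued and the check saves nothing.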

> I have also been wondering about de-duping on the backend side, but
> the problem is that if a backend remembers its last few requests,
> it doesn't know when that cache has to be cleared because of a new
> checkpoint cycle starting. We could advertise the current cycle
> number in shared memory, but you'd still need to take a lock to
> read it. (If we had memory fence primitives it could be a bit
> cheaper, but I dunno how much.)

Well, we do have those, as of 9.2. They're not being used for anything
yet, but I've been looking for an opportunity to put them into use.
sinvaladt.c's msgnumLock is an obvious candidate, but the 9.2 changes
to reduce the impact of sinval synchronization work sufficiently well
that I haven't been motivated to tinker with it any further. Maybe it
would be worth doing just to exercise that code, though.

Or, maybe we can use them here. But after some thought I can't see
exactly how we'd do it. Memory barriers prevent a value from being
prefetched too early or written back to main memory too late, relative
to other memory operations by the same process, but the definition of
"too early" and "too late" is not quite clear to me here.
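For what it's worth, here is a rough sketch of the shape a backend-side cache might take if the cycle number were advertised and read without a lock. It uses C11 atomics as a stand-in for our barrier primitives, and every name in it (advertised_cycle, BackendCache, start_new_cycle, backend_should_send) is hypothetical. It deliberately does not answer the ordering question: a backend can still read the counter just before the checkpointer advances it, and the sketch does not show whether that is safe.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identifier for the segment a request refers to. */
typedef struct { uint32_t rel; uint32_t seg; } RequestTag;

/* Stand-in for a checkpoint cycle counter advertised in shared memory. */
static _Atomic uint32_t advertised_cycle = 0;

/* Per-backend memory of the last request sent, and the cycle it was sent in. */
typedef struct {
    bool       valid;
    uint32_t   cycle;
    RequestTag tag;
} BackendCache;

/* Checkpointer side: announce that a new cycle has started. */
void start_new_cycle(void)
{
    atomic_fetch_add_explicit(&advertised_cycle, 1, memory_order_release);
}

/*
 * Backend side: return true if the request must be forwarded, false if it
 * duplicates the last request sent during the cycle we currently observe.
 */
bool backend_should_send(BackendCache *c, RequestTag tag)
{
    uint32_t now = atomic_load_explicit(&advertised_cycle, memory_order_acquire);

    if (c->valid && c->cycle == now &&
        c->tag.rel == tag.rel && c->tag.seg == tag.seg)
        return false;           /* already sent in this cycle */

    /* New request, or the cycle changed: remember it and forward it. */
    c->valid = true;
    c->cycle = now;
    c->tag = tag;
    return true;
}
```

The cache is invalidated implicitly: once the observed cycle number differs from the remembered one, the next request is always forwarded.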

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
