Re: Removing PD_ALL_VISIBLE

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Removing PD_ALL_VISIBLE
Date: 2013-01-17 22:50:25
Message-ID: 1358463025.3372.320.camel@sussancws0025
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2013-01-17 at 09:53 -0500, Robert Haas wrote:
> The main question in my mind is whether
> there are any negative consequences to holding a VM buffer pin for
> that long without interruption. The usual consideration - namely,
> blocking vacuum - doesn't apply here because vacuum does not take a
> cleanup lock on the visibility map page, only the heap page, but I'm
> not sure if there are others.

If the "without interruption" part becomes a practical problem, it seems
fairly easy to fix: drop the pin and pick it up again once every K
pages. Unless I'm missing something, this is a minor concern.

> The other part of the issue is cache pressure. I don't think I can say
> it better than what Tom already wrote:
>
> # I'd be worried about the case of a lot of sessions touching a lot of
> # unrelated tables. This change implies doubling the number of buffers
> # that are held pinned by any given query, and the distributed overhead
> # from that (eg, adding cycles to searches for free buffers) is what you
> # ought to be afraid of.
>
> I agree that we ought to be afraid of that.

It's a legitimate concern, but I think being "afraid" goes to far (more
below).

> A pgbench test isn't
> going to find a problem in this area because there you have a bunch of
> sessions all accessing the same small group of tables. To find a
> problem of the type above, you'd need lots of backends accessing lots
> of different, small tables. That's not the use case we usually
> benchmark, but I think there are a fair number of people doing things
> like that - for example, imagine a hosting provider or web application
> with many databases or schemas on a single instance. AFAICS, Jeff
> hasn't tried to test this scenario.

The concern here is over a lot of different, small tables (e.g.
multi-tenancy or something similar) as you say. If we're talking about
nearly empty tables, that's easy to fix: just don't use the VM on tables
less than N pages.

You could say that "small" tables are really 1-10MB each, and you could
have a zillion of those. I will try to create a worst-case here and see
what numbers come out. Perhaps the extra time to look for a free buffer
does add up.

Test plan:

1. Take current patch (without "skip VM check for small tables"
optimization mentioned above).
2. Create 500 tables each about 1MB.
3. VACUUM them all.
4. Start 500 connections (one for each table)
5. Time the running of a loop that executes a COUNT(*) on that
connection's table 100 times.

I think shared_buffers=64MB is probably appropriate. We want some memory
pressure so that it has to find and evict pages to satisfy the queries.
But we don't want it to be totally exhausted and unable to even pin a
new page; that really doesn't tell us much except that max_connections
is too high.

Sound reasonable?

> Now, on the flip side, we should also be thinking about what we would
> gain from this patch, and what other ways there might be to achieve
> the same goals.

One thing to keep in mind is that the current code to maintain a
crash-safe PD_ALL_VISIBLE and VM bit is quite complex and doesn't play
by the normal rules. If you want to talk about distributed costs, that
has some very real distributed costs in terms of development effort. For
instance, my checksums patch took me extra time to write (despite the
patch being the simplest checksums design on the table) and will take
others extra time to review.

So, if the only things keeping that code in place are theoretical fears,
let's take them one by one and see if they are real problems or not.

> All of which is to say that I think this patch is premature. If we
> adopt something like this, we're likely never going to revert back to
> the way we do it now, and whatever cache-pressure or other costs this
> approach carries will be hear to stay - so we had better think awfully
> carefully before we do that.

What about this patch makes it hard to undo/rework in the future?

> Even if
> there were, this is exactly the sort of thing that should be committed
> at the beginning of a release cycle, not the end, so as to allow
> adequate time for discovery of unforeseen consequences before the code
> ends up out in the wild.

I'm concerned that such a comment at this stage will cut review early,
which could prevent also it from being committed early in 9.4.

> Of course, there's another issue here too, which is that as Pavan
> points out, the page throws crash-safety out the window

My mistake. I believe that is already fixed, and certainly not a
fundamental issue.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-01-17 22:56:16 Re: HS locking broken in HEAD
Previous Message Dimitri Fontaine 2013-01-17 22:48:23 Re: Event Triggers: adding information