Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Neil Conway" <neilc(at)samurai(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable
Date: 2008-01-28 23:13:18
Message-ID: 479E618E.5050309@enterprisedb.com

Simon Riggs wrote:
> On Mon, 2008-01-28 at 16:21 -0500, Tom Lane wrote:
>> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>>> Rather than having a boolean GUC, we should have a number and make the
>>> parameter "synchronised_scan_threshold".
>> This would open up a can of worms I'd prefer not to touch, having to do
>> with whether the buffer-access-strategy behavior should track that or
>> not. As the note in heapam.c says,
>>
>> * If the table is large relative to NBuffers, use a bulk-read access
>> * strategy and enable synchronized scanning (see syncscan.c). Although
>> * the thresholds for these features could be different, we make them the
>> * same so that there are only two behaviors to tune rather than four.
>>
>> It's a bit late in the cycle to be revisiting that choice. Now we do
>> already have three behaviors to worry about (BAS on and syncscan off)
>> but throwing in a randomly settable knob will take it back to four,
>> and we have no idea how that fourth case will behave. The other tack we
>> could take (having the one GUC variable control both thresholds) is
>> not good since it will result in pg_dump trashing the buffer cache.
>
> OK, good points.
>
> I'm still concerned that the thresholds gets higher as we increase
> shared_buffers. We may be removing performance features as fast as we
> gain performance when we set shared_buffers higher.
>
> Might we agree that the threshold should be fixed at 8MB, rather than
> varying upwards as we try to tune?

Synchronized scans and the bulk-read strategy don't help if the table
fits in cache. If it fits in shared buffers, you're better off keeping
it there than swapping pages between the OS cache and shared buffers or
spending any effort synchronizing scans. That's why we agreed back then
that the threshold should be X% of shared_buffers.
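
To make the scaling concrete, here is a minimal sketch of the decision in
Python. It assumes the 25% figure mentioned further down (a table is
treated as "large" once it exceeds a quarter of shared_buffers) and an
8 kB block size; the function names are made up for illustration and are
not the ones in heapam.c.

```python
BLOCK_SIZE = 8192  # assumed block size in bytes


def mb_to_blocks(megabytes: int) -> int:
    """Convert a size in MB to a number of blocks (hypothetical helper)."""
    return megabytes * 1024 * 1024 // BLOCK_SIZE


def threshold_blocks(nbuffers: int) -> int:
    """Assumed rule: the threshold is a quarter of shared_buffers, in blocks."""
    return nbuffers // 4


def uses_bulk_read_and_sync_scan(table_blocks: int, nbuffers: int) -> bool:
    """Both features switch on together once the table exceeds the threshold."""
    return table_blocks > threshold_blocks(nbuffers)


if __name__ == "__main__":
    # The threshold grows with shared_buffers, which is Simon's concern.
    for shared_mb in (32, 256, 2048):
        limit_mb = threshold_blocks(mb_to_blocks(shared_mb)) * BLOCK_SIZE // (1024 * 1024)
        print(f"shared_buffers={shared_mb}MB -> threshold={limit_mb}MB")
```

With shared_buffers at 32MB the threshold comes out at 8MB, and it rises
in step as shared_buffers is raised, which is the point: a table that
fits in the larger cache no longer needs either feature.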

It's a good point that we don't want pg_dump to screw up the cluster
order, but that's the only use case I've seen so far for disabling
sync scans. Even that wouldn't matter much if our estimate for
"clusteredness" didn't get screwed up by a table that looks like this:
"5 6 7 8 9 1 2 3 4"

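As a rough illustration of why that rotated layout hurts the estimate,
the sketch below computes a plain Pearson correlation between physical
position and value. This is only an assumed stand-in for the planner's
"clusteredness" statistic, not the actual ANALYZE code, and the function
name is hypothetical.

```python
from math import sqrt


def ordering_correlation(values):
    """Pearson correlation between physical position (0..n-1) and value.

    Assumes at least two distinct values; otherwise the variance is zero.
    """
    n = len(values)
    positions = range(n)
    mean_pos = (n - 1) / 2
    mean_val = sum(values) / n
    cov = sum((p - mean_pos) * (v - mean_val) for p, v in zip(positions, values))
    var_pos = sum((p - mean_pos) ** 2 for p in positions)
    var_val = sum((v - mean_val) ** 2 for v in values)
    return cov / sqrt(var_pos * var_val)


if __name__ == "__main__":
    print(ordering_correlation([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # fully ordered
    print(ordering_correlation([5, 6, 7, 8, 9, 1, 2, 3, 4]))  # rotated by a mid-table start
```

The fully ordered table scores 1.0, while the rotated one scores -0.5,
even though it consists of just two perfectly ordered runs and a range
scan over it would touch almost the same pages.
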
Now, maybe there are more use cases where you'd want to tune the
threshold, but I'd like to see some before we add more knobs.

To benefit from a lower threshold, you'd need to have a table that is
large enough that its cache footprint matters but still smaller than 25%
of shared_buffers, and have seq scans on it. In that scenario, you might
benefit from a lower threshold, because that would leave some
shared_buffers free for other use. Even that is quite hand-wavy; the
buffer cache LRU algorithm handles that kind of scenario reasonably
well already, and whether or not

To benefit from a larger threshold, you'd need to have a table larger
than 25% of shared_buffers, but still smaller than shared_buffers, and
seq scan it often enough that you want to keep it in shared buffers. If
you're frequently seq scanning a table of that size, you're most likely
suffering from a bad plan. Even then, the performance difference
shouldn't be that great; the table surely fits in the OS cache anyway,
with typical shared_buffers settings.
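
Put another way, a tunable threshold only changes anything for tables
whose size falls between the old and the new cutoff. A small sketch,
again assuming the 25% default and using hypothetical names:

```python
def behavior_changes(table_blocks, nbuffers, new_fraction, old_fraction=0.25):
    """True if moving the threshold flips this table between the two behaviors.

    Fractions are relative to shared_buffers (nbuffers, in blocks); the
    0.25 default is the assumed current threshold.
    """
    was_large = table_blocks > nbuffers * old_fraction
    is_large = table_blocks > nbuffers * new_fraction
    return was_large != is_large


if __name__ == "__main__":
    nbuffers = 4096  # e.g. 32MB of 8 kB blocks
    # Lowering the threshold to 12.5%: only tables between 512 and 1024 blocks change.
    print(behavior_changes(800, nbuffers, new_fraction=0.125))    # True
    print(behavior_changes(100, nbuffers, new_fraction=0.125))    # False
    # Raising it to 100%: only tables between 1024 and 4096 blocks change.
    print(behavior_changes(3000, nbuffers, new_fraction=1.0))     # True
    print(behavior_changes(100000, nbuffers, new_fraction=1.0))   # False
```

Tiny summary tables and warehouse-sized tables land on the same side of
the cutoff whichever value is chosen.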

Tables that are seq scanned are typically either very small, like a
summary table with just a few rows, or huge, like the tables in a data
warehousing system. Between the extremes, I don't think the threshold
actually has a very big impact.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
