Re: reducing random_page_cost from 4 to 2 to force index scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-05-24 02:44:26
Message-ID: BANLkTin2seetry0y0Nked0FMqwq-yDOGFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, May 23, 2011 at 3:08 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> Well, all of that stuff sounds impractically expensive to me... but I
>> just work here.
>
> I'll point out that the simple version, which just checks for hot tables
> and indexes, would improve estimates greatly and be a LOT less
> complicated than these proposals.

I realize I'm sounding like a broken record here, but as far as I can
tell there is absolutely zero evidence that that would be better. I'm
sure you're in good company thinking so, but the list of things that
could skew (or should I say, screw) the estimates is long and painful;
and if those estimates are wrong, you'll end up with something that is
both worse and less predictable than the status quo. First, I haven't
seen a shred of hard evidence that the contents of the buffer cache or
OS cache are stable enough to be relied upon, and we've repeatedly
discussed workloads where that might not be true. Has anyone done a
systematic study of this on a variety real production systems? If so,
the results haven't been posted here, at least not that I can recall.
Second, even if we were willing to accept that we could obtain
relatively stable and accurate measurements of this data, who is to
say that basing plans on it would actually result in an improvement in
plan quality? That may seem obvious, but I don't think it is. The
proposed method is a bit like trying to determine the altitude of a
hot air balloon by throwing the ballast over the side and timing how
long it takes to hit the ground. Executing plans that are based on
the contents of the cache will change the contents of the cache, which
will in turn change the plans. The idea that we can know, without any
experimentation, how that's going to shake out, seems to me to be an
exercise in unjustified optimism of the first order.

Sorry to sound grumpy and pessimistic, but I really think we're
letting our enthusiasm get way, way ahead of the evidence.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message panam 2011-05-24 04:14:24 Hash Anti Join performance degradation
Previous Message Craig Ringer 2011-05-23 23:26:46 Re: postgres not use index, IN statement