Re: TB-sized databases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew <matthew(at)flymine(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: TB-sized databases
Date: 2007-12-06 18:34:47
Message-ID: 18104.1196966087@sss.pgh.pa.us
Lists: pgsql-performance

Matthew <matthew(at)flymine(dot)org> writes:
> On Thu, 6 Dec 2007, Tom Lane wrote:
>> Hmm. IIRC, there are smarts in there about whether a mergejoin can
>> terminate early because of disparate ranges of the two join variables.

> Very cool. Would that be a planner cost estimate fix (so it avoids the
> merge join), or a query execution fix (so it does the merge join on the
> table subset)?

Cost estimate fix. Basically what I'm thinking is that the startup cost
attributed to a mergejoin ought to account for any rows that have to be
skipped over before we reach the first join pair. In general this is
hard to estimate, but for mergejoin it can be estimated using the same
type of logic we already use at the other end.
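A rough sketch of the idea follows. It is not the planner's mergejoinscansel() code. It rests on several assumptions. Each join input's key distribution is summarized as an ascending list of equi-depth histogram bounds, similar to the histogram_bounds column in pg_stats. Values are interpolated linearly inside a bucket. The join is an ascending merge on key = key. The names frac_below, mergejoin_scan_fractions and extra_startup_cost are hypothetical. The skipped prefix (startup) is estimated with the same range logic that gives the early-termination point (end).

```python
from bisect import bisect_left


def frac_below(bounds, v):
    """Approximate fraction of rows with key < v.

    bounds is an ascending list of equi-depth histogram bounds: each of the
    len(bounds) - 1 buckets is assumed to hold the same share of rows, with
    values spread linearly inside a bucket (a simplifying assumption).
    """
    if v <= bounds[0]:
        return 0.0
    if v >= bounds[-1]:
        return 1.0
    i = bisect_left(bounds, v)  # bounds[i - 1] < v <= bounds[i]
    lo, hi = bounds[i - 1], bounds[i]
    within = (v - lo) / (hi - lo)
    return ((i - 1) + within) / (len(bounds) - 1)


def mergejoin_scan_fractions(outer_bounds, inner_bounds):
    """Ascending mergejoin on outer.key = inner.key.

    Returns (outer_start, outer_end, inner_start, inner_end): *_start is the
    fraction of that input skipped before the first possible join pair, and
    *_end is the fraction read before the join can stop early.
    """
    first_shared = max(outer_bounds[0], inner_bounds[0])   # no match below this
    last_shared = min(outer_bounds[-1], inner_bounds[-1])  # no match above this
    return (
        frac_below(outer_bounds, first_shared),
        frac_below(outer_bounds, last_shared),
        frac_below(inner_bounds, first_shared),
        frac_below(inner_bounds, last_shared),
    )


def extra_startup_cost(outer_run_cost, inner_run_cost, outer_start, inner_start):
    """Hypothetical charge: cost of reading the skipped prefixes up front."""
    return outer_start * outer_run_cost + inner_start * inner_run_cost


outer = [0, 10, 20, 30, 40]        # outer keys span 0..40, 4 buckets
inner = [25, 30, 35, 40, 45, 50]   # inner keys span 25..50, 5 buckets
print(mergejoin_scan_fractions(outer, inner))  # -> (0.625, 1.0, 0.0, 0.6)
```

In this example the outer side must skip about 62.5% of its rows before the first possible match, which is the part a startup-cost estimate would need to charge for. The inner side can stop after about 60% of its rows.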

After looking at the code a bit, I'm realizing that there's actually a
bug in there as of 8.3: mergejoinscansel() is expected to be able to
derive numbers for either direction of scan, but if it's asked to
compute numbers for a DESC-order scan, it looks for a pg_stats entry
sorted with '>', which isn't gonna be there. It needs to know to
look for an '<' histogram and switch the min/max. So the lack of
symmetry here is causing an actual bug in logic that already exists.
That makes the case for fixing this now a bit stronger ...
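A hypothetical sketch of the symmetric handling described above follows. It is not actual PostgreSQL code. It assumes ascending equi-depth histogram bounds with linear interpolation inside buckets, and the names frac_below and scan_fractions are illustrative. The DESC case can be served from the ascending ('<') histogram alone. The scan starts at the max end, so the skipped prefix is the fraction of rows above the smaller of the two maxima. It can stop once it passes the larger of the two minima. In other words, swap the roles of min and max and take complements.

```python
from bisect import bisect_left


def frac_below(bounds, v):
    """Approximate fraction of rows with key < v from ascending equi-depth
    histogram bounds (the only sort direction assumed to be available)."""
    if v <= bounds[0]:
        return 0.0
    if v >= bounds[-1]:
        return 1.0
    i = bisect_left(bounds, v)  # bounds[i - 1] < v <= bounds[i]
    within = (v - bounds[i - 1]) / (bounds[i] - bounds[i - 1])
    return ((i - 1) + within) / (len(bounds) - 1)


def scan_fractions(outer_bounds, inner_bounds, descending=False):
    """Return (outer_start, outer_end, inner_start, inner_end) for a mergejoin
    scanning in either direction, using only the ascending histograms."""
    lo = max(outer_bounds[0], inner_bounds[0])    # larger of the two minima
    hi = min(outer_bounds[-1], inner_bounds[-1])  # smaller of the two maxima
    if not descending:
        # ASC: skip rows below lo, stop after passing hi.
        return (frac_below(outer_bounds, lo), frac_below(outer_bounds, hi),
                frac_below(inner_bounds, lo), frac_below(inner_bounds, hi))
    # DESC: don't look for a descending histogram; mirror the ascending one.
    # The scan starts at the max end, so it skips rows above hi, and it
    # stops after passing lo, having read the rows at or above lo.
    return (1.0 - frac_below(outer_bounds, hi), 1.0 - frac_below(outer_bounds, lo),
            1.0 - frac_below(inner_bounds, hi), 1.0 - frac_below(inner_bounds, lo))


outer = [0, 10, 20, 30, 40]
inner = [25, 30, 35, 40, 45, 50]
print(scan_fractions(outer, inner, descending=True))
```

For these inputs the DESC scan skips nothing on the outer side at startup and about 40% of the inner side (keys above 40). It can stop after reading about 37.5% of the outer side and all of the inner side. This mirrors the ascending result.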

regards, tom lane
