Re: cost_rescan (was: match_unsorted_outer() vs. cost_nestloop())

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: cost_rescan (was: match_unsorted_outer() vs. cost_nestloop())
Date: 2010-04-19 02:39:47
Message-ID: g2t603c8f071004181939h23f75b52u41a10297b36ef942@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 12, 2009 at 6:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sep 6, 2009, at 10:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> ... But now that we have a plan for a less obviously broken costing
>>> approach, maybe we should open the floodgates and allow materialization
>>> to be considered for any inner path that doesn't materialize itself
>>> already
>
>> Maybe.  I think some experimentation will be required.  We also have
>> to be aware of effects on planning time; match_unsorted_outer() is,
>> AIR, a significant part of the CPU cost of planning large join problems.
>
> I've committed some changes pursuant to this discussion.  It may be that
> match_unsorted_outer gets a bit slower, but I'm not too worried about
> that.  My experience is that the code that tries different mergejoin
> options eats way more cycles than the nestloop code does.

One problem with the current implementation of cost_rescan() is that
it ignores caching effects. It seems to be faster to rescan a
materialize node than it is to rescan a seqscan of a table, even if
there are no restriction clauses, presumably because you get to skip
tuple visibility checks and maybe some other overhead, too. But
cost_rescan() thinks that rescanning the table will require rereading
the whole thing from disk, which isn't right either - it probably
ought to factor in effective_cache_size much as the estimates for
iterated index scans do. I'm not sure how many real problems this is
going to create.
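
To make the caching point concrete, here is a rough toy model in Python. It is not the actual cost_rescan() code in costsize.c. The function name seqscan_rescan_cost() and the assumption that up to effective_cache_size pages of the table stay cached between scans are mine, purely for illustration; the two constants are the stock defaults for seq_page_cost and cpu_tuple_cost.

```python
# Toy model only; NOT the real cost_rescan() from costsize.c.
SEQ_PAGE_COST = 1.0     # default value of seq_page_cost
CPU_TUPLE_COST = 0.01   # default value of cpu_tuple_cost


def seqscan_rescan_cost(pages, tuples, effective_cache_size):
    """Estimate the cost of one rescan of a seqscan with no quals.

    Hypothetical assumption: after the first scan, up to
    effective_cache_size pages (measured in pages) of the table are
    still cached, so only the remaining pages are charged as disk reads.
    """
    cached_pages = min(pages, effective_cache_size)
    io_cost = (pages - cached_pages) * SEQ_PAGE_COST
    cpu_cost = tuples * CPU_TUPLE_COST
    return io_cost + cpu_cost


# A 1000-page table with 100000 tuples: nothing cached vs. fully cached.
cold = seqscan_rescan_cost(1000, 100000, effective_cache_size=0)
warm = seqscan_rescan_cost(1000, 100000, effective_cache_size=16384)
print(cold, warm)
```

With effective_cache_size set to zero the model charges a full reread on every rescan, which is the behavior complained about above; with a cache larger than the table only the per-tuple CPU charge remains.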

Another potential problem is that materializing a whole-table seqscan
to avoid repeating the tuple visibility checks may be a win in some
strict sense, but there are externalities: it's also going to use a
lot more memory/disk than just rescanning the table.
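
As a back-of-the-envelope illustration of that externality, the sketch below estimates how much space materializing a scan would need and whether it would overflow work_mem. The helper name materialize_footprint() is hypothetical, and the footprint is approximated as tuples times average width, ignoring per-tuple overhead; the planner's real accounting is more detailed.

```python
def materialize_footprint(tuples, width, work_mem_kb):
    """Return (bytes_needed, spills_to_disk) for materializing a scan.

    Hypothetical simplification: footprint = tuples * average tuple
    width in bytes, with no per-tuple header or alignment overhead.
    work_mem_kb is expressed in kilobytes, as the work_mem setting is.
    """
    bytes_needed = tuples * width
    spills_to_disk = bytes_needed > work_mem_kb * 1024
    return bytes_needed, spills_to_disk


# One million 100-byte tuples against a 1 MB work_mem (example values).
print(materialize_footprint(1_000_000, 100, 1024))
```

A plain rescan of the table needs none of this extra space, which is the cost the comparison above leaves out.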

...Robert
