Re: anti-join chosen even when slower than old plan

From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: anti-join chosen even when slower than old plan
Date: 2010-11-11 08:01:12
Message-ID: AANLkTi=ND2Bm9nw_HcWFbU0o1L9ZGJTc4LLUrDzG1HoL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2010/11/11 Robert Haas <robertmhaas(at)gmail(dot)com>

>
> But thinking over what you've written here, I'm reminded of something
> Peter said years ago, also about the optimizer. He was discussed the
> ratio of the estimated cost to the actual cost and made an off-hand
> remark that efforts had been made over the years to make that ratio
> more consistent (i.e. improve the quality of the cost estimates) but
> that they'd been abandoned because they didn't necessarily produce
> better plans. Applying that line of thinking to this problem, maybe
> we should give up on trying to make the estimates truly model reality,
> and focus more on assigning them values which work well in practice.
> For example, in your case, it would be sufficient to estimate the
> amount of data that a given query is going to grovel through and then
> applying some heuristic to choose values for random_page_cost and
> seq_page_cost based on the ratio of that value to, I don't know,
> effective_cache_size.
>

As for me, the simplest solution would be to allow to set costs on
per-relation basis. E.g. I know that this relation is most time in memory
and other one (archive) is on the disk. This could work like charm along
with buffer pools (portions of shared cache) - tables (or indexes) that are
required to be cached can be assigned to bufferpool that has enough size to
hold all the data, archive ones - to small bufferpool. This can guarantie
that after query on the archive data, cached tables are still cached.
This solutions however, does not help on tables where only some portion of
table is activelly used. The solution can be to allow set costs via partial
indexes - e.g. "for any table access using this index, use this cost
values". This, BTW, will make table access via given index more preferable.

--
Best regards,
Vitalii Tymchyshyn

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message tv 2010-11-11 08:43:35 Re: Why dose the planner select one bad scan plan.
Previous Message 静安寺 2010-11-11 07:03:45 Re: Why dose the planner select one bad scan plan.