Re: anti-join chosen even when slower than old plan

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, "Mladen Gogala" <mladen(dot)gogala(at)vmsinfo(dot)com>
Subject: Re: anti-join chosen even when slower than old plan
Date: 2010-11-11 19:15:55
Message-ID: 4CDBEC8B02000025000375CF@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> BTW ... on reflection it seems that this would *not* solve the
> use-case Kevin described at the start of this thread. What he's
> got AIUI is some large tables whose recent entries are well-
> cached, and a lot of queries that tend to hit that well-cached
> portion, plus a few queries that hit the whole table and so see
> largely-not-cached behavior. We can't represent that very well
> with a caching knob at the table level. Either a high or a low
> setting will be wrong for one set of queries or the other.

Exactly right.

> The most practical solution for his case still seems to be to
> twiddle some GUC or other locally in the maintenance scripts that
> do the full-table-scan queries.

Yes, that works fine. The thread spun off in this speculative
direction because I started thinking about whether there was any
reasonable way for PostgreSQL to automatically handle such things
without someone having to notice the problem and do the per-script
tuning. I don't know whether any of the ideas thus spawned are
worth the effort -- it's not a situation I find myself in all that
often. I guess it could be considered an "ease of use" feature.

> Unfortunately we don't have an equivalent of per-session SET (much
> less SET LOCAL) for per-relation attributes. Not sure if we want
> to go there.

Besides the "fully-scanned object size relative to relation size
costing adjustment" idea, the only one which seemed to be likely to
be useful for this sort of issue was the "costing factors by user
ID" idea -- the interactive queries hitting the well-cached portion
of the tables are run through a read-only user ID, while the weekly
maintenance scripts (obviously) are not. With the settings I
initially had assigned to the cluster the maintenance scripts would
never have seen this issue; it was tuning to resolve end-user
complaints of slowness in the interactive queries which set up the
conditions for failure, and if I'd had per-user settings, I probably
would have (and definitely *should* have) used them.

FWIW, I can certainly see the potential of some other ideas which
came up on the thread; what might have seemed like antipathy toward
them was more of an attempt to point out that they would not have
helped at all with the problem which started this thread.

-Kevin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2010-11-11 19:17:00 Re: anti-join chosen even when slower than old plan
Previous Message Robert Haas 2010-11-11 19:05:41 Re: anti-join chosen even when slower than old plan