Re: Yet another abort-early plan disaster on 9.3

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-10-10 16:53:51
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10.10.2014 16:21, Craig James wrote:
> On Fri, Oct 10, 2014 at 5:10 AM, Tomas Vondra <tv(at)fuzzy(dot)cz
> <mailto:tv(at)fuzzy(dot)cz>> wrote:
> > I've gone looking for papers on this topic but from what I read this
> > isn't so. To get any noticeable improvement you need to read 10-50% of
> > the table and that's effectively the same as reading the entire table
> > -- and it still had pretty poor results. All the research I could find
> > went into how to analyze the whole table while using a reasonable
> > amount of scratch space and how to do it incrementally.
> I think it's really difficult to discuss the estimation without some
> basic
> agreement on what are the goals. Naturally, we can't get a perfect
> estimator with small samples (especially when the sample size is
> fixed and
> not scaling with the table). But maybe we can improve the estimates
> without scanning most of the table?
> FWIW I've been playing with the adaptive estimator described in [1] and
> the results looks really interesting, IMHO. So far I was testing it on
> synthetic datasets outside the database, but I plan to use it instead of
> our estimator, and do some more tests.
> We've solved this problem using an external (non-Postgres) dynamically
> optimizing index. In addition to the "early abort," we also require an
> efficient "late start", the equivalent of "offset 100 limit 10". It's a
> common problem for web sites that let users page through data with just
> a tiny amount of state information (a cookie).

Yeah, paging is a known example, both for the inefficiency once you get
to pages far away, and because of the planning challenges. I think there
are known solutions to this problem
although those are not applicable to all cases.

But I'm not sure how that's related to the ndistinct estimation problem,
discussed in this thread (or rather in this subthread)?

> Our index is for chemical structures. Chemicals are indexed on
> chemical fragments
> <>. A search
> typically starts with 50-200 indexed "columns" (chemical fragments).
> The query is always flat, "A and B and ... and Z". The indexed
> fragments are both correlated (the existence of one strongly raises
> the chances of another) and anti-correlated (certain combinations are
> very rare).

Maybe I don't understand the problem well enough, but isn't this a
perfect match for GIN indexes? I mean, you essentially need to do
queries like "WHERE substance @@ ('A & B & !C')" etc. Which is exactly
what GIN does, because it keeps pointers to tuples for each fragment.

> Static planners simply can't handle the "early abort" condition,
> even with good statistics. Many have pointed out that data are
> "lumpy" rather than well distributed. A more subtle problem is that
> you can have evenly distributed data, but badly distributed
> correlations. "Agnes" and "Bob" may be names that are distributed
> well in a real-estate database, but it might happen that all of the
> information about homes whose owners' names are "Agnes" and "Bob"
> occurs at the very end of all of your data because they just got
> married and bought a house.
> The end result is that even with perfect statistics on each column,
> you're still screwed. The combinatorial explosion of possible
> correlations between indexes is intractable.

Static planners clearly have limitations, but we don't have dynamic
planning in PostgreSQL, so we have to live with them. And if we could
improve the quality of estimates - lowering the probability of poorly
performing plans, it's probably good to do that.

It won't be perfect, but until we have dynamic planning it's better than


In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fabrízio de Royes Mello 2014-10-10 17:43:38 Re: schema-only -n option in pg_restore fails
Previous Message Andres Freund 2014-10-10 16:00:20 Re: Wait free LW_SHARED acquisition - v0.9

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2014-10-10 17:53:36 Re: Yet another abort-early plan disaster on 9.3
Previous Message Craig James 2014-10-10 14:21:05 Re: Yet another abort-early plan disaster on 9.3