Re: Accounting for toast in query planner. (gin/gist indexes).

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Accounting for toast in query planner. (gin/gist indexes).
Date: 2011-12-01 03:29:03
Message-ID: 7265.1322710143@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jesper Krogh <jesper(at)krogh(dot)cc> writes:
> I have currently hit a problem which I dug into finding the cause for, in
> particular, searching in GIN indices seems in some situations to
> un-fairly favor Sequential Scans.

> Googling a bit I found this page:
> http://postgis.refractions.net/docs/ch06.html#id2635817
> Describing the excact problem.

> It seemed to be discussed back in the pre 8.1 days and wasn't
> solved there, is there a chance someone may address it in 9.2 ?
> http://archives.postgresql.org/pgsql-performance/2005-02/msg00041.php

Don't hold your breath. There's a huge amount of work to do there,
and nobody's even thinking about it. The problem has actually gotten
worse since 8.1, because the index access patterns are more complex
and the planner has less not more info about them (because we pushed
the determination of what's "lossy" to run time). Accumulating stats
about how toasty the heap tuples are would be only the first step
towards producing better cost estimates here --- you'd also need some
way of estimating how lossy an index is, for instance, so you could
guess how many heap tuples will be visited. And on top of that,
for a lot of these complicated operators even our estimates of the final
number of matches are pretty crummy. I'm not sure if having an explicit
model of what's happening inside the index would help. Maybe it would,
since the condition actually being tested by the index is probably
simpler than the operator proper, but right now the planner has no clue
that they're different.

> Would you coin it a hard task or can a "fairly" naive C-coder, with
> a fair amount of PG experience approach it?

We're a long way from needing C coding here. The first thing would be
to come up with a design for what we're going to model, what statistical
estimates have to be made along the way, and how we can compartmentalize
the knowledge needed given that we want operator classes and index
access methods to be pluggable.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2011-12-01 05:47:40 Re: Word-smithing doc changes
Previous Message Kris Jurka 2011-12-01 00:55:13 Re: Java LISTEN/NOTIFY client library work-around