Re: Qual evaluation cost estimates for GIN indexes

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgreSQL(dot)org, Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
Subject: Re: Qual evaluation cost estimates for GIN indexes
Date: 2012-02-17 06:09:47
Message-ID: 4F3DEF2B.6040407@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

First, thanks for looking at this. Except from GIN indexes and
full-text-search being really good in our applications, this also
points to those excact places where it can be improved.

On 2012-02-17 00:15, Tom Lane wrote:
> I looked into the complaint here of poor estimation for GIN indexscans:
> http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php

I think this is the excact same issue:
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php

> At first glance it sounds like a mistake in selectivity estimation,
> but it isn't: the rowcount estimates are pretty nearly dead on.
> The problem is in the planner's estimate of the cost of executing the
> @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but
> actually it's a good deal more expensive than that. Some
> experimentation suggests that @@ might be about 500 times as expensive
> as a simple integer comparison. I don't propose pushing its procost
> up that much, but surely at least 10 would be appropriate, maybe even
> 100.
>
> However ... if you just alter pg_proc.procost in Marc's example, the
> planner *still* picks a seqscan, even though its estimate of the seqscan
> cost surely does go up. The reason is that its estimate of the GIN
> indexscan cost goes up just as much, since we charge one qual eval cost
> per returned tuple in gincostestimate. It is easy to tell from the
> actual runtimes that that is not what's happening in a GIN indexscan;
> we are not re-executing the @@ operator for every tuple. But the
> planner's cost model doesn't know that.

There is something about lossy vs. non-lossy, if the index-result
is lossy, then it would "need" to execute the @@ operator
on each tuple and de-toast the toasted stuff and go all the way.

If it isn't then at least count() on a gin-index should be able to
utillize an index-only scan now?

I've had a significant amout of struggle over the years in this
corner and the patch that went in for gincostestimate brought
a huge set of problems to the ground, but not all.

Other related threads:
http://archives.postgresql.org/pgsql-performance/2010-05/msg00031.php
(ts_match_vq cost in discussion)
http://archives.postgresql.org/pgsql-performance/2010-05/msg00266.php

I dont think I have ever seen the actual run-time of any @@ query
to be faster going through the seq-scan than going through the index. Not
even if it is pulling near all the tuples out.

(test-case that tries to go in that corner).
http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php

And I think is it due to a coulple of "real-world" things:
1) The tsvector-column is typically toasted.
2) The selected columns are typically in the main table.
3) The gin-index search + pulling main table is in
fact a measuable cheaper operation than pulling main+toast
uncompressing toast and applying ts_match_vq even in the most
favourable case for the seqscan.

Another real-world thing is that since the tsvector column is in toast
and isn't read when performing a bitmap-heap-scan, in addition
to the decompress-cost is it almost never hot in memory either,
causing its actuall runtime to be even worse.

Same problems hit a index-scan on another key where filtering
on a @@ operator, but I think I got around most of them by bumping
both cost of @@ and limit in the query to 10K instead of the 200 actually
wanted.

I do think I have been digging sufficiently in this corner and can
fairly easy test and craft test-examples that will demonstrate
the challenges. (a few is attached in above links).

Thanks for digging in this corner. Let me know if i can help, allthough
my actual coding skills are spare (at best).

--
Jesper

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2012-02-17 07:45:46 Re: MySQL search query is not executing in Postgres DB
Previous Message premanand 2012-02-17 05:33:37 MySQL search query is not executing in Postgres DB