Re: Full text search - query plan? PG 8.4.1

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Full text search - query plan? PG 8.4.1
Date: 2009-10-23 20:32:53
Message-ID: 4AE212F5.7080609@krogh.cc
Lists: pgsql-performance

Tom Lane wrote:
> Jesper Krogh <jesper(at)krogh(dot)cc> writes:
>> Tom Lane wrote:
>>> ... There's something strange about your tsvector index. Maybe
>>> it's really huge because the documents are huge?
>
>> huge is a relative term, but length(ts_vector(body)) is about 200 for
>> each document. Is that huge?
>
> It's bigger than the toy example I was trying, but not *that* much
> bigger. I think maybe your index is bloated. Try dropping and
> recreating it and see if the estimates change any.

I'm a bit reluctant to drop and re-create it. It would take a
couple of days to regenerate, so hopefully this won't become a common
situation for the system.

I have set the statistics target to 1000 for the tsvector column; the
documentation didn't mention any heavy downsides of doing that, and
since then I haven't seen row estimates that are orders of magnitude off.
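For the record, the statistics change was done roughly like this
(column and table names as in the queries below):

```sql
-- raise the per-column statistics target for the tsvector column
ALTER TABLE ftstest ALTER COLUMN ftstest_body_fts SET STATISTICS 1000;
-- re-gather statistics so the new target takes effect
ANALYZE ftstest;
```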

The table was built from scratch using inserts all the way up to around
10M rows now; should that result in index bloat? Can I inspect the
amount of bloat without rebuilding (or a similarly locking operation)?
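What I can do without heavy locking is look at the physical index size,
e.g.:

```sql
-- physical size of the GIN index (only needs a catalog lookup)
SELECT pg_size_pretty(pg_relation_size('ftstest_tfs_idx'));
-- the table itself, for comparison
SELECT pg_size_pretty(pg_relation_size('ftstest'));
```

but that only tells me the size, not how much of it is bloat.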

The query still has a "wrong" tipping point between the two query-plans:

ftstest=# explain analyze select body from ftstest where
ftstest_body_fts @@ to_tsquery('testterm') order by id limit 100;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7357.77 rows=100 width=738) (actual time=3978.974..8595.086 rows=100 loops=1)
   ->  Index Scan using ftstest_id_pri_idx on ftstest  (cost=0.00..1436458.05 rows=19523 width=738) (actual time=3978.971..8594.932 rows=100 loops=1)
         Filter: (ftstest_body_fts @@ to_tsquery('testterm'::text))
 Total runtime: 8595.222 ms
(4 rows)

ftstest=# set enable_indexscan=off;
SET
ftstest=# explain analyze select body from ftstest where
ftstest_body_fts @@ to_tsquery('testterm') order by id limit 100;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=59959.61..59959.86 rows=100 width=738) (actual time=338.832..339.055 rows=100 loops=1)
   ->  Sort  (cost=59959.61..60008.41 rows=19523 width=738) (actual time=338.828..338.908 rows=100 loops=1)
         Sort Key: id
         Sort Method:  top-N heapsort  Memory: 32kB
         ->  Bitmap Heap Scan on ftstest  (cost=22891.18..59213.45 rows=19523 width=738) (actual time=5.097..316.780 rows=19444 loops=1)
               Recheck Cond: (ftstest_body_fts @@ to_tsquery('testterm'::text))
               ->  Bitmap Index Scan on ftstest_tfs_idx  (cost=0.00..22886.30 rows=19523 width=0) (actual time=4.259..4.259 rows=20004 loops=1)
                     Index Cond: (ftstest_body_fts @@ to_tsquery('testterm'::text))
 Total runtime: 339.201 ms
(9 rows)

So for fetching 100 rows where the term exists in 19,444 of 10,000,000
documents, it chooses the index scan, where (given a random distribution
of the documents) it should have to scan 100 * (10,000,000 / 19,444) =
51,429 documents. So it somehow believes that the cost of the bitmap
index scan is higher than it actually is, or that the cost of the index
scan is lower than it actually is.
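A quick sanity check of that arithmetic, just redoing the estimate above:

```python
# Expected number of documents the ordered index scan must visit to
# find 100 matches, assuming matches are randomly distributed.
total_docs = 10_000_000   # documents in ftstest
matches = 19_444          # rows matching to_tsquery('testterm')
wanted = 100              # LIMIT

expected_scanned = wanted * total_docs // matches
print(expected_scanned)   # 51429
```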

Is it possible to manually set the cost for the @@ operator? It seems
natural that the cost of matching a tsvector against a tsquery, which
is a much heavier operation than = and works on a column in EXTENDED
storage, should be much higher than that of comparing an integer in
plain storage.

I tried searching the docs for operator cost, but I only found the
overall base values in the configuration file (cpu_operator_cost etc.).
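If I read the catalogs right, the operator itself carries no cost, but
the function behind it can be assigned one; so something like the
following might work (ts_match_vq is what pg_operator lists for
tsvector @@ tsquery here, so this is a guess on my part):

```sql
-- raise the planner's cost estimate for the function implementing
-- tsvector @@ tsquery (the default function cost is 1)
ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 500;
```

but I haven't verified that the planner actually picks that up for the
index-scan filter.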

Jesper
