Statistics for tsvector "wildcards". term*

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Statistics for tsvector "wildcards". term*
Date: 2010-05-29 12:07:53
Message-ID: 4C010399.8050806@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

There seems to be an "unimplemented" area around getting statistics for
wildcard searches done. Wildcards anchored to the left can be matched
up by the gin-index and the ts_match_vq operator:

testdb=# select to_tsvector('project') @@ to_tsquery('proj:*');
?column?
----------
t
(1 row)

Searching for project gives me this estimate:
testdb=# explain select id,document_tsvector from efam.reference where
document_tsvector @@ to_tsquery('projects') order by id limit 50;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Limit (cost=0.00..3008.54 rows=50 width=76)
-> Index Scan using reference_pkey on reference
(cost=0.00..3762544.72 rows=62531 width=76)
Filter: (document_tsvector @@ to_tsquery('projects'::text))
(3 rows)

whereas searching for proj:* gives:

testdb=# explain select id,document_tsvector from efam.reference where
document_tsvector @@ to_tsquery('proj:*') order by id limit 50;
QUERY PLAN
---------------------------------------------------------------------------------------------
Limit (cost=73.56..73.58 rows=6 width=76)
-> Sort (cost=73.56..73.58 rows=6 width=76)
Sort Key: id
-> Bitmap Heap Scan on reference (cost=34.55..73.49 rows=6
width=76)
Recheck Cond: (document_tsvector @@
to_tsquery('proj:*'::text))
-> Bitmap Index Scan on reference_fts_idx
(cost=0.00..34.54 rows=6 width=0)
Index Cond: (document_tsvector @@
to_tsquery('proj:*'::text))
(7 rows)

There are abouvios challenges in getting statistics for submatches where
there are no real information
in the pg_stats table, but there will also be a huge amount of cases
where a fairly reliable
guess can be extracted.

--
Jesper

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Urbański 2010-05-29 13:56:57 Re: tsvector pg_stats seems quite a bit off.
Previous Message Jesper Krogh 2010-05-29 10:43:28 Re: tsvector pg_stats seems quite a bit off.