Quick Links

New planner for like was -- Problem (bug?) with like

From:	David Walter <dwalter(at)ecs(dot)syr(dot)edu>
To:	pgsql-hackers(at)postgresql(dot)org, Lista PostgreSql <pgsql-general(at)postgresql(dot)org>
Subject:	New planner for like was -- Problem (bug?) with like
Date:	2001-12-04 16:10:30
Message-ID:	Pine.SOL.4.10.10112041105540.6144-100000@boreas.ecs.syr.edu
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-hackers

On Tue, 4 Dec 2001, Tom Lane wrote:

> bombadil(at)wanadoo(dot)es writes:
> > 1) # explain SELECT * from v_cliente where nombre like '%DA%';
>
> > Merge Join (cost=54763.50..62874.36 rows=413980 width=183)
> > -> Sort (cost=16238.44..16238.44 rows=54 width=131)

> The problem here is that the planner is being way too optimistic about
> the selectivity of LIKE '%DAVID%' --- notice the estimate that only
> one matching row will be found in cliente, rather than 54 as with '%DA%'.
> So it chooses a plan that avoids the sort overhead needed for an
> efficient merge join with the other tables. That would be a win if
> there were only one matching row, but as soon as there are lots, it's
> a big loss, because the subquery to join the other tables gets redone
> for every matching row :-(
>
> >> Also, how many rows are there really that match '%DA%' and '%DAVID%'?
>
> > 1) 2672 rows -> 3.59 sec.
> > 2) 257 rows -> 364.69 sec.
>
> I am thinking that the rules for selectivity of LIKE patterns probably
> need to be modified. Presently the code assumes that a long constant
> string has probability of occurrence proportional to the product of the
> probabilities of the individual letters. That might be true in a random
> world, but people don't search for random strings. I think we need to
> back off the selectivity estimate by some large factor to account for
> the fact that the pattern being searched for is probably not random.
> Anyone have ideas how to do that?
>

Is there any statistic being kept by partial index?

for instance

#occurrences of A
B

A fairly small table/index could track these couldn't it?

If it were a btree itself, then statistics could be split appropriately
into sub-branches when the # of occurrences exceeds some threshold.

david

In response to

Re: Problem (bug?) with like at 2001-12-04 15:21:18 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Barry Lind	2001-12-04 16:44:50	Re: java stored procedures
Previous Message	Serkan BEKTAŞ	2001-12-04 16:07:28	FW: Localization - Turkish - Tr

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2001-12-04 16:12:55	Re: FW: [CYGWIN] 7.2b3 postmaster doesn't start on Win98
Previous Message	Tom Lane	2001-12-04 15:21:18	Re: Problem (bug?) with like