Minimum selectivity estimate for LIKE 'prefix%'

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-patches(at)postgreSQL(dot)org
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Minimum selectivity estimate for LIKE 'prefix%'
Date: 2008-03-06 21:16:54
Message-ID: 23548.1204838214@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Peter complained awhile back about unrealistically small selectivity
estimates for LIKE with a fixed-prefix pattern:
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00939.php
I had suspected that this was related to the locale-specific problems
we fixed a few months ago, but having finally had a chance to look
at the data off-list, there doesn't seem to be any such component
to the problem. What it boils down to is that when we generate a
range constraint based on the prefix, such as
col >= 'prefix' AND col < 'prefiy'
if the prefix is more than a few characters long then the two endpoint
values are indistinguishable as far as comparison to the histogram
is concerned, and so we come out with a selectivity estimate that
is zero to within roundoff error. This is unreasonably optimistic
and can lead to bad plan choices.

What I propose doing about this is a small variant on Peter's original
suggestion: compute the estimated selectivity for
col = 'prefix'
and clamp the result of prefix_selectivity to be at least that.
This is plausible on intuitive grounds since the range constraint
must surely include at least these values. Furthermore, it eliminates
what had been an entirely ad-hoc choice of a lower bound (the code
was clamping to at least 1e-10, which is surely unreasonably
optimistic).

The end result of this, for the case Peter is interested in where there
are no especially common values, is that the minimum selectivity
estimate for LIKE 'prefix%' will be essentially 1/ndistinct where
ndistinct is the estimated number of distinct values in the column,
because that's what eqsel() does in such cases.

I attach a proposed patch against HEAD for this. It's a bit long
but most of the bulk is refactoring eqsel() to make it easy to use from
prefix_selectivity(). The intellectual content is just in the last
diff hunk.

To help out Peter's client, who's running 8.1.x, we'd have to backpatch
at least that far. We backpatched the last round of LIKE selectivity
fixes to 8.1, so I don't have too much hesitation about doing the same
here.

Comments?

regards, tom lane

Attachment Content-Type Size
like_minimum_estimate.patch.gz application/octet-stream 3.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2008-03-06 21:45:21 Re: BUG #3774: create table like including index doesn't update pg_constraints with primary key
Previous Message Decibel! 2008-03-06 20:57:55 Re: dblink doesn't honor interrupts while waiting a result

Browse pgsql-patches by date

  From Date Subject
Next Message Euler Taveira de Oliveira 2008-03-06 23:19:35 Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit
Previous Message Andrew Dunstan 2008-03-06 20:44:47 Re: CopyReadLineText optimization