Re: Planner: rows=1 after "similar to" where condition.

From: "Joris Dobbelsteen" <Joris(at)familiedobbelsteen(dot)nl>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Planner: rows=1 after "similar to" where condition.
Date: 2008-02-25 16:08:13
Message-ID: E4953B65D9E5054AA6C227B410C56AA9C3B4@exchange1.joris2k.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Resent due to bounce...
orange.nl #5.0.0 X-SMTP-Server; host sss.pgh.pa.us[66.207.139.130] said:
550

>-----Original Message-----
>From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>Sent: Monday, 25 February 2008 16:34
>To: Joris Dobbelsteen
>Cc: Gregory Stark; Scott Marlowe; pgsql-general(at)postgresql(dot)org
>Subject: Re: [GENERAL] Planner: rows=1 after "similar to"
>where condition.
>
>"Joris Dobbelsteen" <Joris(at)familiedobbelsteen(dot)nl> writes:
>> "Bitmap Heap Scan on log_syslog syslog (cost=11168.32..16988.84
>> rows=1
>> width=221) (actual time=11145.729..30067.606 rows=212 loops=1)"
>> " Recheck Cond: (((program)::text = 'amavis'::text) AND
>> ((facility)::text = 'mail'::text))"
>> " Filter: (((priority)::text = 'notice'::text) AND ((text)::text ~
>> '***:^(?:amavis\\[[0-9]+\\]: \\([0-9]+-[-0-9]+\\) Passed
>[A-Za-z0-9]+,
>> [][0-9\\.]* <[^<>]+> -> <[^<>]+>, Message-ID: <[^<>]+>,
>> (Resent-Message-ID: <[^<>]+>, |)mail.id: [^ ,]+, Hits: [-+0-9\\.,]+,
>> queued.as: [^ ,]+, [0-9]+ ms)$'::text))"
>
>It's not too surprising that you'd get a small selectivity
>estimate for such a long regexp; the default estimate is just
>based on the amount of fixed text in the pattern, and you've got a lot.
>
>If you increase the stats target for the column to 100 or more
>then it will try actually applying the regexp to all the
>histogram entries.
>That might or might not give you a better estimate.

I will try that, expect result back within a few days (have it collect
some better sample set). Unfortunally the regex is not so much for
narrowing down the selection, but rather guarenteeing the format of the
messages.
You seem to consider the common case differently, and I can agree for
most part. Unfortunally my use-case is different from the expected. That
said, might a less aggressive selectivity estimation for long strings
work better in the common case?

Might an alternative be to use a function and check for a positive
result, i.e. something that the predictor cannot take into account?

- Joris

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2008-02-25 16:08:57 Re: HOW does 8.3 CREATE REPLACe .. FUNCTION ..COST work
Previous Message Joris Dobbelsteen 2008-02-25 15:51:13 Re: Planner: rows=1 after "similar to" where condition.