Re: LIKE op with B-Tree Index?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Sam Wong <sam(at)hellosam(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: LIKE op with B-Tree Index?
Date: 2012-10-16 20:29:44
Message-ID: CAHyXU0xa2uodxLDqqc8KnA=vW_G8P9wDpOHeeMp55psYr_J7PA@mail.gmail.com
Lists: pgsql-performance

On Tue, Oct 16, 2012 at 3:15 AM, Sam Wong <sam(at)hellosam(dot)net> wrote:
> Hi communities,
>
> I am investigating a performance issue involving LIKE 'xxxx%' on an
> indexed column in a complex query with joins.
>
> The problem boils down to this simple scenario:
> ====Scenario====
> My database locale is C, using UTF-8 encoding. I tested this on 9.1.6 and
> 9.2.1.
>
> Q1.
> SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'
>
> Q2.
> SELECT * FROM shipments WHERE shipment_id >= '12345678' AND shipment_id <
> '12345679'
>
> shipments is a table with a million rows and 20 columns. shipment_id is the
> primary key, a non-null text column.
>
> CREATE TABLE cod.shipments
> (
>   shipment_id text NOT NULL,
>   -- other columns omitted
>   CONSTRAINT shipments_pkey PRIMARY KEY (shipment_id)
> );
>
> Analyze Q1 gives this:
> Index Scan using shipments_pkey on shipments (cost=0.00..39.84 rows=1450 width=294) (actual time=0.018..0.018 rows=1 loops=1)
>   Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id < '12345679'::text))
>   Filter: (shipment_id ~~ '12345678%'::text)
>   Buffers: shared hit=4
>
> Analyze Q2 gives this:
> Index Scan using shipments_pkey on shipments (cost=0.00..39.83 rows=1 width=294) (actual time=0.027..0.027 rows=1 loops=1)
>   Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id < '12345679'::text))
>   Buffers: shared hit=4
>
> ====Problem Description====
> In Q1, the planner estimated 1450 rows, while Q2 got a much better
> estimate of 1.
> The problem is that when I combine such a condition with a join to another
> table, postgres prefers a merge join (or hash join) over a nested loop.
>
> ====Question====
> Are Q1 and Q2 equivalent? From what I see, and from the results, they seem
> to be the same, or did I miss something? (Locale: C, Encoding: UTF-8)
> If they are equivalent, is that a bug in the planner?

They are most certainly not equivalent. What if the shipment_id is 12345678Z?
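
One way to probe this is to run candidate values such as 12345678Z through both predicates. Below is a minimal sketch in Python rather than SQL, assuming the C locale orders text by plain byte comparison of its UTF-8 encoding. The helper names matches_q1, matches_q2 and disagreements are made up for illustration and are not part of PostgreSQL.

```python
def matches_q1(shipment_id: str, prefix: str = "12345678") -> bool:
    """Q1: shipment_id LIKE '12345678%' (a plain prefix match)."""
    return shipment_id.startswith(prefix)


def matches_q2(shipment_id: str, low: str = "12345678",
               high: str = "12345679") -> bool:
    """Q2: shipment_id >= low AND shipment_id < high.

    Assumption: the C locale compares the UTF-8 bytes directly,
    so the comparison is done on encoded bytes here.
    """
    key = shipment_id.encode("utf-8")
    return low.encode("utf-8") <= key < high.encode("utf-8")


def disagreements(candidates):
    """Return the candidates on which Q1 and Q2 give different answers."""
    return [c for c in candidates if matches_q1(c) != matches_q2(c)]


if __name__ == "__main__":
    samples = ["12345678", "12345678Z", "1234567", "12345679", "12345678\u00e9"]
    for s in samples:
        print(repr(s), "Q1:", matches_q1(s), "Q2:", matches_q2(s))
    print("disagreements:", disagreements(samples))
```

Any value that lands in the disagreements list would separate the two queries. Under a non-C collation the byte-wise assumption no longer holds, so the check would have to be redone in the database itself.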

merlin
