Skip site navigation (1) Skip section navigation (2)

LIKE, leading percent, bind parameters and indexes

From: "Rodrigo Hjort" <rodrigo(dot)hjort(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Rodrigo Hjort" <rhjort(at)celepar(dot)pr(dot)gov(dot)br>
Subject: LIKE, leading percent, bind parameters and indexes
Date: 2006-05-23 23:11:17
Message-ID: 731083980605231611p5189ea98j762c76debf3006af@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackers
PG-Hackers,

I got the following picture:

detran=# \d sa_dut.tb_usuario
                    Table "sa_dut.tb_usuario"
         Column          |            Type             | Modifiers
-------------------------+-----------------------------+-----------
 numprocesso             | bigint                      | not null
 nome                    | character varying(44)       |
 nomemae                 | character varying(44)       |
 datanascimento          | date                        |
Indexes:
   "tb_usuario_pkey" PRIMARY KEY, btree (numprocesso)
   "ix_usuario_11" btree (nome varchar_pattern_ops, nomemae
varchar_pattern_ops)
   "ix_usuario_13" btree (datanascimento, nome varchar_pattern_ops)

As I do not use C locale, I created indexes based on "varchar_pattern_ops".
The issue I'm having is based on the following queries:

select * from TB_USUARIO where nome like 'TATIANA CRISTINA G%';
select * from TB_USUARIO where nome like '%TATIANA CRISTINA G%';

For some reasons, I'm not using text-search engines, like TSearch2, but only
the LIKE operator.
Here are the query plans involved:


detran=# explain analyze select count(*) as x0_0_ from sa_dut.TB_PROCESSO
processo0_, sa_dut.TB_USUARIO usuario1_ where (usuario1_.NOME like 'TATIANA
CRISTINA G%'  and processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO);

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11.94..11.95 rows=1 width=0) (actual
time=143.970..143.972rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..11.94 rows=1 width=0) (actual time=
143.935..143.949 rows=1 loops=1)
        ->  Index Scan using ix_usuario_11 on tb_usuario usuario1_ (cost=
0.00..6.01 rows=1 width=8) (actual time=93.884..93.889 rows=1 loops=1)
              Index Cond: (((nome)::text ~>=~ 'TATIANA CRISTINA
G'::character varying) AND ((nome)::text ~<~ 'TATIANA CRISTINA H'::character
varying))
              Filter: ((nome)::text ~~ 'TATIANA CRISTINA G%'::text)
        ->  Index Scan using tb_processo_pkey on tb_processo processo0_
(cost=0.00..5.91 rows=1 width=8) (actual time=50.041..50.044 rows=1 loops=1)
              Index Cond: (processo0_.numprocesso = "outer".numprocesso)
 Total runtime: 144.176 ms

detran=# explain analyze select count(*) as x0_0_ from sa_dut.TB_PROCESSO
processo0_, sa_dut.TB_USUARIO usuario1_ where
(usuario1_.NOME like '%TATIANA CRISTINA G%'  and
processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO);

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=67534.55..67534.56 rows=1 width=0) (actual time=
8101.957..8101.959 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..67534.55 rows=1 width=0) (actual time=
5404.106..8101.923 rows=1 loops=1)
        ->  Seq Scan on tb_usuario usuario1_  (cost=0.00..67528.62 rows=1
width=8) (actual time=5404.056..8101.862 rows=1 loops=1)
              Filter: ((nome)::text ~~ '%TATIANA CRISTINA G%'::text)
        ->  Index Scan using tb_processo_pkey on tb_processo
processo0_  (cost=0.00..5.91 rows=1 width=8) (actual
time=0.034..0.037rows=1 loops=1)
              Index Cond: (processo0_.numprocesso = "outer".numprocesso)
 Total runtime: 8102.105 ms


We use Java, and recently we made an effort in order to avoid the leading
'%' on LIKE expressions.
The problem is that it wasn't solved, and then I made the following Java
code to verify it.

What happens is that only the "004" block uses the index! The "002" code,
which also has no leading percent, does a sequential scan. The difference
between them is that "002" uses bind parameters.

Is it concerned to the JDBC Driver or PostgreSQL itself? What could be done
in order to fix it?
I could use static parameters, but then the queries would have to be
reparsed each time on the backend, missing cache advantages.

****************************************************************************************************
package db;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SelectLike {

   public SelectLike() {
       long qtd = 0L, inicio = 0L, tempo[] = {0,0,0,0};

       try {
           Class.forName("org.postgresql.Driver");
       } catch (ClassNotFoundException e) {
           e.printStackTrace();
       }

       Connection con = null;
       String dbURL = "jdbc:postgresql://10.15.61.6/database";
       try {
           con = DriverManager.getConnection(dbURL, "user", "password");

           String sql = "select count(*) as x0_0_ from
sa_dut.TB_PROCESSO processo0_, sa_dut.TB_USUARIO usuario1_ where
(usuario1_.NOME like ? and processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO)";
           String nome = "TATIANA CRISTINA G";

           PreparedStatement ps = null;
           ResultSet rs = null;

           //001 - '%NAME%' binded
           if (ps != null) ps.close();
           ps = con.prepareStatement(sql);
           ps.setString(1, "%" + nome + "%");
           inicio = System.currentTimeMillis();
           rs = ps.executeQuery();
           rs.next();
           qtd = rs.getLong(1);
           rs.close();
           tempo[0] = System.currentTimeMillis() - inicio;

           //002 - 'NAME%' binded
           if (ps != null) ps.close();
           ps = con.prepareStatement(sql);
           ps.setString(1, nome + "%");
           inicio = System.currentTimeMillis();
           rs = ps.executeQuery();
           rs.next();
           qtd = rs.getLong(1);
           rs.close();
           tempo[1] = System.currentTimeMillis() - inicio;

           //003 - '%NAME%' static
           if (ps != null) ps.close();
           String sql1 = sql.replaceFirst("\\?", "'%" + nome + "%'");
           ps = con.prepareStatement(sql1);
           inicio = System.currentTimeMillis();
           rs = ps.executeQuery();
           rs.next();
           qtd = rs.getLong(1);
           rs.close();
           tempo[2] = System.currentTimeMillis() - inicio;

           //004 - 'NAME%' static
           if (ps != null) ps.close();
           String sql2 = sql.replaceFirst("\\?", "'" + nome + "%'");
           ps = con.prepareStatement(sql2);
           inicio = System.currentTimeMillis();
           rs = ps.executeQuery();
           rs.next();
           qtd = rs.getLong(1);
           rs.close();
           tempo[3] = System.currentTimeMillis() - inicio;

           ps.close();
           con.close();
       } catch (SQLException e) {
           e.printStackTrace();
       }

       System.out.println("QTD: " + qtd + "\n\n");
       for (int ii = 0; ii < tempo.length; ii++)
           System.out.println(ii + ": " + tempo[ii]);
   }

   public static void main(String[] args) {
       new SelectLike();
   }

}
****************************************************************************************************

-- 
Regards,

Rodrigo Hjort
http://icewall.org/~hjort

Responses

pgsql-hackers by date

Next:From: Josh BerkusDate: 2006-05-23 23:18:54
Subject: Re: SQL compliant interval implementation
Previous:From: Bruce MomjianDate: 2006-05-23 22:43:48
Subject: Re: SQL compliant interval implementation

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group