LIKE predicate and ERROR: 42P22: could not determine which collation to use for string comparison - HINT: Use the COLLATE clause ...

From: John Lumby <johnlumby(at)hotmail(dot)com>
To: <pgsql-bugs(at)postgresql(dot)org>
Subject: LIKE predicate and ERROR: 42P22: could not determine which collation to use for string comparison - HINT: Use the COLLATE clause ...
Date: 2011-12-09 19:38:34
Message-ID: COL116-W333FEC5259FA21D7EF20F3A3B90@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


postgresql 9.1.2 running on linux debian squeeze on 32-bit x86

I have encountered behaviour relating to subject line which I'm fairly sure is a bug,  although I'm not sure exactly what.

My database was created with LC_LOCALE C,   and loaded using Pg-9.1.1.   it looks like

psql -d proteusdb -c "select datname, datdba, encoding, datcollate, datctype, datistemplate, datallowconn, datconnlimit, datlastsysoid, datfrozenxid, dattablespace from pg_database where datname = 'proteusdb'";
  datname  | datdba | encoding | datcollate | datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | dattablespace
-----------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------
 proteusdb |     10 |        6 | C          | C        | f             | t            |           -1 |         11910 |          670 |          1663
(1 row)

 I have a table which as a column named discriminator :

CREATE TABLE entity (
    id bigint DEFAULT nextval('entity_id_seq'::regclass) NOT NULL,
    discriminator character varying(255) NOT NULL,
    version integer NOT NULL,
    inherit_right boolean,
    name text,
    parent_id bigint,
    association_id bigint,
    association2_id bigint,
    long1 bigint,
    long2 bigint
);

which shows up looking like this in the catalog

psql -d proteusdb -c "select N.nspname , C.relname, A.attrelid , A.attname , A.atttypid , T.typname , L.collname from pg_attribute A, pg_type T, pg_class C , pg_namespace N , pg_collation L where C.oid = A.attrelid and T.oid = A.atttypid and A.attcollation = L.oid and C.relnamespace = N.oid and N.nspname = 'public' and C.relname = 'entity' and A.attname = 'discriminator'";
 nspname | relname | attrelid |    attname    | atttypid | typname | collname
---------+---------+----------+---------------+----------+---------+----------
  public  | entity  |    25349 | discriminator |     1043 | varchar | default
(1 row)

and a query like so :

psql -d proteusdb -c "select count(*) from entity  as e1 where ( e1.association_id IN ( select id from entity as e2 where e2.parent_id=3043508)   OR e1.association_id=3043508 ) and e1.discriminator LIKE 'DEPLOY%'";
ERROR:  could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

explain gives a clue what is going on :
psql -d proteusdb -c "explain select count(*) from entity  as e1 where ( e1.association_id IN ( select id from entity as e2 where e2.parent_id=3043508)   OR e1.association_id=3043508 ) and e1.discriminator LIKE 'DEPLOY%'";
                                                     QUERY PLAN                                                     
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=85045.70..85045.71 rows=1 width=0)
   ->  Index Scan using entity_id_association_id_discriminator on entity e1  (cost=523.55..85045.61 rows=36 width=0)
         Index Cond: (((discriminator)::text >= 'DEPLOY'::text) AND ((discriminator)::text < 'DEPLOZ'::text))
         Filter: (((discriminator)::text ~~ 'DEPLOY%'::text) AND ((hashed SubPlan 1) OR (association_id = 3043508)))
         SubPlan 1
           ->  Index Scan using entity_parent_id on entity e2  (cost=0.00..522.84 rows=281 width=8)
                 Index Cond: (parent_id = 3043508)
(7 rows)

So Pg has rewritten my LIKE pred to add the pair of range predicates so as to be able to exploit my index ... Good.
However  ---   It has then discovered that there is a mismatch between the collation of the column (default) and the constant (I assume C,  inherited from the database).

Well   --    I am not an expert on collation,   but I am fairly sure that the semantics of the LIKE predicate do not involve order,   so an error relating to ordering or collation should be impossible.     LIKE involves only exact (equality) matching of parts of the column value against the pattern.     If the query rewriter has satisified itself that it is a correct transformation of the query to add the two range predicates,   then it must  (should?)  have checked that any mismatch of collation does not affect correctness, and therefore no other part of postgres (planner,  runtime) should then object on those grounds.

So  - I *think* there is a  bug,   either that the query should not have been rewritten  (if collation does indeed make the rewrite incorrect),  or else it should have planned and executed the rewritten query.

I re-ran this on PG 9.1.2 to check and the same happens there as well.

Hoping that some expert can make a ruling on this.

John Lumby

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message John R Pierce 2011-12-09 20:17:25 Re: Postgres DB creation error WINXP pro
Previous Message Kevin Grittner 2011-12-09 18:49:06 Security definer "generated column" function used in index