select with "like" from another table

From: "Anton Maksimenkov" <anton200(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: select with "like" from another table
Date: 2006-05-29 09:53:24
Message-ID: 8cac8dd0605290253v75d43b9p249ebf58ee28b63c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi.

I have 2 tables - one with calls numbers and another with calls codes.
The structure almost like this:
billing=# \d a_voip
Table "public.a_voip"
Column | Type |
Modifiers
--------------------+-----------------------------+-----------------------------------------------------
id | integer | not null default
nextval('a_voip_id_seq'::regclass)
tm | timestamp without time zone | not null
user_name | character varying(50) | not null
...
calling_station_id | character varying(20) | not null
called_station_id | character varying(20) | not null
Indexes:
"a_voip_pkey" PRIMARY KEY, btree (id)
"a_voip_tm" btree (tm)

billing=# \d a_voip_codes
Table "public.a_voip_codes"
Column | Type | Modifiers
--------+-----------------------+-----------
code | integer | not null
region | character varying(77) |
tarif | numeric(13,7) |
Indexes:
"a_voip_codes_pkey" PRIMARY KEY, btree (code)

I need to select longest codes from a_voip_codes which match with the
the called_station_id. Because codes (very rarely) changes I construct
query

SELECT user_name, called_station_id,
(SELECT code FROM a_voip_codes AS c where v.called_station_id like
c.code || '%' order by code desc limit 1) AS code
FROM a_voip AS v WHERE user_name = 'dixi' AND tm between '2006-04-01'
and '2006-05-01' group by user_name, called_station_id;

Analyzed variant
billing=# explain analyze SELECT user_name, called_station_id, (SELECT
code FROM a_voip_codes AS c where v.called_station_id like c.code ||
'%' order by code desc limit 1) AS code FROM a_voip AS v WHERE
user_name = 'dixi' AND tm between '2006-04-01' and '2006-05-01' group
by user_name, called_station_id;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=11515.93..12106.26 rows=69 width=22) (actual
time=215.719..677.044 rows=130 loops=1)
-> Bitmap Heap Scan on a_voip v (cost=1106.66..11513.16 rows=554
width=22) (actual time=72.336..207.618 rows=848 loops=1)
Recheck Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
Filter: ((user_name)::text = 'dixi'::text)
-> Bitmap Index Scan on a_voip_tm (cost=0.00..1106.66
rows=90943 width=0) (actual time=69.441..69.441 rows=93594 loops=1)
Index Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
SubPlan
-> Limit (cost=0.00..8.55 rows=1 width=4) (actual
time=3.565..3.567 rows=1 loops=130)
-> Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c (cost=0.00..85.45 rows=10 width=4) (actual
time=3.560..3.560 rows=1 loops=130)
Filter: (($0)::text ~~ ((code)::text || '%'::text))
Total runtime: 678.186 ms
(11 rows)

It is ugly, however not so long (but only for 69 rows). If I want to
select for ALL users it goes veeeery long:
billing=# explain analyze SELECT user_name, called_station_id, (SELECT
code FROM a_voip_codes AS c where v.called_station_id like c.code ||
'%' order by code desc limit 1) AS code FROM a_voip AS v WHERE tm
between '2006-04-01' and '2006-05-01' group by user_name,
called_station_id;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=11740.52..107543.85 rows=11198 width=22) (actual
time=779.488..75637.623 rows=20564 loops=1)
-> Bitmap Heap Scan on a_voip v (cost=1106.66..11285.81
rows=90943 width=22) (actual time=72.539..274.850 rows=90204 loops=1)
Recheck Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
-> Bitmap Index Scan on a_voip_tm (cost=0.00..1106.66
rows=90943 width=0) (actual time=69.853..69.853 rows=93594 loops=1)
Index Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
SubPlan
-> Limit (cost=0.00..8.55 rows=1 width=4) (actual
time=3.631..3.633 rows=1 loops=20564)
-> Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c (cost=0.00..85.45 rows=10 width=4) (actual
time=3.623..3.623 rows=1 loops=20564)
Filter: (($0)::text ~~ ((code)::text || '%'::text))
Total runtime: 75652.199 ms
(10 rows)

So I want to ask, how can I reorganize query/structure for achieve
good performance?

I experiment with additional column (matched_code) for a_voip table
and think about RULE which will update that column "matched_code"
doing the (SELECT code FROM a_voip_codes AS c where
v.called_station_id like c.code || '%' order by code desc limit 1) job
when a_voip_codes updated. Or about TRIGGER. But this may also takes
long time, especially with short "code" numbers (like 1 digit). Look:

billing=# explain analyze UPDATE a_voip SET matched_code = (SELECT
code FROM a_voip_codes AS c WHERE a_voip.called_station_id like c.code
|| '%' order by code desc limit 1) WHERE matched_code LIKE '1%';

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on a_voip (cost=20.34..20467.27 rows=2057
width=168) (actual time=13.407..22201.369 rows=2028 loops=1)
Filter: ((matched_code)::text ~~ '1%'::text)
-> Bitmap Index Scan on a_voip_matched_code (cost=0.00..20.34
rows=2057 width=0) (actual time=2.035..2.035 rows=2028 loops=1)
Index Cond: (((matched_code)::text >= '1'::character varying)
AND ((matched_code)::text < '2'::character varying))
SubPlan
-> Limit (cost=0.00..8.55 rows=1 width=4) (actual
time=10.909..10.911 rows=1 loops=2028)
-> Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c (cost=0.00..85.45 rows=10 width=4) (actual
time=10.923..10.923 rows=1 loops=2028)
Filter: (($0)::text ~~ ((code)::text || '%'::text))
Total runtime: 23216.770 ms
(9 rows)

Is there any other ways to connect longest "code" with "called_station_id"?
--
engineer

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message chris smith 2006-05-29 10:36:43 Re: select with "like" from another table
Previous Message Greg Stark 2006-05-28 23:24:14 Re: App very unresponsive while performing simple update