Infinite CPU loop due to field ::type casting

From: Steven Rosenstein <srosenst(at)us(dot)ibm(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Infinite CPU loop due to field ::type casting
Date: 2004-09-22 20:09:10
Message-ID: OFAA6C93B2.8783E099-ON85256F17.00695E96-85256F17.006EB434@us.ibm.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Gentlefolk,

I'm not sure if this is the proper forum for this question, and it might
have been answered in a previous thread, but I'm new to PostgreSQL and the
research I did in the archives did not turn up anything addressing this
issue. Please direct me to the proper forum is this is not the correct
venue.

Environment: Red Hat Enterprise Linux 3 Workstation, PostgreSQL V7.3.6
(stock with the RHEL distribution)

The two tables I used in the example are tbl_device and tbl_sad_event:

vsa=# \d vsa.tbl_device;
Table "vsa.tbl_device"
Column | Type |
Modifiers
----------------+--------------------------+---------------------------------------------------------
id | integer | not null default
nextval('vsa.tbl_device_id_seq'::text)
name | character varying(100) | not null
account_id | bigint | not null
vss_site_id | bigint | not null
org_site_id | bigint | not null default 0
device_type_id | integer | not null default 1
os_family_id | integer | not null default 0
status_id | integer | not null default 0
timezone | character varying(80) |
clientkey | character varying(2048) | not null
record_created | timestamp with time zone | default now()
Indexes: pk_tbl_device primary key btree (id),
idx_d_uniq_name_site_account_key unique btree (name, vss_site_id,
account_id, clientkey),
tbl_device_clientkey_key unique btree (clientkey),
idx_d_account_id btree (account_id),
idx_d_account_site_name btree (account_id, vss_site_id, name),
idx_d_device_type_id btree (device_type_id),
idx_d_name btree (name),
idx_d_org_site_id btree (org_site_id),
idx_d_os_family_id btree (os_family_id),
idx_d_status_id btree (status_id),
idx_d_vss_site_id btree (vss_site_id)
Foreign Key constraints: fk_d_va FOREIGN KEY (account_id) REFERENCES
vsa.tbl_vsa_account(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
fk_d_vs FOREIGN KEY (vss_site_id) REFERENCES
vsa.tbl_vss_site(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
fk_d_dof FOREIGN KEY (os_family_id) REFERENCES
vsa.enum_device_os_family(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
fk_d_dsc FOREIGN KEY (status_id) REFERENCES
vsa.enum_device_status_code(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
fk_d_dt FOREIGN KEY (device_type_id) REFERENCES
vsa.enum_device_type(id) ON UPDATE NO ACTION ON DELETE NO ACTION
Triggers: trg_clean_device_name

vsa=# \d vsa.tbl_sad_event
Table "vsa.tbl_sad_event"
Column | Type |
Modifiers
----------------+-----------------------------+------------------------------------------------------------
id | integer | not null default
nextval('vsa.tbl_sad_event_id_seq'::text)
device_id | bigint | not null
log_type | integer |
severity | character varying(20) |
time_logged | timestamp without time zone |
user_name | character varying(50) |
remote_user | character varying(50) |
remote_host | character varying(100) |
source_tag | character varying(30) |
event_code | character varying(50) |
type | character varying(6) |
record_created | timestamp with time zone | default now()
Indexes: pk_tbl_sad_event primary key btree (id),
idx_se_dev_time_type btree (device_id, time_logged, "type"),
idx_se_device_id btree (device_id),
idx_se_time_logged btree (time_logged),
idx_se_type btree ("type"),
sjr_se_id_time_type btree (device_id, time_logged, "type")
Foreign Key constraints: fk_sade_d FOREIGN KEY (device_id) REFERENCES
vsa.tbl_device(id) ON UPDATE NO ACTION ON DELETE CASCADE

Here is my original query, and the query plan generated by the planner:

vsa=# explain
SELECT dev.name, dev.vss_site_id, tbl.log_type, tbl.severity, tbl.count
FROM vsa.tbl_device AS dev
LEFT OUTER JOIN
(SELECT stbl.device_id, stbl.log_type, stbl.severity, count(*)
FROM vsa.dtbl_logged_event_20040922 AS stbl
WHERE stbl.log_type IN (2, 3, 4, 5)
GROUP BY stbl.device_id, stbl.log_type, stbl.severity) AS tbl
ON (dev.id=tbl.device_id)
ORDER BY dev.name;
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=40893.18..40960.93 rows=27100 width=79)
Sort Key: dev.name
-> Merge Join (cost=38417.13..38897.77 rows=27100 width=79)
Merge Cond: ("outer".id = "inner".device_id)
-> Sort (cost=869.52..872.70 rows=1275 width=26)
Sort Key: dev.id
-> Seq Scan on tbl_device dev (cost=0.00..803.75 rows=1275 width=26)
-> Sort (cost=37547.62..37615.37 rows=27100 width=26)
Sort Key: tbl.device_id
-> Subquery Scan tbl (cost=0.00..35552.21 rows=27100 width=26)
-> Aggregate (cost=0.00..35552.21 rows=27100 width=26)
-> Group (cost=0.00..34874.70 rows=271005 width=26)
-> Index Scan using idx_le_id_type_severity_evtcode_20040922 on dtbl_logged_event_20040922 stbl
(cost=0.00..32842.16 rows=271005 width=26)
Filter: ((log_type = 2) OR (log_type = 3) OR (log_type = 4) OR (log_type = 5))
(14 rows)

Time: 1.43 ms

Late in the development I realized that we had created an inconsistency in
our design by having vsa.tbl_device.id defined as "int", and
vsa.tbl_sad_event.device_id defined as "bigint". These two fields are used
in the ON clause (ON (dev.id=tbl.device_id)), and my understanding is that
they should be of the same type cast. Trying to remedy this situation, I
explicitly tried casting vsa.tbl_sad_event.device_id as "int" (::int):

vsa=# explain
SELECT dev.name, dev.vss_site_id, tbl.log_type, tbl.severity, tbl.count
FROM vsa.tbl_device AS dev
LEFT OUTER JOIN
(SELECT stbl.device_id, stbl.log_type, stbl.severity, count(*)
FROM vsa.dtbl_logged_event_20040922 AS stbl
WHERE stbl.log_type IN (2, 3, 4, 5) GROUP BY stbl.device_id,
stbl.log_type, stbl.severity) AS tbl
ON (dev.id=tbl.device_id::int)
ORDER BY dev.name;
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..45848850.65 rows=27100 width=79)
Join Filter: ("outer".id = ("inner".device_id)::integer)
-> Index Scan using idx_d_name on tbl_device dev (cost=0.00..1490.19 rows=1275 width=26)
-> Subquery Scan tbl (cost=0.00..35552.21 rows=27100 width=26)
-> Aggregate (cost=0.00..35552.21 rows=27100 width=26)
-> Group (cost=0.00..34874.70 rows=271005 width=26)
-> Index Scan using idx_le_id_type_severity_evtcode_20040922 on dtbl_logged_event_20040922 stbl (cost=0.00..32842.16
rows=271005 width=26)
Filter: ((log_type = 2) OR (log_type = 3) OR (log_type = 4) OR (log_type = 5))
(8 rows)

Time: 1.62 ms

Notice that the query plan changes completely when I cast device_id as int.
What is worse (and why I'm writing) is that when I run the second query, it
goes into an infinite CPU loop. The original query completed in under 4
seconds. I've left the second query running for 30 minutes or more, and
TOP show 100% CPU utilization and 0% disk I/O (0% iowait).

We are starting to see this phenomenon in other queries which do *not* have
any explicit type casting, but in which something like
"cast(vsa.tbl_sad_event.time_logged AS date)" is used in a WHERE clause.
It's becoming a show-stopper until we understand what is happening.

Any information or suggestions about this problem or making the query more
efficient will be greatly appreciated.

Thanks!
--- Steve

Browse pgsql-performance by date

  From Date Subject
Next Message Steven Rosenstein 2004-09-22 20:33:16 Fw: Infinite CPU loop due to field ::type casting, Take II :-)
Previous Message Tom Lane 2004-09-22 19:59:07 Re: Caching of Queries