Server crash on cancelling expensive query.

From: "Larry Wissink" <lwissink(at)ebates(dot)com>
To: <sfpug(at)postgresql(dot)org>
Subject: Server crash on cancelling expensive query.
Date: 2006-06-14 22:12:33
Message-ID: F93D08A809C46B4A966D3DCF662B9B68038B381B@exchange2.corp.ebates.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: sfpug

Hi all,

I'm not sure what the proper protocol for this question is, but here
goes:

A developer at my company was running some queries that shouldn't have
been run because they selected from a very large table and ordered by an
unindexed field. The odd thing is that when the developer cancelled the
query the whole server rebooted. (Not just the postgres process, the
whole server!) It seems that the cancellation rather than the query
itself caused the reboot. It seems like running a big query with a
where clause or order by on fields that are not indexed and

then cancelling the query after say minutes of running causes the server
to crash and reboot. It happened several times this week already on
several occasions. This seems not quite right.

Of course my answer to the problem is not to run the wrong query, or not
to run any query without running an explain plan on it first. And I'm
sure everyone will agree to that.

That said, is there anyone who has an idea why the server would crash.
Why would cancelling a query do anything negative? Is it something that
could be avoided with a config change of some kind? (Or is it a kind of
rough justice for executing queries that should not be executed.)
Perhaps is there a way of preventing a query over a certain cost from
being executed at all.

Details on crash:

Postgres Version 7.4.7 (Yes I know that's very old. I'm sure an upgrade
will fix the problem.)

Table is very large (150 million rows).

One query that causes the server to reboot:

select * from merchant_sessions order by ms_end_date desc

limit 50;

EBQA2=# \d merchant_sessions

Table "public.merchant_sessions"

Column | Type | Modifiers

-----------------------+--------------------------+-----------

ms_id | numeric | not null

ms_frn_user_id | numeric | not null

ms_frn_merchant_id | numeric | not null

ms_frn_order_id | numeric |

ms_click_date | timestamp with time zone | not null

ms_end_date | timestamp with time zone | not null

ms_source_code | character(3) |

ms_frn_navigation_id | numeric(38,0) |

ms_url | character varying(2000) |

ms_frn_user_entry_id | numeric(38,0) |

ms_frn_site_id | numeric(38,0) |

ms_frn_special_day_id | numeric(38,0) |

ms_frn_banner_id | numeric(38,0) |

ms_partner_txn_id | character varying(256) |

Indexes:

"con_ms_id_pri" primary key, btree (ms_id)

"ms_click_date_index" btree (ms_click_date)

"ms_click_date_indx" btree (ms_frn_user_id, ms_click_date)

"ms_partner_txn_id_ix" btree (ms_partner_txn_id)

Foreign-key constraints:

"con_ms_frn_merchant_id_2" FOREIGN KEY (ms_frn_merchant_id)
REFERENCES merchants(merchant_id) ON DELETE CASCADE

Note that there is no index on the field the user was trying to order
by.

I'm embarassed to have to display the following explain plan:

EBQA2=# explain select * from merchant_sessions order by ms_end_date
desc

limit 50;

EBQA2-# QUERY PLAN

------------------------------------------------------------------------
--------------------

Limit (cost=337011335.42..337011335.54 rows=50 width=948)

-> Sort (cost=337011335.42..337237084.76 rows=90299736 width=948)

Sort Key: ms_end_date

-> Seq Scan on merchant_sessions (cost=0.00..3282499.68
rows=90299736 width=948)

(4 rows)

The developer managed to find another query to cause the same problem.
This query involves several tables joined together. I won't bother with
the structure of each one, but just provide the query and explain plan.

select count(*) from event_entity ee join source src on
ee.even_instance_id = src.source_id

join event evt on ee.even_event_id = evt.event_id where evt.event_type =
4;

EBQA2=# explain select count(*) from event_entity ee join source src on
ee.even_instance_id = src.source_id

join event evt on ee.even_event_id = evt.event_id where evt.event_type =
4

EBQA2-# EBQA2-# ;

QUERY PLAN

------------------------------------------------------------------------
----------------------------------------------------

Aggregate (cost=384149.36..384149.36 rows=1 width=0)

-> Merge Join (cost=380670.65..384105.00 rows=17744 width=0)

Merge Cond: ("outer".source_id = "inner".even_instance_id)

-> Index Scan using source_pk on source src
(cost=0.00..3026.54 rows=95556 width=4)

-> Sort (cost=380670.65..380715.01 rows=17744 width=4)

Sort Key: ee.even_instance_id

-> Nested Loop (cost=0.00..379418.37 rows=17744
width=4)

-> Seq Scan on event evt (cost=0.00..313598.96
rows=21745 width=8)

Filter: (event_type = 4)

-> Index Scan using event_entity_event_id_index on
event_entity ee (cost=0.00..3.02 rows=1 width=12)

Index Cond: (ee.even_event_id =
"outer".event_id)

(11 rows)

Any insight will be appreciated.

Larry.

lwissink(at)ebates(dot)com

Responses

Browse sfpug by date

  From Date Subject
Next Message Josh Berkus 2006-06-14 22:29:02 Re: Server crash on cancelling expensive query.
Previous Message Githogori Nyangara-Murage 2006-06-13 06:36:02 Re: LWW Planning