Problems with query in highly concurrent environment

From: Aleksei Valikov <valikov(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Problems with query in highly concurrent environment
Date: 2005-02-24 22:38:09
Message-ID: 421E5751.5070804@gmx.net
Lists: pgsql-general

Hi.

We have a very strange problem and no idea how to solve it.

We have a PROGRESSION table that stores information about various
telephony events. Rows are never updated, only inserted, at a rate of
about 20-30 thousand rows a day.

Based on this table, we need to produce certain statistics, which are
created by a complex SQL query. This query runs quite fast on our test
systems but is EXTREMELY slow on the production system: the slowdown
factor is 100-500 times. A query that runs in 2 seconds on the test
system sometimes takes up to 12 minutes on the production system.

The test systems run on dumps of the production system, so the
databases themselves differ very little. The one real difference is
that the production system is concurrently accessed by a number of
users (and the PROGRESSION table gets plenty of inserts).

During off-peak times (at night, for instance), statistics on the
production system are as fast as on the test systems (even faster,
thanks to the server hardware).

Moreover, and this is what I find very strange, when I log in to the
production database with psql (or any other SQL client) and execute the
same query, it runs fast, even at peak hours!
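
If it matters: I have no idea yet whether the application reaches the
database differently from psql. If it goes through server-side prepared
statements (a pure guess on my part), one could roughly imitate that
path from psql like this, so the planner has to plan without knowing
the parameter values:

-- Pure guess: imitate a server-side prepared statement.
-- The statement is planned once, with parameter values unknown.
PREPARE stats_probe (bigint, bigint) AS
  SELECT count(*)
  FROM progression
  WHERE "timestamp" > $1 AND "timestamp" < $2;

EXECUTE stats_probe (1104534000000, 1107248400000);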

I would be extremely grateful for any hints or ideas on how we could
resolve this.

My personal theory is that concurrent reads and writes cause some kind
of blocking (on indexes, for instance). Mere "heavy load" on the
production system would not explain a slowdown factor of 400. Under
what conditions could writes block a reading query? What could we do to
diagnose such blocking?
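
One thing we could try while the query is crawling is to look for
ungranted locks. A minimal sketch (assuming the 8.x catalogs, where the
process column in pg_stat_activity is called procpid and current_query
is only filled in when stats_command_string is on):

-- Sketch: list ungranted locks together with what each backend runs.
SELECT l.pid,
       l.relation::regclass AS locked_relation,
       l.mode,
       l.granted,
       a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
WHERE NOT l.granted;

If this never returns rows during the slow runs, the problem is
probably not lock waits.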

I'm afraid I can't drop indexes on the production system.

Becoming a bit desperate...

Bye.
/lexi

ps.

The table description:

Table "public.progression"
Column | Type | Modifiers
---------------+------------------------+------------------------------------------------------
id | bigint | not null default
nextval('progression_id_seq'::text)
timestamp | bigint | not null
entry_type | integer | not null
call_id | bigint | not null
hotline_id | bigint | not null
hotline_name | character varying(64) | not null
team_id | bigint | not null
team_name | character varying(64) | not null
expert_id | bigint | not null
expert_name | character varying(64) | not null
expert_role | bigint | not null
remote_number | character varying(32) |
comment | character varying(255) | not null default
''::character varying

Indexes:
"progression_call_id_idx" btree (call_id)
"progression_entry_type_timestamp_idx" btree (entry_type, "timestamp")
"progression_expert_id" btree (expert_id)

select count(*) from progression returns 1870961

The statistics query (please be gentle):

SELECT
  CALL.HOTLINE_ID || '.' || CALL.PERIOD_NUMBER AS {record.id},
  CALL.PERIOD_NUMBER AS {record.periodNumber},
  (:startTimestamp + CALL.PERIOD_NUMBER * :intervalSize) AS {record.timestamp},
  CALL.HOTLINE_ID AS {record.hotlineId},
  CALL.HOTLINE_NAME AS {record.hotlineName},
  count(*) AS {record.totalCallCount},
  sum(CALL.IS_PROCESSED) AS {record.processedCallCount},
  sum(CALL.TOTAL_TIME) AS {record.totalTime},
  sum(CALL.SPOKEN_TIME) AS {record.spokenTime}
FROM (
  SELECT
    draft._CALL_ID AS CALL_ID,
    draft._TIMESTAMP AS TIMESTAMP,
    draft._HOTLINE_ID AS HOTLINE_ID,
    draft._HOTLINE_NAME AS HOTLINE_NAME,
    case when draft._CUSTOMER_CALL_GETS_CONNECTED > 0
         then 1 else 0 end AS IS_CONNECTED,
    case when draft._CUSTOMER_CALL_GETS_CONNECTED > 0
          and (draft._CUSTOMER_CALL_GETS_DISCONNECTED > 0
               or draft._CUSTOMER_CALL_DROPPED > 0)
         then 1 else 0 end AS IS_PROCESSED,
    case when draft._AGENT_CALL_GETS_CONNECTED > 0
          and draft._AGENT_CALL_GETS_DISCONNECTED > 0
         then draft._AGENT_CALL_GETS_DISCONNECTED - draft._AGENT_CALL_GETS_CONNECTED
         when draft._AGENT_CALL_GETS_CONNECTED > 0
          and draft._AGENT_CALL_DROPPED > 0
         then draft._AGENT_CALL_DROPPED - draft._AGENT_CALL_GETS_CONNECTED
         else 0 end AS TOTAL_TIME,
    case when draft._AGENT_CALL_STARTS_TALKING > 0
          and draft._AGENT_CALL_GETS_DISCONNECTED > 0
         then draft._AGENT_CALL_GETS_DISCONNECTED - draft._AGENT_CALL_STARTS_TALKING
         when draft._AGENT_CALL_STARTS_TALKING > 0
          and draft._AGENT_CALL_DROPPED > 0
         then draft._AGENT_CALL_DROPPED - draft._AGENT_CALL_STARTS_TALKING
         else 0 end AS SPOKEN_TIME,
    floor((draft._TIMESTAMP - :startTimestamp) / :intervalSize) AS PERIOD_NUMBER
  FROM (
    SELECT
      p.call_id AS _CALL_ID,
      sum(case when p.entry_type = 1 then p.timestamp else 0 end) AS _TIMESTAMP,
      p.hotline_id AS _HOTLINE_ID,
      p.hotline_name AS _HOTLINE_NAME,
      sum(case when p.entry_type = 2 then p.timestamp else 0 end) AS _CUSTOMER_CALL_GETS_CONNECTED,
      sum(case when p.entry_type = 3 then p.timestamp else 0 end) AS _CUSTOMER_CALL_GETS_DISCONNECTED,
      sum(case when p.entry_type = 4 then p.timestamp else 0 end) AS _CUSTOMER_CALL_DROPPED,
      sum(case when p.entry_type = 8 then p.timestamp else 0 end) AS _AGENT_CALL_GETS_CONNECTED,
      sum(case when p.entry_type = 9 then p.timestamp else 0 end) AS _AGENT_CALL_GETS_DISCONNECTED,
      sum(case when p.entry_type = 10 then p.timestamp else 0 end) AS _AGENT_CALL_DROPPED,
      sum(case when p.entry_type = 16 then p.timestamp else 0 end) AS _AGENT_CALL_STARTS_TALKING
    FROM progression p
    WHERE p.hotline_id in (:hotlineIds)
      and p.entry_type in (1, 2, 3, 4, 8, 9, 10, 16)
      and p.timestamp > :startTimestamp
      and p.timestamp < (:endTimestamp + :maximalDuration)
      and (p.entry_type != 1 or p.timestamp < :endTimestamp)
    GROUP BY p.call_id, p.hotline_id, p.hotline_name
  ) AS draft
) AS CALL
WHERE CALL.TIMESTAMP > 0
GROUP BY CALL.HOTLINE_ID, CALL.HOTLINE_NAME, CALL.PERIOD_NUMBER

The execution plan for this query (for January of this year, all
hotlines, :maximalDuration = 36000000, :intervalSize = 3600000) is as
follows:

HashAggregate  (cost=143395.98..159630.37 rows=209476 width=354)
  ->  Subquery Scan draft  (cost=125590.52..139730.15 rows=209476 width=354)
        ->  HashAggregate  (cost=125590.52..136588.01 rows=209476 width=55)
              Filter: (((min(CASE WHEN (entry_type = 1) THEN "timestamp" ELSE 0::bigint END) - 1104534000000::bigint) / 3600000) > 0)
              ->  Seq Scan on progression p  (cost=0.00..119306.24 rows=209476 width=55)
                    Filter: (((entry_type = 1) OR (entry_type = 2) OR (entry_type = 3) OR (entry_type = 4) OR (entry_type = 8) OR (entry_type = 9) OR (entry_type = 10) OR (entry_type = 16)) AND ("timestamp" > 1104534000000::bigint) AND ("timestamp" < 1107248400000::bigint) AND ((entry_type <> 1) OR ("timestamp" < 1107212400000::bigint)))
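
For what it's worth, the plan above is from plain EXPLAIN, so it only
shows estimates. Running something like the following during peak hours
should show actual row counts and timings for the inner scan (just a
sketch; the literal timestamps stand in for the :placeholders above):

EXPLAIN ANALYZE
SELECT count(*)
FROM progression p
WHERE p.entry_type in (1, 2, 3, 4, 8, 9, 10, 16)
  and p.timestamp > 1104534000000
  and p.timestamp < 1107248400000;

Comparing the actual numbers against the estimates (and against an
off-peak run) should tell whether the scan itself slows down or only
the aggregation.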
