[noob] How to optimize this double pivot query?

From: Robert Buck <buck(dot)robert(dot)j(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: [noob] How to optimize this double pivot query?
Date: 2012-10-02 00:47:02
Message-ID: CADf7wwVizZbiQbOp+51HKoDfzYtZRf4trCwJxNniCgqiJDAZ0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I have two tables that contain key-value data that I want to combine in
pivoted form into a single result set. They are related to two separate
tables.

The tables are: test_results, test_variables, metric_def, metadata_key. The
latter two tables are enum-like tables, basic descriptors of data stored in
other tables. The former two tables are basically key-value tables (with
ids as well); these k-v tables are related to the latter two tables via
foreign keys.

The following SQL takes about 11 seconds to run on a high-end laptop. The
largest table is about 54k records, pretty puny.

Can someone provide a hint as to why this is so slow? Again, I am a noob to
SQL, so the SQL is probably poorly written.

Thanks in advance,

Bob

select

t.id_name,
max(t.begin_time) as begin_time,
max(t.end_time) as end_time,

max(case when (m.id_name = 'package-version') then v.value end) as
package_version,
max(case when (m.id_name = 'database-vendor') then v.value end) as
database_vendor,
max(case when (m.id_name = 'bean-name') then v.value end) as bean_name,
max(case when (m.id_name = 'request-distribution') then v.value end) as
request_distribution,
max(case when (m.id_name = 'ycsb-workload') then v.value end) as
ycsb_workload,
max(case when (m.id_name = 'record-count') then v.value end) as
record_count,
max(case when (m.id_name = 'transaction-engine-count') then v.value
end) as transaction_engine_count,
max(case when (m.id_name = 'transaction-engine-maxmem') then v.value
end) as transaction_engine_maxmem,
max(case when (m.id_name = 'storage-manager-count') then v.value end)
as storage_manager_count,
max(case when (m.id_name = 'test-instance-count') then v.value end) as
test_instance_count,
max(case when (m.id_name = 'operation-count') then v.value end) as
operation_count,
max(case when (m.id_name = 'update-percent') then v.value end) as
update_percent,
max(case when (m.id_name = 'thread-count') then v.value end) as
thread_count,

max(case when (d.id_name = 'tps') then r.value end) as tps,
max(case when (d.id_name = 'Memory') then r.value end) as memory,
max(case when (d.id_name = 'DiskWritten') then r.value end) as
disk_written,
max(case when (d.id_name = 'PercentUserTime') then r.value end) as
percent_user,
max(case when (d.id_name = 'PercentCpuTime') then r.value end) as
percent_cpu,
max(case when (d.id_name = 'UserMilliseconds') then r.value end) as
user_milliseconds,
max(case when (d.id_name = 'YcsbUpdateLatencyMicrosecs') then r.value
end) as update_latency,
max(case when (d.id_name = 'YcsbReadLatencyMicrosecs') then r.value
end) as read_latency,
max(case when (d.id_name = 'Updates') then r.value end) as updates,
max(case when (d.id_name = 'Deletes') then r.value end) as deletes,
max(case when (d.id_name = 'Inserts') then r.value end) as inserts,
max(case when (d.id_name = 'Commits') then r.value end) as commits,
max(case when (d.id_name = 'Rollbacks') then r.value end) as rollbacks,
max(case when (d.id_name = 'Objects') then r.value end) as objects,
max(case when (d.id_name = 'ObjectsCreated') then r.value end) as
objects_created,
max(case when (d.id_name = 'FlowStalls') then r.value end) as
flow_stalls,
max(case when (d.id_name = 'NodeApplyPingTime') then r.value end) as
node_apply_ping_time,
max(case when (d.id_name = 'NodePingTime') then r.value end) as
node_ping_time,
max(case when (d.id_name = 'ClientCncts') then r.value end) as
client_connections,
max(case when (d.id_name = 'YcsbSuccessCount') then r.value end) as
success_count,
max(case when (d.id_name = 'YcsbWarnCount') then r.value end) as
warn_count,
max(case when (d.id_name = 'YcsbFailCount') then r.value end) as
fail_count

from test as t

left join test_results as r on r.test_id = t.id
left join test_variables as v on v.test_id = t.id
left join metric_def as d on d.id = r.metric_def_id
left join metadata_key as m on m.id = v.metadata_key_id

group by t.id_name

;

"GroupAggregate (cost=5.87..225516.43 rows=926 width=81)"
" -> Nested Loop Left Join (cost=5.87..53781.24 rows=940964 width=81)"
" -> Nested Loop Left Join (cost=1.65..1619.61 rows=17235
width=61)"
" -> Index Scan using test_uc on test t (cost=0.00..90.06
rows=926 width=36)"
" -> Hash Right Join (cost=1.65..3.11 rows=19 width=29)"
" Hash Cond: (m.id = v.metadata_key_id)"
" -> Seq Scan on metadata_key m (cost=0.00..1.24
rows=24 width=21)"
" -> Hash (cost=1.41..1.41 rows=19 width=16)"
" -> Index Scan using test_variables_test_id_idx
on test_variables v (cost=0.00..1.41 rows=19 width=16)"
" Index Cond: (test_id = t.id)"
" -> Hash Right Join (cost=4.22..6.69 rows=55 width=28)"
" Hash Cond: (d.id = r.metric_def_id)"
" -> Seq Scan on metric_def d (cost=0.00..1.71 rows=71
width=20)"
" -> Hash (cost=3.53..3.53 rows=55 width=16)"
" -> Index Scan using test_results_test_id_idx on
test_results r (cost=0.00..3.53 rows=55 width=16)"
" Index Cond: (test_id = t.id)"

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message David Johnston 2012-10-02 01:07:26 Re: [noob] How to optimize this double pivot query?
Previous Message David Johnston 2012-09-30 16:26:00 Re: Reuse temporary calculation results in an SQL update query [SOLVDED]