Oracle v. Postgres 9.0 query performance

From: Tony Capobianco <tcapobianco(at)prospectiv(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Oracle v. Postgres 9.0 query performance
Date: 2011-06-08 15:11:32
Message-ID: 1307545892.1990.23.camel@tony1.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We are thiiiiiis close to moving our datawarehouse from Oracle to
Postgres. This query is identical on both systems, but runs much, much
faster on Oracle. Our Postgres host has far superior hardware and
tuning parameters have been set via pgtune. Most everything else runs
faster in Postgres, except for this query. In Oracle, we get a hash
join that takes about 2 minutes:

SQL> set line 200
delete from plan_table;
explain plan for
CREATE TABLE ecr_opens
as
select o.emailcampaignid, count(memberid) opencnt
from openactivity o,ecr_sents s
where s.emailcampaignid = o.emailcampaignid
group by o.emailcampaignid;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
SQL>
13 rows deleted.

SQL> 2 3 4 5 6 7
Explained.

SQL> SQL>
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4034426201

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows |
Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | 5094 |
91692 | 9651 (24)| 00:02:16 | | | |
| 1 | LOAD AS SELECT | ECR_OPENS | |
| | | | | |
| 2 | PX COORDINATOR | | |
| | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10002 | 5094 |
91692 | 2263 (100)| 00:00:32 | Q1,02 | P->S | QC (RAND) |
| 4 | HASH GROUP BY | | 5094 |
91692 | 2263 (100)| 00:00:32 | Q1,02 | PCWP | |
| 5 | PX RECEIVE | | 5094 |
91692 | 2263 (100)| 00:00:32 | Q1,02 | PCWP | |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 6 | PX SEND HASH | :TQ10001 | 5094 |
91692 | 2263 (100)| 00:00:32 | Q1,01 | P->P | HASH |
| 7 | HASH GROUP BY | | 5094 |
91692 | 2263 (100)| 00:00:32 | Q1,01 | PCWP | |
| 8 | NESTED LOOPS | | 17M|
297M| 200 (98)| 00:00:03 | Q1,01 | PCWP | |
| 9 | BUFFER SORT | | |
| | | Q1,01 | PCWC | |
| 10 | PX RECEIVE | | |
| | | Q1,01 | PCWP | |
| 11 | PX SEND ROUND-ROBIN| :TQ10000 | |
| | | | S->P | RND-ROBIN |
| 12 | TABLE ACCESS FULL | ECR_SENTS | 476 |
6188 | 3 (0)| 00:00:01 | | | |
|* 13 | INDEX RANGE SCAN | OPENACT_EMCAMP_IDX | 36355 |
177K| 1 (0)| 00:00:01 | Q1,01 | PCWP | |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------

13 - access("S"."EMAILCAMPAIGNID"="O"."EMAILCAMPAIGNID")

Note
-----
- dynamic sampling used for this statement

29 rows selected.

SQL> desc openactivity
Name Null? Type
----------------------------------------- --------
----------------------------
EMAILCAMPAIGNID NOT NULL NUMBER
MEMBERID NOT NULL NUMBER
OPENDATE DATE
IPADDRESS VARCHAR2(25)
DATE_ID NUMBER

SQL> select count(*) from openactivity;

COUNT(*)
----------
192542480

SQL> desc ecr_sents
Name Null? Type
----------------------------------------- --------
----------------------------
EMAILCAMPAIGNID NUMBER
MEMCNT NUMBER
DATE_ID NUMBER
SENTDATE DATE

SQL> select count(*) from ecr_sents;

COUNT(*)
----------
476

Our final result is the ecr_opens table which is 476 rows.

On Postgres, this same query takes about 58 minutes (could not run
explain analyze because it is in progress):

pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-# from openactivity o,ecr_sents s
pg_dw-# where s.emailcampaignid = o.emailcampaignid
pg_dw-# group by o.emailcampaignid;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..1788988.05 rows=9939 width=12)
-> Nested Loop (cost=0.00..1742467.24 rows=9279316 width=12)
-> Index Scan using ecr_sents_ecid_idx on ecr_sents s
(cost=0.00..38.59 rows=479 width=4)
-> Index Scan using openact_emcamp_idx on openactivity o
(cost=0.00..3395.49 rows=19372 width=12)
Index Cond: (o.emailcampaignid = s.emailcampaignid)
(5 rows)

pg_dw=# \d openactivity
Table "openactivity"
Column | Type | Modifiers
-----------------+-----------------------+-----------
emailcampaignid | integer | not null
memberid | bigint | not null
opendate | date |
ipaddress | character varying(25) |
date_id | integer |
Indexes:
"openact_dateid_idx" btree (date_id), tablespace "pg_idx"
"openact_emcamp_idx" btree (emailcampaignid), tablespace "pg_idx"

pg_dw=# select count(*) from openactivity;
count
-----------
192542480

pg_dw=# \d ecr_sents
Table "staging.ecr_sents"
Column | Type | Modifiers
-----------------+---------+-----------
emailcampaignid | integer |
memcnt | numeric |
date_id | integer |
sentdate | date |
Indexes:
"ecr_sents_ecid_idx" btree (emailcampaignid), tablespace
"staging_idx"

pg_dw=# select count(*) from ecr_sents;
count
-------
479

We added an index on ecr_sents to see if that improved performance, but
did not work. Both tables have updated stats:

pg_dw=# select relname, last_vacuum, last_autovacuum, last_analyze,
last_autoanalyze from pg_stat_all_tables where relname in
('openactivity','ecr_sents');
relname | last_vacuum | last_autovacuum |
last_analyze | last_autoanalyze
--------------+-------------------------------+-----------------+-------------------------------+-------------------------------
ecr_sents | | |
2011-06-08 10:31:20.677172-04 | 2011-06-08 10:31:34.545504-04
openactivity | 2011-06-02 16:34:47.129695-04 | |
2011-06-07 13:48:21.909546-04 | 2011-04-27 17:49:15.004551-04

Relevant info:
pg_dw=# SELECT
pg_dw-# 'version'::text AS "name",
pg_dw-# version() AS "current_setting"
pg_dw-# UNION ALL
pg_dw-# SELECT
pg_dw-# name,current_setting(name)
pg_dw-# FROM pg_settings
pg_dw-# WHERE NOT source='default' AND NOT name IN
pg_dw-# ('config_file','data_directory','hba_file','ident_file',
pg_dw(# 'log_timezone','DateStyle','lc_messages','lc_monetary',
pg_dw(# 'lc_numeric','lc_time','timezone_abbreviations',
pg_dw(# 'default_text_search_config','application_name',
pg_dw(# 'transaction_deferrable','transaction_isolation',
pg_dw(# 'transaction_read_only');
name |
current_setting
------------------------------+-------------------------------------------------------------------------------------------------------------------
version | PostgreSQL 9.0.3 on
x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red
Hat 4.1.2-48), 64-bit
archive_command | (disabled)
archive_timeout | 1h
autovacuum_max_workers | 10
checkpoint_completion_target | 0.9
checkpoint_segments | 64
checkpoint_timeout | 1h
constraint_exclusion | on
default_statistics_target | 100
effective_cache_size | 22GB
effective_io_concurrency | 5
fsync | on
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
listen_addresses | *
log_checkpoints | on
log_destination | stderr
log_directory | pg_log
log_error_verbosity | default
log_filename | pg_dw.log
log_line_prefix | %m-%u-%p
log_lock_waits | on
log_min_error_statement | panic
log_min_messages | notice
log_rotation_age | 0
log_rotation_size | 0
log_truncate_on_rotation | off
logging_collector | on
maintenance_work_mem | 1GB
max_connections | 400
max_stack_depth | 2MB
search_path | xxxxx
server_encoding | UTF8
shared_buffers | 7680MB
TimeZone | US/Eastern
wal_buffers | 32MB
wal_level | archive
work_mem | 768MB

Should this query be hashing the smaller table on Postgres rather than
using nested loops?

Thanks.
Tony

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message tv 2011-06-08 15:31:26 Re: Oracle v. Postgres 9.0 query performance
Previous Message Shaun Thomas 2011-06-08 13:36:22 Re: Set of related slow queries