Re: oracle to psql migration - slow query in postgres

From: Tony Capobianco <tcapobianco(at)prospectiv(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: oracle to psql migration - slow query in postgres
Date: 2010-10-15 18:13:49
Message-ID: 1287166429.1730.170.camel@tony1.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance

Thanks for all your responses. What's interesting is that an index is
used when this query is executed in Oracle. It appears to do some
parallel processing:

SQL> set line 200
delete from plan_table;
explain plan for
select websiteid, emailaddress
from members
where emailok = 1
and emailbounced = 0;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
SQL>
3 rows deleted.

SQL> 2 3 4 5
Explained.

SQL> SQL>
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4247959398

-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes
| Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 237M|
7248M| 469K (2)| 01:49:33 | | | |
| 1 | PX COORDINATOR | | |
| | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 237M|
7248M| 469K (2)| 01:49:33 | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 237M|
7248M| 469K (2)| 01:49:33 | Q1,00 | PCWC | |
|* 4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M|
7248M| 469K (2)| 01:49:33 | Q1,00 | PCWP | |
-------------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("EMAILBOUNCED"=0 AND "EMAILOK"=1)

16 rows selected.

On Fri, 2010-10-15 at 13:43 -0400, Igor Neyman wrote:
>
> > -----Original Message-----
> > From: Tony Capobianco [mailto:tcapobianco(at)prospectiv(dot)com]
> > Sent: Thursday, October 14, 2010 3:43 PM
> > To: pgsql-performance(at)postgresql(dot)org
> > Subject: oracle to psql migration - slow query in postgres
> >
> > We are in the process of testing migration of our oracle data
> > warehouse over to postgres. A potential showstopper are full
> > table scans on our members table. We can't function on
> > postgres effectively unless index scans are employed. I'm
> > thinking I don't have something set correctly in my
> > postgresql.conf file, but I'm not sure what.
> >
> > This table has approximately 300million rows.
> >
> > Version:
> > SELECT version();
> >
> > version
> > --------------------------------------------------------------
> > ----------------------------------------------------
> > PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC
> > gcc (GCC)
> > 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit
> >
> > We have 4 quad-core processors and 32GB of RAM. The below
> > query uses the members_sorted_idx_001 index in oracle, but in
> > postgres, the optimizer chooses a sequential scan.
> >
> > explain analyze create table tmp_srcmem_emws1 as select
> > emailaddress, websiteid
> > from members
> > where emailok = 1
> > and emailbounced = 0;
> > QUERY
> > PLAN
> > --------------------------------------------------------------
> > ----------------------------------------------------------------
> > Seq Scan on members (cost=0.00..14137154.64 rows=238177981
> > width=29) (actual time=0.052..685834.785 rows=236660930 loops=1)
> > Filter: ((emailok = 1::numeric) AND (emailbounced =
> > 0::numeric)) Total runtime: 850306.220 ms
> > (3 rows)
> >
> > show shared_buffers ;
> > shared_buffers
> > ----------------
> > 7680MB
> > (1 row)
> >
> > show effective_cache_size ;
> > effective_cache_size
> > ----------------------
> > 22GB
> > (1 row)
> >
> > show work_mem ;
> > work_mem
> > ----------
> > 768MB
> > (1 row)
> >
> > show enable_seqscan ;
> > enable_seqscan
> > ----------------
> > on
> > (1 row)
> >
> > Below are the data definitions for the table/indexes in question:
> >
> > \d members
> > Table "members"
> > Column | Type | Modifiers
> > ---------------------+-----------------------------+-----------
> > memberid | numeric | not null
> > firstname | character varying(50) |
> > lastname | character varying(50) |
> > emailaddress | character varying(50) |
> > password | character varying(50) |
> > address1 | character varying(50) |
> > address2 | character varying(50) |
> > city | character varying(50) |
> > statecode | character varying(50) |
> > zipcode | character varying(50) |
> > birthdate | date |
> > emailok | numeric(2,0) |
> > gender | character varying(1) |
> > addeddate | timestamp without time zone |
> > emailbounced | numeric(2,0) |
> > changedate | timestamp without time zone |
> > optoutsource | character varying(100) |
> > websiteid | numeric |
> > promotionid | numeric |
> > sourceid | numeric |
> > siteid | character varying(64) |
> > srcwebsiteid | numeric |
> > homephone | character varying(20) |
> > homeareacode | character varying(10) |
> > campaignid | numeric |
> > srcmemberid | numeric |
> > optoutdate | date |
> > regcomplete | numeric(1,0) |
> > regcompletesourceid | numeric |
> > ipaddress | character varying(25) |
> > pageid | numeric |
> > streetaddressstatus | numeric(1,0) |
> > middlename | character varying(50) |
> > optinprechecked | numeric(1,0) |
> > optinposition | numeric |
> > homephonestatus | numeric |
> > addeddate_id | numeric |
> > changedate_id | numeric |
> > rpmindex | numeric |
> > optmode | numeric(1,0) |
> > countryid | numeric |
> > confirmoptin | numeric(2,0) |
> > bouncedate | date |
> > memberageid | numeric |
> > sourceid2 | numeric |
> > remoteuserid | character varying(50) |
> > goal | numeric(1,0) |
> > flowdepth | numeric |
> > pagetype | numeric |
> > savepassword | character varying(50) |
> > customerprofileid | numeric |
> > Indexes:
> > "email_website_unq" UNIQUE, btree (emailaddress,
> > websiteid), tablespace "members_idx"
> > "member_addeddateid_idx" btree (addeddate_id), tablespace
> > "members_idx"
> > "member_changedateid_idx" btree (changedate_id),
> > tablespace "members_idx"
> > "members_fdate_idx" btree
> > (to_char_year_month(addeddate)), tablespace "esave_idx"
> > "members_memberid_idx" btree (memberid), tablespace "members_idx"
> > "members_mid_emailok_idx" btree (memberid, emailaddress,
> > zipcode, firstname, emailok), tablespace "members_idx"
> > "members_sorted_idx_001" btree (websiteid, emailok,
> > emailbounced, addeddate, memberid, zipcode, statecode,
> > emailaddress), tablespace "members_idx"
> > "members_src_idx" btree (websiteid, emailbounced,
> > sourceid), tablespace "members_idx"
> > "members_wid_idx" btree (websiteid), tablespace "members_idx"
> >
> > select tablename, indexname, tablespace, indexdef from
> > pg_indexes where tablename = 'members'; -[ RECORD
> > 1
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename | members
> > indexname | members_fdate_idx
> > tablespace | esave_idx
> > indexdef | CREATE INDEX members_fdate_idx ON members USING btree
> > (to_char_year_month(addeddate))
> > -[ RECORD
> > 2
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename | members
> > indexname | member_changedateid_idx
> > tablespace | members_idx
> > indexdef | CREATE INDEX member_changedateid_idx ON members
> > USING btree
> > (changedate_id)
> > -[ RECORD
> > 3
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename | members
> > indexname | member_addeddateid_idx
> > tablespace | members_idx
> > indexdef | CREATE INDEX member_addeddateid_idx ON members
> > USING btree
> > (addeddate_id)
> > -[ RECORD
> > 4
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename | members
> > indexname | members_wid_idx
> > tablespace | members_idx
> > indexdef | CREATE INDEX members_wid_idx ON members USING btree
> > (websiteid)
> > -[ RECORD
> > 5
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename | members
> > indexname | members_src_idx
> > tablespace | members_idx
> > indexdef | CREATE INDEX members_src_idx ON members USING btree
> > (websiteid, emailbounced, sourceid)
> > -[ RECORD
> > 6
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename | members
> > indexname | members_sorted_idx_001
> > tablespace | members_idx
> > indexdef | CREATE INDEX members_sorted_idx_001 ON members
> > USING btree
> > (websiteid, emailok, emailbounced, addeddate, memberid,
> > zipcode, statecode, emailaddress) -[ RECORD
> > 7
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename | members
> > indexname | members_mid_emailok_idx
> > tablespace | members_idx
> > indexdef | CREATE INDEX members_mid_emailok_idx ON members
> > USING btree
> > (memberid, emailaddress, zipcode, firstname, emailok) -[ RECORD
> > 8
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename | members
> > indexname | members_memberid_idx
> > tablespace | members_idx
> > indexdef | CREATE INDEX members_memberid_idx ON members USING btree
> > (memberid)
> > -[ RECORD
> > 9
> > ]-------------------------------------------------------------
> > --------------------------------------------------------------
> > -------------------------
> > tablename | members
> > indexname | email_website_unq
> > tablespace | members_idx
> > indexdef | CREATE UNIQUE INDEX email_website_unq ON members USING
> > btree (emailaddress, websiteid)
> >
> >
> > This table has also been vacuumed analyzed as well:
> >
> > select * from pg_stat_all_tables where relname = 'members';
> > -[ RECORD 1 ]----+------------------------------
> > relid | 3112786
> > schemaname | xxxxx
> > relname | members
> > seq_scan | 298
> > seq_tup_read | 42791828896
> > idx_scan | 31396925
> > idx_tup_fetch | 1083796963
> > n_tup_ins | 291308316
> > n_tup_upd | 0
> > n_tup_del | 4188020
> > n_tup_hot_upd | 0
> > n_live_tup | 285364632
> > n_dead_tup | 109658
> > last_vacuum | 2010-10-12 20:26:01.227393-04
> > last_autovacuum |
> > last_analyze | 2010-10-12 20:28:01.105656-04
> > last_autoanalyze | 2010-09-16 20:50:00.712418-04
> >
> >
>
>
> Tony,
> For your query:
>
> > select
> > emailaddress, websiteid
> > from members
> > where emailok = 1
> > and emailbounced = 0;
>
> your table doesn't have any indexes where "emailok" or "emailbounced"
> are leading columns.
> That's why existing indexes can not be used.
>
> If you specified "websiteid" in the "where" clause then (most probably)
> the index members_sorted_idx_001 will be used (based on selectivity and
> statistics known to optimizer).
>
> If this query (as is - without "websiteid") is important for your app,
> create another index on (emailok, emailbounced) which should help, of
> course if selectivity of your where clause is good enough (not to
> perform full table scan).
>
> Regards,
> Igor Neyman
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Steve Francis 2010-10-15 18:15:37 Re: Postgres log file
Previous Message Jessica Richard 2010-10-15 18:03:12 Postgres log file

Browse pgsql-performance by date

  From Date Subject
Next Message Steve Francis 2010-10-15 18:15:37 Re: Postgres log file
Previous Message Jessica Richard 2010-10-15 18:03:12 Postgres log file