Skip site navigation (1) Skip section navigation (2)

Re: oracle to psql migration - slow query in postgres

From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: oracle to psql migration - slow query in postgres
Date: 2010-10-15 14:51:48
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-adminpgsql-performance
Samuel Gendler wrote:
> On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala 
> <mladen(dot)gogala(at)vmsinfo(dot)com <mailto:mladen(dot)gogala(at)vmsinfo(dot)com>> wrote:
>      If working with partitioning, be very aware that PostgreSQL
>     optimizer has certain problems with partitions, especially with
>     group functions. If you want speed, everything must be prefixed
>     with partitioning column: indexes, expressions, joins. There is no
>     explicit star schema and creating hash indexes will not buy you
>     much, as a matter of fact, Postgres community is extremely
>     suspicious of the hash indexes and I don't see them widely used.
>     Having said that, I was able to solve the problems with my speed
>     and partitioning.
> Could you elaborate on this, please? What do you mean by 'everythin 
> must be prefixed with partitioning column?'
> --sam
If you have partitioned table part_tab, partitioned on the column 
item_date and if there is a global primary key in Oracle, let's call it 
item_id, then queries like "select * from part_tab where item_id=12345" 
will perform worse than queries with item_date"

select * from part_tab where item_id=12345 and item_date='2010-10-15'

This also applies to inserts and updates. Strictly speaking, the 
item_date column in the query above is not necessary, after all, the 
item_id column is the primary key. However, with range scans you will 
get much better results if you include the item_date column than if you 
use combination of columns without. The term "prefixed indexes" is 
borrowed from Oracle RDBMS and means that the beginning column in the 
index is the column on which the table is partitioned. Oracle, as 
opposed to Postgres, has global indexes, the indexes that span all 
partitions. PostgreSQL only maintains indexes on each of the partitions 
separately.  Oracle calls such indexes "local indexes" and defines them 
on the partitioned table level. Here is a brief and rather succinct  
explanation of the terminology:

Of, course, there are other differences between Oracle partitioning and 
PostgreSQL partitioning. The main difference is $10000/CPU.
I am talking from experience:

news=> \d moreover_documents
              Table "moreover.moreover_documents"
        Column        |            Type             | Modifiers
 document_id          | bigint                      | not null
 dre_reference        | bigint                      | not null
 headline             | character varying(4000)     |
 author               | character varying(200)      |
 url                  | character varying(1000)     |
 rank                 | bigint                      |
 content              | text                        |
 stories_like_this    | character varying(1000)     |
 internet_web_site_id | bigint                      | not null
 harvest_time         | timestamp without time zone |
 valid_time           | timestamp without time zone |
 keyword              | character varying(200)      |
 article_id           | bigint                      | not null
 media_type           | character varying(20)       |
 source_type          | character varying(20)       |
 created_at           | timestamp without time zone |
 autonomy_fed_at      | timestamp without time zone |
 language             | character varying(150)      |
    "moreover_documents_pkey" PRIMARY KEY, btree (document_id)
    insert_moreover_trigger BEFORE INSERT ON moreover_documents FOR EACH 
CUTE PROCEDURE moreover_insert_trgfn()
Number of child tables: 8 (Use \d+ to list them.)

The child tables are, of course, partitions.

Here is the original:

Connected to:
Oracle Database 10g Enterprise Edition Release - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> desc moreover_documents
 Name                       Null?    Type
 ----------------------------------------- -------- 
 DOCUMENT#                   NOT NULL NUMBER
 HEADLINE                        VARCHAR2(4000)
 AUTHOR                         VARCHAR2(200)
 URL                            VARCHAR2(1000)
 RANK                            NUMBER
 CONTENT                        CLOB
 STORIES_LIKE_THIS                    VARCHAR2(1000)
 HARVEST_TIME                        DATE
 VALID_TIME                        DATE
 KEYWORD                        VARCHAR2(200)
 ARTICLE_ID                   NOT NULL NUMBER
 MEDIA_TYPE                        VARCHAR2(20)
 CREATED_AT                        DATE
 SOURCE_TYPE                        VARCHAR2(50)
 PUBLISH_DATE                        DATE
 AUTONOMY_FED_AT                    DATE
 LANGUAGE                        VARCHAR2(150)


I must say that it took me some time to get things right.

Mladen Gogala 
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251 
The Leader in Integrated Media Intelligence Solutions

In response to

pgsql-performance by date

Next:From: Merlin MoncureDate: 2010-10-15 15:06:08
Subject: Re: oracle to psql migration - slow query in postgres
Previous:From: Devrim GÜNDÜZDate: 2010-10-15 08:36:26
Subject: Re: Slow count(*) again...

pgsql-admin by date

Next:From: Merlin MoncureDate: 2010-10-15 15:06:08
Subject: Re: oracle to psql migration - slow query in postgres
Previous:From: Samuel GendlerDate: 2010-10-15 08:12:54
Subject: Re: oracle to psql migration - slow query in postgres

Privacy Policy | About PostgreSQL
Copyright © 1996-2018 The PostgreSQL Global Development Group