Re: oracle to psql migration - slow query in postgres

From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: oracle to psql migration - slow query in postgres
Date: 2010-10-15 14:51:48
Message-ID: 4CB86A84.1000101@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance

Samuel Gendler wrote:
>
>
> On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala
> <mladen(dot)gogala(at)vmsinfo(dot)com <mailto:mladen(dot)gogala(at)vmsinfo(dot)com>> wrote:
>
> If working with partitioning, be very aware that PostgreSQL
> optimizer has certain problems with partitions, especially with
> group functions. If you want speed, everything must be prefixed
> with partitioning column: indexes, expressions, joins. There is no
> explicit star schema and creating hash indexes will not buy you
> much, as a matter of fact, Postgres community is extremely
> suspicious of the hash indexes and I don't see them widely used.
> Having said that, I was able to solve the problems with my speed
> and partitioning.
>
>
> Could you elaborate on this, please? What do you mean by 'everythin
> must be prefixed with partitioning column?'
>
> --sam
If you have partitioned table part_tab, partitioned on the column
item_date and if there is a global primary key in Oracle, let's call it
item_id, then queries like "select * from part_tab where item_id=12345"
will perform worse than queries with item_date"

select * from part_tab where item_id=12345 and item_date='2010-10-15'

This also applies to inserts and updates. Strictly speaking, the
item_date column in the query above is not necessary, after all, the
item_id column is the primary key. However, with range scans you will
get much better results if you include the item_date column than if you
use combination of columns without. The term "prefixed indexes" is
borrowed from Oracle RDBMS and means that the beginning column in the
index is the column on which the table is partitioned. Oracle, as
opposed to Postgres, has global indexes, the indexes that span all
partitions. PostgreSQL only maintains indexes on each of the partitions
separately. Oracle calls such indexes "local indexes" and defines them
on the partitioned table level. Here is a brief and rather succinct
explanation of the terminology:

http://www.oracle-base.com/articles/8i/PartitionedTablesAndIndexes.php

Of, course, there are other differences between Oracle partitioning and
PostgreSQL partitioning. The main difference is $10000/CPU.
I am talking from experience:

news=> \d moreover_documents
Table "moreover.moreover_documents"
Column | Type | Modifiers
----------------------+-----------------------------+-----------
document_id | bigint | not null
dre_reference | bigint | not null
headline | character varying(4000) |
author | character varying(200) |
url | character varying(1000) |
rank | bigint |
content | text |
stories_like_this | character varying(1000) |
internet_web_site_id | bigint | not null
harvest_time | timestamp without time zone |
valid_time | timestamp without time zone |
keyword | character varying(200) |
article_id | bigint | not null
media_type | character varying(20) |
source_type | character varying(20) |
created_at | timestamp without time zone |
autonomy_fed_at | timestamp without time zone |
language | character varying(150) |
Indexes:
"moreover_documents_pkey" PRIMARY KEY, btree (document_id)
Triggers:
insert_moreover_trigger BEFORE INSERT ON moreover_documents FOR EACH
ROW EXE
CUTE PROCEDURE moreover_insert_trgfn()
Number of child tables: 8 (Use \d+ to list them.)

The child tables are, of course, partitions.

Here is the original:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> desc moreover_documents
Name Null? Type
----------------------------------------- --------
----------------------------
DOCUMENT# NOT NULL NUMBER
DRE_REFERENCE NOT NULL NUMBER
HEADLINE VARCHAR2(4000)
AUTHOR VARCHAR2(200)
URL VARCHAR2(1000)
RANK NUMBER
CONTENT CLOB
STORIES_LIKE_THIS VARCHAR2(1000)
INTERNET_WEB_SITE# NOT NULL NUMBER
HARVEST_TIME DATE
VALID_TIME DATE
KEYWORD VARCHAR2(200)
ARTICLE_ID NOT NULL NUMBER
MEDIA_TYPE VARCHAR2(20)
CREATED_AT DATE
SOURCE_TYPE VARCHAR2(50)
PUBLISH_DATE DATE
AUTONOMY_FED_AT DATE
LANGUAGE VARCHAR2(150)

SQL>

I must say that it took me some time to get things right.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Merlin Moncure 2010-10-15 15:06:08 Re: oracle to psql migration - slow query in postgres
Previous Message Samuel Gendler 2010-10-15 08:12:54 Re: oracle to psql migration - slow query in postgres

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2010-10-15 15:06:08 Re: oracle to psql migration - slow query in postgres
Previous Message Devrim GÜNDÜZ 2010-10-15 08:36:26 Re: Slow count(*) again...