Re: query produces 1 GB temp file

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Dirk Lutzebaeck <lutzeb(at)aeccom(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: query produces 1 GB temp file
Date: 2006-10-27 15:49:28
Message-ID: 1161964168.1655.1517.camel@state.g2switchworks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sat, 2005-02-05 at 11:25, Dirk Lutzebaeck wrote:
> Hi,
>
> here is a query which produces over 1G temp file in pgsql_tmp. This
> is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
> sort_mem and 320MB shared_mem.

First step, upgrade to the latest 7.4.x version. 7.4.2 is an OLD
version of 7.4 I think the latest version is 7.4.13.

> Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
> tables have been analyzed before.

SNIP

> EXPLAIN ANALYZE gives:
>
>
> Limit (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.465..75679.964 rows=1000 loops=1)
> -> Unique (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.459..75675.371 rows=1000 loops=1)
> -> Sort (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1)
> Sort Key: ft.val_9, ft.created, ft.flatid
> -> Nested Loop (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1)
> -> Nested Loop (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563 loops=1)
> -> Nested Loop (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014 rows=48563 loops=1)
> -> Index Scan using es_sc_index on es (cost=0.00..847.71 rows=301 width=8) (actual time=0.184..46.519 rows=5863 loops=1)
> Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
> -> Index Scan using bi_env_index on bi (cost=0.00..15.80 rows=1 width=42) (actual time=0.052..0.218 rows=8 loops=5863)
> Index Cond: ("outer".en = bi.en)
> Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
> -> Index Scan using en_oid_index on en (cost=0.00..5.01 rows=1 width=9) (actual time=0.015..0.019 rows=1 loops=48563)
> Index Cond: ("outer".en = en.oid)
> -> Index Scan using df_en on df ft (cost=0.00..151.71 rows=49 width=1322) (actual time=0.038..0.148 rows=14 loops=48563)
> Index Cond: ("outer".en = ft.en)
> Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))
> Total runtime: 81782.052 ms
> (18 rows)

Why do you have an index scan on en_oid_index that thinks it will return
1 row when it returns 48563, and one on df_en that thinks it will return
49 and returns 48563 as well? Is this database analyzed often? Are
oids even analyzed? I'd really recommend switching off of them as they
complicate backups and restores.

If analyze doesn't help, you can try brute forcing off nested loops for
this query and see if that helps. nested loop is really slow for large
numbers of rows.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2006-10-27 18:07:43 Re: VACUUMs take twice as long across all nodes
Previous Message Dirk Lutzebäck 2006-10-27 14:51:25 Re: query produces 1 GB temp file