Re: query produces 1 GB temp file

From: Dirk(dot)Lutzebaeck(at)t-online(dot)de (Dirk Lutzebaeck)
To: John A Meinel <john(at)arbash-meinel(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: query produces 1 GB temp file
Date: 2005-02-05 19:46:20
Message-ID: 4205228C.3060601@aeccom.com
Lists: pgsql-performance

Hi John,

thanks very much for your analysis. I'll probably need to reorganize
some things.

Regards,

Dirk

John A Meinel wrote:

> Dirk Lutzebaeck wrote:
>
>> Hi,
>>
>> here is a query which produces a temp file of over 1GB in pgsql_tmp.
>> This is on pgsql 7.4.2, RHEL 3.0, a Xeon MP machine with 32GB RAM,
>> 300MB sort_mem and 320MB shared_mem.
>>
>> Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
>> tables have been analyzed before.
>>
>> Can someone please explain why the temp file is so huge? I understand
>> there are a lot of rows.
>>
>> Thanks in advance,
>>
>> Dirk
>>
>>
> ...
>
>> -> Nested Loop (cost=0.00..8346.73 rows=3 width=1361)
>> (actual time=34.104..18016.005 rows=703677 loops=1)
>>
>>
> Well, there is this particular query where it thinks there will only
> be 3 rows, but in fact there are 703,677 of them. And the previous line:
>
>> -> Sort (cost=8346.75..8346.76 rows=3 width=1361) (actual
>> time=75357.448..75499.263 rows=22439 loops=1)
>>
>>
> Seems to indicate that after sorting you still have 22,439 rows, which
> then get pared down again to 1,000.
>
> I'm assuming that the sort you are trying to do is extremely
> expensive. You are sorting 700k rows, which takes more memory than
> your sort_mem allows (over 1GB), so the sort spills to a temporary
> file on disk.
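>
> One thing you could try, if the machine really has the RAM to spare
> (this is just a guess, untested), is raising sort_mem for this one
> session so the sort can stay in memory:
>
>     SET sort_mem = 1048576;   -- value is in KB, so this is ~1GB
>     -- ... re-run the query / EXPLAIN ANALYZE here ...
>     RESET sort_mem;
>
> Keep in mind sort_mem is allocated per sort operation, so a value
> that high is only safe as a one-off, not in postgresql.conf.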
>
> I didn't analyze it in depth, but you might get much better
> performance from a subselect rather than the query as written.
>
> You are joining four tables (bi, en, df AS ft, es); I don't know what
> size each table is. In the end, though, you don't really care about
> the en or es tables (they aren't in your output).
>
> So maybe one of your subselects could be:
>
> where bi.en = (select en from es where es.co = bi.co and
> es.spec=122293729);
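>
> Roughly, the whole thing might then look something like this
> (untested, and I'm guessing at the join columns, so treat it purely
> as a sketch):
>
>     SELECT bi.*, ft.*
>       FROM bi
>       JOIN df AS ft ON ft.id = bi.ft_id   -- hypothetical join key
>      WHERE bi.en = (SELECT en FROM es
>                      WHERE es.co = bi.co
>                        AND es.spec = 122293729);
>
> The point is to keep en and es out of the join list entirely, so the
> planner never materializes the 700k-row intermediate result.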
>
> I'm pretty sure the reason you need 1GB of temp space is that at one
> point you have 700k rows. Is it possible to rewrite the query so that
> it does more filtering earlier? Your DISTINCT clause seems to filter
> it down to 20k rows, so maybe it's possible to apply the DISTINCT in
> part of the subselect, before you start joining against the other
> tables.
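>
> For example, something along these lines (again with made-up column
> names) would apply the DISTINCT before the big join:
>
>     SELECT b.en, b.co, ft.*
>       FROM (SELECT DISTINCT en, co
>               FROM bi) AS b            -- ~20k rows after the DISTINCT
>       JOIN df AS ft ON ft.co = b.co;   -- hypothetical join key
>
> so the later joins and the sort only ever see the 20k distinct rows
> instead of 700k.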
>
> If you have that much redundancy, you might also want to consider a
> different normalization.
>
> Just some thoughts.
>
> Also, I thought using the "oid" column wasn't really recommended,
> since in *high* volume databases oids aren't even guaranteed to be
> unique (I think it is a 32-bit counter that rolls over). And oids
> don't survive a database dump and restore unless you take extra care
> to include them in both. I believe it is better to create your own
> "id" column per table (say SERIAL or BIGSERIAL).
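>
> For example (hypothetical table, just to show the idea):
>
>     CREATE TABLE bi (
>         id  BIGSERIAL PRIMARY KEY,   -- 64-bit, survives dump/restore
>         en  INTEGER,
>         co  INTEGER
>     );
>
> and then join and reference rows via bi.id instead of bi.oid.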
>
> John
> =:->
>
