Re: external sort performance

From: Craig James <craig_james(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: external sort performance
Date: 2011-11-17 17:28:15
Message-ID: 4EC5442F.9060402@emolecules.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 11/17/11 9:10 AM, Jon Nelson wrote:
> I have one query which does not run very often. Sometimes it may be
> months between runs.
> However, when it does get executed, it scans approximately 100
> identically-structured tables (a form of partitioning), extracts and
> groups on a subset of the columns, and creates a new table. The
> individual table queries have no where clauses, this is a full table
> scan for every table.
>
> I've tried all sorts of things to try to improve the performance,
> which can take a /very/ long time.
> We are talking about approximately 175GB of data before grouping/summarizing.
>
> This is on PG 8.4.8 on Linux, 16GB of "real" RAM.
> Most recently, I enabled trace_sort, disabled hash aggregation[1], and
> set a large work_mem (normally very small, in this case I tried
> anything from 8MB to 256MB. I even tried 1GB and 2GB).
>
> In the logs, I saw this:
>
> external sort ended, 7708696 disk blocks used: CPU 359.84s/57504.66u
> sec elapsed 58966.76 sec
>
> Am I to understand that the CPU portion of the sorting only took 6
> minutes but the sort itself took almost 16.5 hours and used approx
> 60GB of disk space?
> The resulting summary table is about 5GB in size as reported by \d+ in
> psql (and pg_relation_size).
>
> The underlying storage is ext4 on a hardware raid 10 with a BBU.
>
> What sorts of things should I be looking at to improve the performance
> of this query? Is my interpretation of that log line totally off base?
You don't give any details about how and why you are sorting. Are you actually using all of the columns in your aggregated-data table in the sort operation? Or just a few of them?

You're making the sort operation work with 175 GB of data. If most of that data is only needed for the report (not the sort), then separate it into two tables - one of just the data that the sorting/grouping needs, and the other with the rest of the data. Then create a view that joins it all back together for reporting purposes.

Craig

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2011-11-17 17:55:20 Re: external sort performance
Previous Message Claudio Freire 2011-11-17 17:16:59 Re: external sort performance