Re: parallel query evaluation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: postgresql(at)os10000(dot)net
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: parallel query evaluation
Date: 2012-11-10 15:32:25
Message-ID: 9130.1352561545@sss.pgh.pa.us
Lists: pgsql-performance

Oliver Seidel <postgresql(at)os10000(dot)net> writes:
> I have
> create table x ( att bigint, val bigint, hash varchar(30) );
> with 693 million rows. The query

> create table y as select att, val, count(*) as cnt from x
> group by att, val;

> ran for more than 2000 minutes and used 14g memory on an 8g physical
> RAM machine

What was the plan for that query? What did you have work_mem set to?

I can believe such a thing overrunning memory if the planner chose to
use a hash-aggregation plan instead of sort-and-unique, but it would
only do that if it had made a drastic underestimate of the number of
groups implied by the GROUP BY clause. Do you have up-to-date
statistics for the source table?

regards, tom lane
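
The questions above can be checked from a psql session roughly as follows. This is a minimal sketch, not part of the original message: it assumes the table is named x as in the quoted post, and it uses only standard PostgreSQL commands and the pg_stats view. The enable_hashagg setting at the end is a diagnostic switch that was not suggested in the thread, so treat it as an experiment rather than a recommendation.

```sql
-- Show the memory budget each sort or hash operation may use.
SHOW work_mem;

-- Refresh the planner's statistics for the source table.
ANALYZE x;

-- Display the chosen plan without running the query.
-- A HashAggregate node means hash aggregation; a GroupAggregate
-- node above a Sort node means the sort-based strategy.
EXPLAIN
SELECT att, val, count(*) AS cnt
FROM x
GROUP BY att, val;

-- Inspect the planner's per-column distinct-value estimates.
-- Positive n_distinct is an absolute count; negative values are
-- a fraction of the row count (for example, -1 means all unique).
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'x'
  AND attname IN ('att', 'val');

-- Assumption: for a one-off test, discourage hash aggregation in
-- this session only and compare the resulting plan.
SET enable_hashagg = off;
```

If the row estimate on the aggregate node in the EXPLAIN output is far below the real number of (att, val) combinations, that would match the underestimate described above.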
