Re: parallel query evaluation

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	postgresql(at)os10000(dot)net
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: parallel query evaluation
Date:	2012-11-10 15:32:25
Message-ID:	9130.1352561545@sss.pgh.pa.us
Lists:	pgsql-performance

Oliver Seidel <postgresql(at)os10000(dot)net> writes:
> I have
> create table x ( att bigint, val bigint, hash varchar(30) );
> with 693 million rows. The query

> create table y as select att, val, count(*) as cnt from x
> group by att, val;

> ran for more than 2000 minutes and used 14 GB of memory on a machine
> with 8 GB of physical RAM

What was the plan for that query? What did you have work_mem set to?
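
Both questions can be answered from a psql session. The following is a minimal sketch, assuming the table x and the query from the quoted message; plain EXPLAIN only plans the query and does not execute it, so it returns quickly even on a very large table.

```sql
-- Show the memory budget for each sort or hash operation in this session.
SHOW work_mem;

-- Show the chosen plan without running the query.
-- A top-level HashAggregate node means hash aggregation was chosen;
-- a GroupAggregate node above a Sort means the sort-based approach.
EXPLAIN
SELECT att, val, count(*) AS cnt
FROM x
GROUP BY att, val;
```

The row estimate on the aggregate node is the planner's guess at the number of groups, which is the figure to compare against the real number of distinct (att, val) pairs.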

I can believe such a thing overrunning memory if the planner chose to
use a hash-aggregation plan instead of sort-and-unique, but it would
only do that if it had made a drastic underestimate of the number of
groups implied by the GROUP BY clause. Do you have up-to-date
statistics for the source table?
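
As a rough sketch of how the statistics could be checked and refreshed, again assuming the table x from the quoted message (pg_stats and pg_stat_user_tables are standard PostgreSQL system views):

```sql
-- When were statistics last gathered for the table, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'x';

-- Refresh the planner statistics for the table.
ANALYZE x;

-- Inspect the per-column distinct-value estimates that feed the group-count
-- estimate. A negative n_distinct is a fraction of the row count rather than
-- an absolute number.
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'x'
  AND attname IN ('att', 'val');
```

If the estimates look far too low for a 693 million row table, re-running EXPLAIN after ANALYZE would show whether the planner's choice of aggregation changes.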

regards, tom lane

In response to

parallel query evaluation at 2012-11-08 11:55:12 from Oliver Seidel
