Questions on query planner, join types, and work_mem

From: Peter Hussey <peter(at)labkey(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Questions on query planner, join types, and work_mem
Date: 2010-07-27 23:08:16
Message-ID: AANLkTikMwVZKDYLitnsK31R=VNOuTH8NtmDHDGkBR7k8@mail.gmail.com
Lists: pgsql-performance

I have spent the last couple of weeks digging into a Postgres performance
problem that ultimately boiled down to this: the planner was choosing a
hash join on a set of join keys much larger than the configured work_mem.
We found we could make the performance much better by either
1) increasing work_mem to 500MB or more, or
2) forcing the planner to choose index-backed nested loops by turning off
hash and merge joins as well as bitmap and sequential scans (session
settings sketched below).
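
For concreteness, these are roughly the statements we used; work_mem and
the enable_* planner settings are real Postgres parameters, but treat the
values as illustrative:

  -- Option 1: give this session a larger per-operation memory budget.
  SET work_mem = '500MB';

  -- Option 2: steer the planner toward index-backed nested loops by
  -- disabling the alternative plan types for this session only.
  SET enable_hashjoin = off;
  SET enable_mergejoin = off;
  SET enable_bitmapscan = off;
  SET enable_seqscan = off;

Setting these per session rather than in postgresql.conf limits the effect
to the connection that runs the problem query.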

Now we are trying to decide which of these paths to choose, and asking why
the planner doesn't handle this for us.

Background: LabKey builds an open source platform for biomedical research
data. The platform consists of a Tomcat web application and a relational
database. We support two databases, Postgres and SQL Server. We started
with SQL Server because we were very familiar with it; two of our technical
team came from the SQL Server development team. We chose Postgres because
we assessed that it was the open source database most likely to handle our
application's requirements for capacity and for complex, nested, generated
SQL. Postgres is now the default database for our platform and most of our
key customers use it. In general we've been very satisfied with Postgres'
performance and compatibility, but our customers are starting to hit
situations where we really need to be able to understand why a particular
operation is slow. We currently recommend version 8.4 and use it ourselves.

The core of the problem query was

SELECT * INTO snapshot_table FROM
  (SELECT ... FROM tableA A LEFT OUTER JOIN tableB B
     ON (A.lsid = B.lsid) AND A.datasetid = ?) query1

The join column, lsid, is a poor choice for a join key: it is a long
varchar value (average length 101 characters) that only becomes unique far
out toward the right-hand end of the string. But we are stuck with this
choice. I can post the full SQL query and table definitions if it will
help, but changes to either of those would be risky and difficult, whereas
setting the work_mem value or forcing nested-loop joins is less risky.
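
For reference, the index-backed nested-loop plan relies on an index on the
join column of the inner table, roughly like the following (the index name
here is made up for illustration):

  CREATE INDEX tableb_lsid_idx ON tableB (lsid);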

The performance curve looks something like this:

Join Type     work_mem (MB)   time to populate snapshot (min)
----------------------------------------------------------------
Hash                 50                    85
Hash                200                    38
Hash                400                    21
Hash                500                    12
Hash               1000                    12
----------------------------------------------------------------
NestedLoop           50                    15
NestedLoop          200                    11
NestedLoop          400                    11
NestedLoop          500                    10
NestedLoop         1000                    10
----------------------------------------------------------------

Table A contains about 3.5 million rows and table B about 4.4 million
rows. From the EXPLAIN ANALYZE reports I concluded that the planner was
estimating the approximate number of rows on each side of the join node
accurately. I also noticed that in the work_mem = 50 test, the hash join
execution used over a GB of space in pgsql_tmp, space that grew and shrank
slowly over the course of the test.
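
For anyone wanting to reproduce the comparison, this is essentially how we
gathered the plans (with a real datasetid substituted for the ? parameter;
the comments describe what we looked for, not verbatim output):

  -- Compare the planner's row estimates with reality on each join input:
  -- "rows=" in each plan node vs. the "actual ... rows=" figure beside it.
  EXPLAIN ANALYZE
  SELECT ... FROM tableA A LEFT OUTER JOIN tableB B
     ON (A.lsid = B.lsid) AND A.datasetid = ?;

  -- While the query runs, growth under the data directory's pgsql_tmp
  -- shows the hash (or sort) spilling past work_mem to disk.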

Now for the questions:
1) If we tell the customer to set his work_mem value to 500MB or 1GB in
postgresql.conf, what problems might they see? The documentation and the
guidelines we received from Rupinder Singh in support suggest a much lower
value, e.g. a max work_mem of 10MB. Other documentation such as the "Guide
to Posting Slow Query Questions" suggests at least testing up to 1GB. What
is a reasonable maximum to configure for all connections?
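
For what it's worth, we know we could instead scope the setting to just the
session or transaction that builds the snapshot, along these lines (values
illustrative):

  -- Raise work_mem for this session only ...
  SET work_mem = '500MB';

  -- ... or for a single transaction; SET LOCAL reverts at commit.
  BEGIN;
  SET LOCAL work_mem = '500MB';
  SELECT * INTO snapshot_table FROM ( ... ) query1;
  COMMIT;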

2) How is work_mem used by a query execution? For example, does each hash
table in an execution get allocated a full work_mem's worth of memory? Is
that memory released when the query finishes, or does it stay attached to
the connection or some other object?

3) Is there a reason the planner doesn't seem to recognize when the hash
table won't fit in the current work_mem and choose a lower-memory plan
instead?

Excuse the long-winded post; I was trying to give the facts and nothing but
the facts.

Thanks,
Peter Hussey
LabKey Software
