Skip site navigation (1) Skip section navigation (2)

Questions on query planner, join types, and work_mem

From: Peter Hussey <peter(at)labkey(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Questions on query planner, join types, and work_mem
Date: 2010-07-27 23:08:16
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
I have spent the last couple of weeks digging into a Postgres performance
problem that ultimately boiled down to this:  the planner was choosing to
use hash joins on a set of join keys that were much larger than the
configured work_mem.  We found we could make the  performance much better by
1) increasing work_mem to 500MB or more, or
2) forcing the planner to choose index-backed nested loops by turning off
hash and merge joins as well as bitmap and sequential scans.

Now we are trying to decide which of these paths to choose, and asking why
the planner doesn't handle this for us.

Background:  LabKey builds an open source platform for biomedical research
data.  The platform consists of a tomcat web application and a relational
database.  we support two databases, Postgres and SQL Server.  We started
with SQL Server because we were very familiar with it.  Two of our technical
team came from the SQL Server development team.  We chose Postgres because
we assessed that it was the open source database most likely to be able to
handle our application  requirements for capacity and complex, nested,
generated SQL handling.  Postgres is now the default database for our
platform and most of our key customers use it.  In general we've been very
satisfied with Postgres' performance and compatibility, but our customers
are starting to hit situations where we really need to be able to understand
why a particular operation is slow.  We are currently recommending version
8.4 and using that ourselves.

The core of the problem query was

SELECT * INTO snapshot_table FROM
  (SELECT ... FROM  tableA A LEFT  OUTER JOIN tableB B ON (A.lsid = B.lsid)
and A.datasetid = ? )  query1

the join column, lsid, is a poor choice for a join column as it is a long
varchar value (avg length 101 characters) that us only gets unique way out
on the right hand side.  But we are stuck with this choice.  I can post the
SQL query and table definitions if it will help, but changes to either of
those would be risky and difficult, whereas setting the work_mem value or
forcing nested loop joins is less risky.

The Performance curve looks something like this

Join Type      work_mem(MB)     time to populate snapshot (min)
Hash              50                        85
Hash              200                       38
Hash              400                       21
Hash              500                       12
Hash             1000                       12
NestedLoop        50                        15
NestedLoop        200                       11
NestedLoop        400                       11
NestedLoop        500                       10
NestedLoop       1000                       10

Table A contains about 3.5 million rows, and table B contains about 4.4
million rows.  By looking at the EXPLAIN ANALYZE reports I concluded that
the planner seemed to be accurately determining the approximate number of
rows returned on each side of the join node.  I also noticed that at the
work_mem = 50 test, the hash join query execution was using over a GB of
space in the pgsql_tmp, space that grew and shrank slowly over the course of
the test.

Now for the questions:
1)  If we tell the customer to set his work_mem value to 500MB or 1GB in
postgres.config, what problems might they see?  the documentation and the
guidelines we received from Rupinder Singh in support suggest a much lower
value, e.g. a max work_mem of 10MB.  Other documentation such as the "Guide
to Posting Slow Query Questions" suggest at least testing up to 1GB.  What
is a reasonable maximum to configure for all connnections?

2) How is work_mem used by a query execution?  For example, does each hash
table in an execution get allocated a full work_mem's worth of memory ?   Is
this memory released when the query is finished, or does it stay attached to
the connection or some other object?

3) is there a reason why the planner doesn't seem to recognize the condition
when the hash table won't fit in the current work_mem, and choose a
low-memory plan instead?

Excuse the long-winded post; I was trying to give the facts and nothing but
the facts.

Peter Hussey
LabKey Software


pgsql-performance by date

Next:From: Andres FreundDate: 2010-07-27 23:57:41
Subject: Re: Questions on query planner, join types, and work_mem
Previous:From: Tom LaneDate: 2010-07-27 22:18:50
Subject: Re: Slow query using the Cube contrib module.

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group