Problem with joining queries.

From: Konstantinos Krikellas <K(dot)Krikellas(at)sms(dot)ed(dot)ac(dot)uk>
To: pgsql-performance(at)postgresql(dot)org
Cc: Stratis Viglas <sviglas(at)inf(dot)ed(dot)ac(dot)uk>
Subject: Problem with joining queries.
Date: 2007-02-15 12:58:33
Message-ID: 1171544313.31678.22.camel@kkrik-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I am using PostgreSQL for benchmarking and I study the following query:

SELECT *
FROM k10_1, k10_2, k10_3, k10_4, k10_5
WHERE k10_1.times4 = k10_2.times4
AND k10_2.times4 = k10_3.times4
AND k10_3.times4 = k10_4.times4
AND k10_4.times4 = k10_5.times4

The used schema for all the tables is:

Column | Type | Modifiers
---------+---------------+-----------
unique1 | integer | not null
unique2 | integer |
times4 | integer |
times10 | integer |
ten | integer |
stringu | character(52) |
Indexes:
"k10_*_pkey" PRIMARY KEY, btree (unique1)

Each table has 10000 tuples of 72 bytes each (constant). The field
times4 in every table is valued in [0,2500), each value appearing
exactly four times but in random order. It is easy to extract that the
result has exactly 2,560,000 tuples or approximate size 185 MB. The
database has been 'VACUUM FULL'-ed and is static.

When I execute this query with EXPLAIN ANALYSE, the query is executed in
10-20 sec and consumes only 8Mb of memory, depending to the machine (I
have tried it on P4-2.0GHz, P4-2.2GHz and Athlon 4200++ 64x2, all with 2
Gb RAM and Linux OS, Ubuntu Edgy or Fedora 6). However, when I execute
exactly the same query normally and direct the output to /dev/null,
PostgreSQL consumes all the available memory (RAM and swap), and the
query cannot be executed, as I receive the message 'Out of memory'. The
same thing happened to all the machines. I have tried to adjust working
memory and shared buffers but it still performed in the same way.

Since this is not exactly an extreme query, as its input is 5 tables
with 10 thousands tuples and its output is 2.6 millions, it seems that a
problem exists in this case. I would like to pose the following
questions:

1. Why PostgreSQL fails to execute the query? Is there any parameter
that specifies when the buffer manager tries to store intermediate and
final results to the disc and how much of disk space it can occupy for
temporary results?

2. How does EXPLAIN ANALYSE work? Does it create all the intermediate
results as in the normal execution? Does it call the print function at
the final result and direct the output to /dev/null or it doesn't call
it at all? This is important as, if the cardinality of the final result
is high, the print function callls impose a significant penalty on
execution time

3. Is there any way of executing a query without materialising the final
result but only the intermediate results, if the query plans demand it?

I hope you can enlighten me with these questions.

Kind regards,
Konstantinos Krikellas
PhD student, Database Group
University of Edinburgh
Email: K(dot)Krikellas(at)sms(dot)ed(dot)ac(dot)uk
Pnone number: +44 (0) 131 651 3769

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2007-02-15 13:39:10 Re: Benchmarking PGSQL?
Previous Message Brian Herlihy 2007-02-15 01:30:46 Re: An unwanted seqscan