Re: sub queries and caching.

From: mlw <markw(at)mohawksoft(dot)com>
To: Andrew McMillan <andrew(at)catalyst(dot)net(dot)nz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sub queries and caching.
Date: 2001-07-22 18:30:15
Message-ID: 3B5B1BB7.687A2C73@mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrew McMillan wrote:
>
> mlw wrote:
> >
> > Take these queries:
> >
> > select * from foo as F, (select * from bar where name = 'bla') as B where
> > F.name = B.name
> > union all
> > select * from foo as F, (select * from bar where name = 'bla') as B where
> > F.type = B.type
> >
> > OR
> >
> > create temp table B as select * from bar where name = 'bla';
> > select * from foo as F, B where F.name = B.name
> > union all
> > select * from foo as F, B where F.type = B.type;
> > drop table B;
> >
> > My question is, which would be more efficient, or is it a wash?
> > (A note, I will be calling this from an external programming laguage, PHP, so
> > the first query would be one Postgres pq_exec call, while the second query
> > would be three separate calls.)
>
> The second could also be done as a single PHP call, given that you should be able to
> "create temp table ...; select ..." in a single pg_Exec call.
>
> You don't need a 'DROP TABLE B' because it's a temp table and will be dropped
> anyway, won't it, unless you're using pg_pconnect.

For a high volume website, where processing is done and latency are important
considerations.

Suppose, you have a few apache/php systems load balanced on top of a single
database system. (This is a very standard configuration.) The apache/php
machine cycles are cheaper than the database machine cycles because they can
usually be scaled easily. A database system is very difficult to scale. While
adding an apache/php box to this configuration is usually trivial, setting up a
database system across two or more machines is a hugely more complex problem.

Then there is latency, the longer a web pages takes to process, it holds its
resources longer, this means you will probably have more web server processes,
and if each process holds a database connection, you will probably have more
open database connections. So, latency costs you the ram on the local web
server and the resources on the back-end application service machines.

So the real trick to getting good salability is to reduce latency AND move as
much processing to the boxes which can be scaled.

Persistent connections to a database are vital in this scenario. Creation of a
new connection to a database impacts backend processing time and page latency.
So one has to drop the temporary table.

So, which is more expensive? Issuing the subquery multiple times within the
larger query, or creating a temporary table, performing the simpler query, and
then dropping the temp table.

My guess would be that creating the temp table and dropping it again do use
backend processing cycles. I wonder if PostgreSQL would be smart enough to
perform that query only once?

--
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2001-07-22 22:50:15 Re: Incomplete idea about views and INSERT...RETURNING
Previous Message Andrew McMillan 2001-07-22 18:02:42 Re: Incomplete idea about views and INSERT...RETURNING