Re: Repeated execution of identical subqueries

From: Craig James <craig_james(at)emolecules(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Repeated execution of identical subqueries
Date: 2008-03-13 14:06:03
Message-ID: 47D934CB.6090606@emolecules.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Craig Ringer wrote:
> Tom Lane wrote:
>> No, not at the moment. In principle the planner could look for such
>> duplicates, but it'd be wasted cycles so much of the time that I'd be
>> loath to do it.
>>
> Good point - there are better places to spend time, and I imagine it'd
> be an expensive thing to check too.

The one very simple case that gets me every time is when a user-defined function is called.

test=> explain analyze select chm_mf(isosmiles) from version where chm_mf(isosmiles) like '%C20%' or chm_mf(isosmiles) like '%C21%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on version (cost=0.00..18.57 rows=29 width=43) (actual time=48.798..1180.538 rows=50 loops=1)
Filter: ((chm_mf(isosmiles) ~~ '%C20%'::text) OR (chm_mf(isosmiles) ~~ '%C21%'::text))
Total runtime: 1180.683 ms
(3 rows)

This table only has 375 rows TOTAL, yet it takes over a second to answer this query: "Find me molecules with either 20 or 21 carbon atoms in the molecular formula". This is a somewhat contrived example, but we have others that really do get us, and we go to great lengths to avoid them. It requires us to avoid "natural" queries like the one above, and instead create convoluted application logic using temporary tables to hold the results of a function call, which we can then use in a query that uses the values more than once. Something like this:

create temporary table foo(mf text);
explain analyze insert into foo (select chm_mf(isosmiles) from version);
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on version (cost=0.00..15.69 rows=375 width=43) (actual time=1.829..566.233 rows=375 loops=1)
Total runtime: 568.470 ms

Now we can use this table in a complex query where we need the MF twice, and discard it, and it's STILL faster than the original "natural" SQL. Notice that it takes just half the time as the first query, which tells me chm_mf() was being called twice in the first example. This function is defined as:

CREATE OR REPLACE FUNCTION chm_mf(text) RETURNS text
AS '/usr/local/pgsql/lib/libchm.so', 'chm_mf'
LANGUAGE 'C' STRICT IMMUTABLE;

I can understand how in the general case, it is very hard to identify repeated subqueries. But it seems like an IMMUTABLE function shouldn't be called twice on the same column -- isn't that the whole point of IMMUTABLE?

Craig

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2008-03-13 14:29:23 Re: Benchmark: Dell/Perc 6, 8 disk RAID 10
Previous Message Justin Graf 2008-03-13 13:55:55 Re: Benchmark: Dell/Perc 6, 8 disk RAID 10