Re: function not called if part of aggregate

From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: function not called if part of aggregate
Date: 2006-06-13 06:26:26
Message-ID: 448E5A92.9050206@modgraph-usa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Greg Stark wrote:
> However that's not enough to explain what you've shown. How about you show the
> actual query and actual plan you're working with? The plan you've shown can't
> result from the query you sent.

Mea culpa, sort of. But ... in fact, the plan I sent *was* from query I sent, with the table/column names changed for clarity. This time I'll send the plan "raw". (This is PG 8.0.1.)

chm=> explain select count(1) from (select normalize_add_salt(smiles) from
chm(> salt_smiles order by db_no) as foo;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Aggregate (cost=69.95..69.95 rows=1 width=0)
-> Subquery Scan foo (cost=0.00..67.93 rows=806 width=0)
-> Index Scan using salt_smiles_pkey on salt_smiles (cost=0.00..59.87 rows=806 width=30)
(3 rows)

As pointed out by Tom and others, this query DOES in fact call the normalize_add_salt() function.

Now here's the weird part. (And where my original posting went wrong -- sorry for the error! I got the two queries mixed up.)

I originally had a more complex query, the purpose being to guarantee that the function was called on the strings in the order specified. (More on this below.) Here is the original query I used:

chm=> explain select count(1) from (select normalize_add_salt(smiles)
chm(> from (select smiles from salt_smiles order by db_no) as foo) as bar;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Aggregate (cost=67.94..67.94 rows=1 width=0)
-> Subquery Scan foo (cost=0.00..65.92 rows=806 width=0)
-> Index Scan using salt_smiles_pkey on salt_smiles (cost=0.00..57.86 rows=806 width=30)
(3 rows)

Notice that the plans are essentially identical, yet in this one the function does NOT get called. I proved this by brute force, inserting "char **p = NULL; *p = "foo";" into the C code to guarantee a segmentation violation if the function gets called. In the first case it does SIGSEGV, and in the second case it does not.

Now the reason for this more-complex query with an additional subselect is that the SMILES (which, by the way, are a lexical way of representing chemical structures - see www.daylight.com), must be passed to the function in a particular order (hence the ORDER BY). In retrospect I realize the optimizer apparently flattens this query anyway (hence the identical plans, above).

But the weird thing is that, in spite of flattening, which would appear to make the queries equivalent, the function gets called in one case, and not in the other.

Steinar H. Gunderson asked:
>> select count(1) from (select foo_init(value) from foo_init_table order by
>> value_id) as foo;
> Why not just count(foo_init(value))?

Because the SMILES must be processed in a specific order, hence the more complex queries.

The simple answer to this whole problem is what Steinar wrote:
>>This works well, but it requires me to actually retrieve the function's
>>value 800 times.
>
> Is this actually a problem?

No, it's just a nuisance. It occurs to me that in spite of the ORDER BY expression, Postgres is free to evaluate the function first, THEN sort the results, which means the SMILES would be processed in random order anyway. I.e. my ORDER BY clause is useless for the intended purpose.

So the only way I can see to get this right is to pull the SMILES into my application with the ORDER BY to ensure I have them in the correct order, then send them back one at a time via a "select normalize_add_salt(smiles)", meaning I'll retrieve 800 strings and then send them back.

I just thought there ought to be a way to do this all on the PG server instead of sending all these strings back and forth. I'd like to say to Postgres, "Just do it this way, OK?" But the optimizer can't be turned off, so I guess I have to do it the slow way. The good news is that this is just an initialization step, after which I typically process thousands of molecules, so the extra overhead won't kill me.

Thanks to all for your help.

Craig

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Steinar H. Gunderson 2006-06-13 06:30:59 Placement of 64-bit libraries (offtopic)
Previous Message J. Andrew Rogers 2006-06-13 06:00:38 Re: 64-bit vs 32-bit performance ... backwards?