Re: Controlling complexity in queries

From: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, David Johnston <polobo(at)yahoo(dot)com>, Robert James <srobertjames(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Controlling complexity in queries
Date: 2011-12-13 22:27:36
Message-ID: 4EE7D158.7070109@gmail.com
Lists: pgsql-general

Merlin Moncure wrote:
> Breaking your large queries into functions OTOH can make significant
> changes to the plan, often to the worse.

As an end-user, I think this is an area where PostgreSQL could really stand
out (that and the moon launch). In Rails-land, you don't have The DBA who
writes queries. You have a developer and an ORM, and when they step outside
that ORM to do the cool things SQL can do, all their DRY habits fall apart,
because it's 1990 again and you can write either clear code or fast code,
but not both.

Postgres has made huge inroads with its optimizer. A few years ago, for web
startups picking their relational DB, the performant answer was "MySQL for
fast, simple, mostly-read queries, PostgreSQL when you grow up." That's
become "Probably Postgres", and with 9.2's COUNT(*), it's become "Definitely
Postgres".

But having to write one big query for performance feels exactly like having
to write one big C function with unrolled loops. I'm currently taking a
well-factored, function-based query and turning it INTO what Robert James is
trying to get OUT of: a monolithic query.

> In the end, the performance of your queries is going to be directly
> related to how well you map the problem into relational logic

It's not just that, though; it's quite possible to think relationally and
still fall down. There are plenty of cases where the human eye can see that
a modular function can be inlined, but the optimizer can't. I have a
pathological case: a query against a database with just a few thousand users
takes 1.5 seconds on fast hardware, because it ends up scanning a Cartesian
product to get 16 rows, even before it reaches the nested loops. To be fair,
since 9.0.6/9.1.2 rolled out, the optimizer does a great job of inlining my
set-returning functions most of the time.
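
To show what I mean by well-factored, here's a sketch; the schema and the
deleted_at column are invented for illustration. A LANGUAGE sql function
whose body is a single SELECT, declared stable, is the kind the planner can
now expand in place, as if you'd written the subquery by hand:

-- hypothetical helper; users.deleted_at is invented for this example
create function active_users() returns setof users as $$
    select * from users where deleted_at is null
$$ language sql stable;

-- because the body gets inlined, the planner sees a plain join against
-- users and can still use its primary-key index
select q.id
from questions q
join active_users() u on u.id = q.user_id
where q.id = 1;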

I can see at least three equally onerous pieces that would have to fall into
place to allow DRY, composable SQL:

1. Optional optimization of non-recursive WITH (see the sketch after this list)
2. Optional pushdown of WHERE clauses into GROUP BY[1]
3. LATERAL

AFAIK, none of these are on anyone's short-term to-do list, and I'm sure
none are easy.
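
To make the first item concrete: today, non-recursive WITH acts as an
optimization fence. Sketching against the same hypothetical questions table
as above:

with q as (
    select id, user_id from questions
)
select *
from q
where id = 1;

-- as of 9.1, the CTE is evaluated on its own, scanning all of questions
-- before the outer WHERE applies; optional optimization would let the
-- planner treat it like a plain subquery and hit the primary key instead

And LATERAL, for what it's worth, is SQL-standard syntax (not in Postgres
yet) that would let one FROM item refer to columns of an earlier one, so a
set-returning helper could be parameterized row by row instead of being
joined against its entire output.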

[1] Since this is my current favorite problem, the pathological case is:

select questions.id
from questions
join (
    select u.id
    from users as u
    group by u.id
) as s on s.id = questions.user_id
where questions.id = 1;

With users.id as the primary key, it's obvious to a human that this can
return at most one row, but the planner still scans the users table to get
there. See the "Subquery in a JOIN not getting restricted?" thread on
pgsql-performance for Tom Lane's explanation of why that's a hard problem
to solve.
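
For comparison, the hand-inlined version the human eye produces: since
users.id is the primary key, grouping users by id returns every row exactly
once, so the subquery is just users, and the whole query collapses into a
couple of index lookups:

select questions.id
from questions
join users as u on u.id = questions.user_id
where questions.id = 1;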
