Re: Self-join optimisation

From: Hywel Carver <hywel(at)skillerwhale(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Self-join optimisation
Date: 2021-03-12 09:20:18
Message-ID: CAFcA2FZEw-NipsmeUW2dk-xtsT-bQSaq13h2=vKykKabfZ55EQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, thanks for your replies. I've tested the patch and it works for me in
the cases where I'd use it.

> And explain why your application is doing queries like this, and why it
can't be changed to changed to not generate such queries.

Reading the thread, it looks like some of the requests for this feature are
coming from people using ORMs that generate bad queries. That's not been my
experience - I've always been able to find a way to construct the right
query through the ORM or just write correct SQL. When I've wanted this
feature has always been in relation to combining views.

For example, I was recently helping out a company that runs a booking
system for leisure activities, and their database has a view for how many
staff are available on a given day to supervise a given activity (e.g.
surfing), and a view for how much equipment is available on a given day
(e.g. how many surfboards). They also have a view for the equipment
requirements for a given activity (e.g. some boat trips require a minimum
of 2 boats and 4 oars). When they want to make bookings, they have to
combine data from these views, and the tables that create them.

It would definitely be possible to write one view that had all of this data
in (and maintain the other views too, which are needed elsewhere in the
site). And it could be made wide to have all of the rows from the source
tables. But it would, to me, feel like much better code to keep the
separate decomposed views and join them together for the booking query.
Right now, that query's performance suffers in a way that this feature
would fix. So the current choices are: accept worse performance with
decomposed views, write one very large and performant view but duplicate
some of the logic, or use their ORM to generate the SQL that they'd
normally put in a view.

On Thu, Mar 11, 2021 at 10:50 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:

> On 3/11/21 3:39 PM, Hywel Carver wrote:
> > Great! It looks like it's been in commitfest status for a few years. Is
> > there anything someone like me (outside the pgsql-hackers community) can
> > do to help it get reviewed this time around?
> >
>
> Well, you could do a review, or at least test it with the queries your
> application is actually running. And explain why your application is
> doing queries like this, and why it can't be changed to changed to not
> generate such queries.
>
> The first couple of messages from the patch thread [1] (particularly the
> messages from May 2018) are a good explanation why patches like this are
> tricky to get through.
>
> The basic assumption is that such queries are a bit silly, and it'd be
> probably easier to modify the application not to generate them instead
> of undoing the harm in the database planner. The problem is this makes
> the planner more expensive for everyone, including people who carefully
> write "good" queries.
>
>
> And we don't want to do that, so we need to find a way to make this
> optimization very cheap (essentially "free" if not applicable), but
> that's difficult because there may be cases where the self-joins are
> intentional, and undoing them would make the query slower. And doing
> good decision requires enough information, but this decision needs to
> happen quite early in the planning, when we have very little info.
>
> So while it seems like a simple optimization, it's actually quite tricky
> to get right.
>
> (Of course, there are cases where you may get such queries even if you
> try writing good SQL, say when joining views etc.)
>
> regards
>
> [1]
>
> https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3(at)postgrespro(dot)ru
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2021-03-12 09:31:39 RE: Parallel INSERT (INTO ... SELECT ...)
Previous Message Laurenz Albe 2021-03-12 09:16:18 Re: documentation fix for SET ROLE