Re: Why PG uses nested-loop join when no indexes are available?

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: David Grelaud <dgrelaud(at)ideolys(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why PG uses nested-loop join when no indexes are available?
Date: 2016-01-14 08:56:59
Message-ID: CAKJS1f8nsm-T0KMvGJz_bskUjQ=yGmGUUtUdAcFoEaZ_tuTXjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 14 January 2016 at 03:48, David Grelaud <dgrelaud(at)ideolys(dot)com> wrote:

> 3) Always avoid nested-loop join when no indexes are available?
>
> Tom Lane said "There might be some cases where this would help, but there
> would be many more where it would be useless or counterproductive."
> Who is right between Tom Lane and the Leis Viktor's paper above?
>
> We tried to disable nested_loop all the time in a production environment
> and we observed an overall improvement in all queries where Indexes are not
> useful or not available (CTEs), which confirms the paper.
> In fact, one of our production environment is still running with
> "nested_loop off" because benefits are a lot greater than drawbacks as long
> as some tables are relatively small (Indexes not used).
>

I don't really think any of them are wrong. Simply Tom is talking in
general terms for no specific workload, and the paper is dealing with one
specific workload. Of course there are cases when a non-parameterised
nested loop are the fastest way, I mean what could possibility be faster if
there's only 1 row to be joined, for example. It's just that it's not that
much faster since such a join is likely to perform very quickly no matter
which join algorithm is used.

On the other hand, if your tables are not tiny, or you're never just
joining to just a few rows, and you are suffering from stats
underestimations, then it's quite probable that you'll improve your
workload overall by doing enable_nestloop = off. But you have to remember
that if you do this, then you miss out on parameterised inner scans on
nested loops. Quite often these are the fastest option, even when the
number of rows is fairly large, as it might save building a hash table on a
very large relation, or having to sort that relation for a merge join.

Perhaps separating out enable_nestloop so that it only disables
non-parameterised nested loops, and add another GUC for parameterised
nested loops would be a good thing to do. Likely setting enable_nestloop to
off in production would be a slightly easier decision to make, if that was
the case.

It looks pretty simple to do this, so I hacked it up, and attached it here.
There's no doc changes and I'm not that interested in fighting for this
change, it's more just an idea for consideration.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
enable_paramnestloop.patch application/octet-stream 2.2 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Karsten Hilbert 2016-01-14 08:59:05 Re: pg_dump problem with dropped NOT NULL on child table
Previous Message David G. Johnston 2016-01-14 05:31:33 Re: Function error