Re: [PATCH] Push limit to sort through a subquery

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Douglas Doole <dougdoole(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Push limit to sort through a subquery
Date: 2017-08-25 16:27:18
Message-ID: CA+TgmoYR8GrOA=CEtD0MvDoxJ_xkMuQFAHVUV=R7ZrU2pxcp0Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 25, 2017 at 12:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hm. It seemed pretty reliable for me, but we already know that the
> parallel machinery is quite timing-sensitive. I think we'd better
> incorporate a regression test case into the committed patch so we
> can see what the buildfarm thinks. I'll see about adding one.

That would be great. I didn't have a good idea of how to write a test
for this -- maybe one of my systematic failures as a developer. :-(

>> I think that the comment at the bottom of ExecSetTupleBound should
>> probably be rethought a bit in light of these changes.
>
> Agreed; I'll revisit all the comments, actually.

That also sounds great.

> No, I think that's fundamentally the wrong idea. The tuple bound
> inherently is a piece of information that has to apply across a whole
> scan. If you pass it to ExecProcNode then you'll have to get into
> API contracts about whether it's allowed to change across calls,
> which is a mess, even aside from efficiency concerns (and for
> ExecProcNode, I *am* concerned about efficiency).

Fair.

> You could imagine adding it as a parameter to ExecReScan, instead, but
> then there are a bunch of issues with how that would interact with the
> existing optimizations for skipped/delayed/merged rescan calls (cf.
> the existing open issue with parallel rescans). That is a can of worms
> I'd just as soon not open.
>
> I'm content to leave it as a separate call. I still think that worrying
> about extra C function calls for this purpose is, at best, premature
> optimization.

OK, you may well be right. Outside of Limit, I suppose we're not
likely to set any limit other than 1, and if we set a limit of 1 it'll
probably be a limit that survives rescans, because it'll be based on
the parent node not caring about anything more than the first tuple.

BTW, I think another reason why we're likely to eventually want to get
very aggressive about passing down bounds is batch execution. That's
somewhere on Andres's list of things that could speed up the executor,
although I believe at present there are quite a few other bottlenecks
that are more urgent. But certainly if we start doing that, then even
things like SeqScan are going to want to know about applicable limit
clauses.

> I'll do another pass over the patch and get back to you. I've not
> looked at the instrumentation patch yet --- is there a reason to
> worry about it before we finish this one?

I think they're independent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
