Re: [PATCH] Push limit to sort through a subquery

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Douglas Doole <dougdoole(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Push limit to sort through a subquery
Date: 2017-08-25 16:05:22
Message-ID: 3207.1503677122@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I had the same thought. Done in the attached revision of your
> version. I couldn't consistently reproduce the failure you saw -- it
> failed for me only about 1 time in 10 -- but I don't think it's
> failing any more with this version.

Hm. It seemed pretty reliable for me, but we already know that the
parallel machinery is quite timing-sensitive. I think we'd better
incorporate a regression test case into the committed patch so we
can see what the buildfarm thinks. I'll see about adding one.

> I think that the comment at the bottom of ExecSetTupleBound should
> probably be rethought a bit in light of these changes.

Agreed; I'll revisit all the comments, actually.

> The details aside, Konstantin Knizhnik's proposal to teach this code
> about ForeignScans is yet another independent way for this to win, and
> I think that will also be quite a large win.

+1

> It's possible that we should work out some of these things at plan
> time rather than execution time, and it's also possible that a
> separate call to ExecSetTupleBound is too much work. I've mulled over
> the idea of whether we should get rid of this mechanism altogether in
> favor of passing the tuple bound as an additional argument to
> ExecProcNode, because it seems silly to call node-specific code once
> to set a (normally small, possibly 1) bound and then call it again to
> get a (or the) tuple.

No, I think that's fundamentally the wrong idea. The tuple bound
inherently is a piece of information that has to apply across a whole
scan. If you pass it to ExecProcNode then you'll have to get into
API contracts about whether it's allowed to change across calls,
which is a mess, even aside from efficiency concerns (and for
ExecProcNode, I *am* concerned about efficiency).

You could imagine adding it as a parameter to ExecReScan, instead, but
then there are a bunch of issues with how that would interact with the
existing optimizations for skipped/delayed/merged rescan calls (cf.
the existing open issue with parallel rescans). That is a can of worms
I'd just as soon not open.

I'm content to leave it as a separate call. I still think that worrying
about extra C function calls for this purpose is, at best, premature
optimization.

I'll do another pass over the patch and get back to you. I've not
looked at the instrumentation patch yet --- is there a reason to
worry about it before we finish this one?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-25 16:16:30 Re: Proposal: global index
Previous Message Robert Haas 2017-08-25 15:43:51 Re: [PATCH] Push limit to sort through a subquery