Re: [HACKERS] why not parallel seq scan for slow functions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] why not parallel seq scan for slow functions
Date: 2018-03-28 21:01:20
Message-ID: CA+TgmobwhrBcYNfo2y6sddUS4vgkaUs4M0vkdXxLTbi8+f-fmQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 28, 2018 at 3:06 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Good idea, such an optimization will ensure that the cases reported
> above will not regress. However, isn't it somewhat defeating the
> idea you are using in the patch, which is to avoid modifying the path
> in-place?

Sure, but you can't have everything. I don't think modifying the
sortgroupref data in place is really quite the same thing as changing
the pathtarget in place; the sortgroupref stuff is some extra
information about the targets being computed, not really a change in
targets per se. But in any case if you want to eliminate extra work
then we've gotta eliminate it...

> In any case, I think one will still see a regression in cases
> where this optimization doesn't apply. For example,
>
> DO $$
> DECLARE count integer;
> BEGIN
> For count In 1..1000000 Loop
> Execute 'explain Select sum(thousand) from tenk1 group by ten';
> END LOOP;
> END;
> $$;
>
> The above block takes 43700.0289 ms on HEAD and 45025.3779 ms with the
> patch, which is approximately a 3% regression.

Thanks for the analysis -- the observation that this seemed to affect
cases where CP_LABEL_TLIST gets passed to create_projection_plan
allowed me to recognize that I was doing an unnecessary copyObject()
call. Removing that seems to have reduced this regression below 1% in
my testing.

Also, keep in mind that we're talking about extremely small amounts of
time here. On a trivial query that you're not even executing, you're
seeing a difference of (45025.3779 - 43700.0289)/1000000 ≈ 0.00133 ms
per execution. Sure, it's still 3%, but it's 3% of the time in an
artificial case where you don't actually run the query. In real life,
nobody runs EXPLAIN in a tight loop a million times without ever
running the query, because that's not a useful thing to do. The
relative overhead on a realistic test case, where the query is actually
executed, will be smaller. Furthermore, at
least in my testing, there are now cases where this is faster than
master. Now, I welcome further ideas for optimization, but a patch
that makes some cases slightly slower while making others slightly
faster, and also improving the quality of plans in some cases, is not
to my mind an unreasonable thing.
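The arithmetic above can be checked with a few lines of C. This sketch uses the two timings and the 1,000,000-iteration count quoted in the thread. The helper names are illustrative only.

```c
/* Timings quoted in the thread for 1,000,000 EXPLAIN-only iterations. */
#define HEAD_MS    43700.0289
#define PATCH_MS   45025.3779
#define ITERATIONS 1000000.0

/* Extra planning time per EXPLAIN, in milliseconds. */
static double
per_call_overhead_ms(double base_ms, double patched_ms, double iterations)
{
    return (patched_ms - base_ms) / iterations;
}

/* Relative slowdown of the patched build, in percent. */
static double
regression_pct(double base_ms, double patched_ms)
{
    return (patched_ms - base_ms) / base_ms * 100.0;
}
```

With these inputs, the per-call overhead is about 0.0013253 ms (roughly 1.3 microseconds), and the relative regression is about 3.03%. That is a small absolute cost, measured on a query that is only planned and never run.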

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:
- 0003-Rewrite-the-code-that-applies-scan-join-targets-to-p.patch (application/octet-stream, 62.8 KB)
- 0002-Postpone-generate_gather_paths-for-topmost-scan-join.patch (application/octet-stream, 6.2 KB)
- 0001-Teach-create_projection_plan-to-omit-projection-wher.patch (application/octet-stream, 7.8 KB)
