Quick Links

Re: [HACKERS] why not parallel seq scan for slow functions

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject:	Re: [HACKERS] why not parallel seq scan for slow functions
Date:	2018-02-15 10:48:46
Message-ID:	CAFjFpReHQ4xaPYaaW82kdUDWfsUtoKn-1Z_GBLNnOU1a8UR8gg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

I happened to look at the patch for something else. But here are some
comments. If any of those have been already discussed, please feel
free to ignore. I have gone through the thread cursorily, so I might
have missed something.

In grouping_planner() we call query_planner() first which builds the
join tree and creates paths, then calculate the target for scan/join
rel which is applied on the topmost scan join rel. I am wondering
whether we can reverse this order to calculate the target list of
scan/join relation and pass it to standard_join_search() (or the hook)
through query_planner(). standard_join_search() would then set this as
reltarget of the topmost relation and every path created for it will
have that target, applying projection if needed. This way we avoid
calling generate_gather_path() at two places. Right now
generate_gather_path() seems to be the only thing benefitting from
this but what about FDWs and custom paths whose costs may change when
targetlist changes. For custom paths I am considering GPU optimization
paths. Also this might address Tom's worry, "But having to do that in
a normal code path
suggests that something's not right about the design ... "

Here are some comments on the patch.

+ /*
+ * Except for the topmost scan/join rel, consider gathering
+ * partial paths. We'll do the same for the topmost scan/join
This function only works on join relations. Mentioning scan rel is confusing.

index 6e842f9..5206da7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -481,14 +481,21 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
}

+ *
+ * Also, if this is the topmost scan/join rel (that is, the only baserel),
+ * we postpone this until the final scan/join targelist is available (see

Mentioning join rel here is confusing since we deal with base relations here.

+ bms_membership(root->all_baserels) != BMS_SINGLETON)

set_tablesample_rel_pathlist() is also using this method to decide whether
there are any joins in the query. May be macro-ize this and use that macro at
these two places?

- * for the specified relation. (Otherwise, add_partial_path might delete a
+ * for the specified relation. (Otherwise, add_partial_path might delete a

Unrelated change?

+ /* Add projection step if needed */
+ if (target && simple_gather_path->pathtarget != target)

If the target was copied someplace, this test will fail. Probably we want to
check containts of the PathTarget structure? Right now copy_pathtarget() is not
called from many places and all those places modify the copied target. So this
isn't a problem. But we can't guarantee that in future. Similar comment for
gather_merge path creation.

+ simple_gather_path = apply_projection_to_path(root,
+ rel,
+ simple_gather_path,
+ target);
+

Why don't we incorporate those changes in create_gather_path() by passing it
the projection target instead of rel->reltarget? Similar comment for
gather_merge path creation.

+ /*
+ * Except for the topmost scan/join rel, consider gathering
+ * partial paths. We'll do the same for the topmost scan/join rel

Mentioning scan rel is confusing here.

deallocate tenk1_count;
+explain (costs off) select ten, costly_func(ten) from tenk1;

verbose output will show that the parallel seq scan's targetlist has
costly_func() in it. Isn't that what we want to test here?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Re: [HACKERS] why not parallel seq scan for slow functions at 2018-02-14 10:37:20 from Amit Kapila

Responses

Re: [HACKERS] why not parallel seq scan for slow functions at 2018-02-15 14:17:54 from Amit Kapila
Re: [HACKERS] why not parallel seq scan for slow functions at 2018-03-11 12:19:08 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kyotaro HORIGUCHI	2018-02-15 10:58:57	Let's remove DSM_INPL_NONE.
Previous Message	Amit Langote	2018-02-15 10:31:47	Re: reorganizing partitioning code