Re: inherit support for foreign tables

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: inherit support for foreign tables
Date: 2014-12-10 03:18:53
Message-ID: 5487BB9D.7070605@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ashutosh,

Thanks for the review!

(2014/11/28 18:14), Ashutosh Bapat wrote:
> On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita
> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp <mailto:fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>> wrote:
> (2014/11/17 17:55), Ashutosh Bapat wrote:
> Here are my review comments for patch fdw-inh-3.patch.

> Tests
> -------
> 1. It seems like you have copied from testcase inherit.sql to
> postgres_fdw testcase. That's a good thing, but it makes the
> test quite
> long. May be we should have two tests in postgres_fdw contrib
> module,
> one for simple cases, and other for inheritance. What do you say?

> IMO, the test is not so time-consuming, so it doesn't seem worth
> splitting it into two.

> I am not worried about the timing but I am worried about the length of
> the file and hence ease of debugging in case we find any issues there.
> We will leave that for the commiter to decide.

OK

> Documentation
> --------------------
> 1. The change in ddl.sgml
> - We will refer to the child tables as partitions, though
> they
> - are in every way normal <productname>PostgreSQL</> tables.
> + We will refer to the child tables as partitions, though
> we assume
> + that they are normal <productname>PostgreSQL</> tables.
>
> adds phrase "we assume that", which confuses the intention
> behind the
> sentence. The original text is intended to highlight the equivalence
> between "partition" and "normal table", where as the addition
> esp. the
> word "assume" weakens that equivalence. Instead now we have to
> highlight
> the equivalence between "partition" and "normal or foreign
> table". The
> wording similar to "though they are either normal or foreign tables"
> should be used there.

> You are right, but I feel that there is something out of place in
> saying that there (5.10.2. Implementing Partitioning) because the
> procedure there has been written based on normal tables only. Put
> another way, if we say that, I think we'd need to add more docs,
> describing the syntax and/or the corresponding examples for
> foreign-table cases. But I'd like to leave that for another patch.
> So, how about the wording "we assume *here* that", instead of "we
> assume that", as I added the following notice in the previous
> section (5.10.1. Overview)?
>
> @@ -2650,7 +2669,10 @@ VALUES ('Albany', NULL, NULL, 'NY');
> table of a single parent table. The parent table itself is
> normally
> empty; it exists just to represent the entire data set. You
> should be
> familiar with inheritance (see <xref linkend="ddl-inherit">)
> before
> - attempting to set up partitioning.
> + attempting to set up partitioning. (The setup and management of
> + partitioned tables illustrated in this section assume that each
> + partition is a normal table. However, you can do that in a
> similar way
> + for cases where some or all partitions are foreign tables.)

> This looks ok, though, I would like to see final version of the
> document. But I think, we will leave that for committer to handle.

OK

> 2. The wording "some kind of optimization" gives vague picture.
> May be
> it can be worded as "Since the constraints are assumed to be
> true, they
> are used in constraint-based query optimization like constraint
> exclusion for partitioned tables.".
> + Those constraints are used in some kind of query
> optimization such
> + as constraint exclusion for partitioned tables (see
> + <xref linkend="ddl-partitioning">).

> Will follow your revision.

Done.

> Code
> -------
> 1. In the following change
> +/*
> * acquire_inherited_sample_rows -- acquire sample rows from
> inheritance tree
> *
> * This has the same API as acquire_sample_rows, except that
> rows are
> * collected from all inheritance children as well as the
> specified table.
> - * We fail and return zero if there are no inheritance children.
> + * We fail and return zero if there are no inheritance children or
> there are
> + * inheritance children that foreign tables.
>
> The addition should be "there are inheritance children that *are all
> *foreign tables. Note the addition "are all".

> Sorry, I incorrectly wrote the comment. What I tried to write is
> "We fail and return zero if there are no inheritance children or if
> we are not in VAC_MODE_SINGLE case and inheritance tree contains at
> least one foreign table.".

> You might want to use "English" description of VAC_MODE_SINGLE instead
> of that macro in the comment, so that reader doesn't have to look up
> VAC_MODE_SINGLE. But I think, we will leave this for the committer.

I corrected the comments and translated the macro into the English
description.

> 2. The function has_foreign() be better named
> has_foreign_child()? This

> How about "has_foreign_table"?

> has_foreign_child() would be better, since these are "children" in the
> inheritance hierarchy and not mere "table"s.

Done. But I renamed it to has_foreign_children() because it sounds more
natural at least to me.

> function loops through all the tableoids passed even after it
> has found
> a foreign table. Why can't we return true immediately after
> finding the
> first foreign table, unless the side effects of heap_open() on
> all the
> table are required. But I don't see that to be the case, since these
> tables are locked already through a previous call to
> heap_open(). In the

> Good catch! Will fix.

> same function instead of argument name parentrelId may be we
> should use
> name parent_oid, so that we use same notation for parent and
> child table
> OIDs.

> Will fix.

Done.

> 3. Regarding enum VacuumMode - it's being used only in case of
> acquire_inherited_sample_rows(__) and that too, to check only a
> single
> value of the three defined there. May be we should just infer
> that value
> inside acquire_inherited_sample_rows(__) or pass a boolean true
> or false
> from do_analyse_rel() based on the VacuumStmt. I do not see need
> for a
> separate three value enum of which only one value is used and
> also to
> pass it down from vacuum() by changing signatures of the minion
> functions.

> I introduced that for possible future use. See the discussion in [1].

> Will leave it for the commiter to decide.

I noticed that the signatures need not to be modified, as you said.
Thanks for pointing that out! So, I revised the patch not to change the
signatures, though I left the enum, renaming it to AnalyzeMode. Let's
have the committer's review.

> 4. In postgresGetForeignPlan(), the code added by this patch is
> required
> to handle the case when the row mark is placed on a parent table and
> hence is required to be applied on the child table. We need a
> comment
> explaining this. Otherwise, the three step process to get the
> row mark
> information isn't clear for a reader.

> Will add the comment.

Done.

> 5. In expand_inherited_rtentry() why do you need a separate variable
> hasForeign? Instead of using that variable, you can actually
> set/reset
> rte->hasForeign since existence of even a single foreign child would
> mark that member as true. - After typing this, I got the answer
> when I
> looked at the function code. Every child's RTE is initially a
> copy of
> parent's RTE and hence hasForeign status would be inherited by every
> child after the first foreign child. I think, this reasoning
> should be
> added as comment before assignment to rte->hasForeign at the end
> of the
> function.

> As you mentioned, I think we could set rte->hasForeign directly
> during the scan for the inheritance set, without the separate
> variable, hasForeign. But ISTM that it'd improve code readability
> to set rte->hasForeign using the separate variable at the end of the
> function because rte->hasForeign has its meaning only when rte->inh
> is true and because we know whether rte->inh is true, at the end of
> the function.

> Fine. Please use "hasForeignChild" instead of just "hasForeign" without
> a clue as to what is "foreign" here.

Done. But I renamed it to "hasForeignChildren".

> 6. The tests in foreign_data.sql are pretty extensive. Thanks
> for that.
> I think, we should also check the effect of each of the following
> command using \d on appropriate tables.
> +CREATE FOREIGN TABLE ft2 () INHERITS (pt1)
> + SERVER s0 OPTIONS (delimiter ',', quote '"', "be quoted"
> 'value');
> +ALTER FOREIGN TABLE ft2 NO INHERIT pt1;
> +DROP FOREIGN TABLE ft2;
> +CREATE FOREIGN TABLE ft2 (
> + c1 integer NOT NULL,
> + c2 text,
> + c3 date
> +) SERVER s0 OPTIONS (delimiter ',', quote '"', "be quoted"
> 'value');
> +ALTER FOREIGN TABLE ft2 INHERIT pt1;

> Will fix.

Done.

> Apart from the above, I noticed that the patch doesn't consider to
> call ExplainForeignModify during EXPLAIN for an inherited
> UPDATE/DELETE, as shown below (note that there are no UPDATE remote
> queries displayed):

Since there seems to be no objections, I updated the patch as proposed
upthread. Here is an example.

postgres=# explain (format text, verbose) update parent as p set a = a *
2 returning *;
QUERY PLAN
--------------------------------------------------------------------------------
Update on public.parent p (cost=0.00..202.33 rows=11 width=10)
Output: p.a
For public.ft1 p_1
Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1
RETURNING a
For public.ft2 p_2
Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1
RETURNING a
-> Seq Scan on public.parent p (cost=0.00..0.00 rows=1 width=10)
Output: (p.a * 2), p.ctid
-> Foreign Scan on public.ft1 p_1 (cost=100.00..101.16 rows=5
width=10)
Output: (p_1.a * 2), p_1.ctid
Remote SQL: SELECT a, ctid FROM public.mytable_1 FOR UPDATE
-> Foreign Scan on public.ft2 p_2 (cost=100.00..101.16 rows=5
width=10)
Output: (p_2.a * 2), p_2.ctid
Remote SQL: SELECT a, ctid FROM public.mytable_2 FOR UPDATE
(14 rows)

Other changes:
* revised regression tests for contrib/file_fdw to refer to tableoid.
* revised docs a bit further.

Attached are updated patches. Patch fdw-inh-5.patch has been created on
top of patch fdw-chk-5.patch. Patch fdw-chk-5.patch is basically the
same as the previous one fdw-chk-4.patch, but I slightly modified that
one. The changes are the following.
* added to foreign_data.sql more tests for your comments.
* revised docs on ALTER FOREIGN TABLE a bit further.

Thanks,

Best regards,
Etsuro Fujita

Attachment Content-Type Size
fdw-chk-5.patch text/x-patch 34.4 KB
fdw-inh-5.patch text/x-patch 97.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-12-10 03:26:28 Re: [COMMITTERS] pgsql: Keep track of transaction commit timestamps
Previous Message Petr Jelinek 2014-12-10 02:33:54 Re: Sequence Access Method WIP