Re: Oddity in handling of cached plans for FDW queries

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Oddity in handling of cached plans for FDW queries
Date: 2016-07-13 09:00:26
Message-ID: CAFjFpRf94+DJOr8pOLUQYgmmOxjonGPhrU4uvo_00BKCZ-fGug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Seems odd to me. Both relations use the same user mapping as before, so
> the join should be pushed down.
>
> Another thing I noticed is that the code in plancache.c would invalidate a
> cached plan with a foreign join due to user mapping changes even in the
> case where user mappings are meaningless to the FDW.
>
> To fix the first, I'd like to propose (1) replacing the existing
> has_foreign_join flag in the CachedPlan data structure with a new flag, say
> uses_user_mapping, that indicates whether a cached plan uses any user
> mapping regardless of whether the cached plan has foreign joins or not
> (likewise, replace the hasForeignJoin flag in PlannedStmt), and (2)
> invalidating the cached plan if the uses_user_mapping flag is true. I
> thought that we invalidate the cached plan if the flag is true and there is
> a possibility of performing foreign joins, but it seems to me that updates
> on the corresponding catalog are infrequent enough that such a detailed
> tracking is not worth the effort.

That way we will have plan cache invalidations even when simple foreign
tables scans (not join) are involved, which means all the plans involving
any reference to a foreign table with valid user mapping associated with
it. That can be a huge cost as compared to the current solution where
sub-optimal plan will be used only when a user mapping is changed while a
statement has been prepared. That's a rare scenario and somebody can work
around that by preparing the statement again. IIRC, we had discussed this
situation when implementing the cache invalidation logic. But there's no
workaround for your solution.

> One benefit of using the proposed approach is that we could make the FDW's
> handling of user mappings in BeginForeignScan more efficient; currently,
> there is additional overhead caused by catalog re-lookups to obtain the
> user mapping information for the simple-foreign-table-scan case where user
> mappings mean something to the FDW as in postgres_fdw (probably, updates on
> the catalogs are infrequent, though), but we could improve the efficiency
> by using the validated user mapping information created at planning time
> for that case as well as for the foreign-join case. For that, I'd like to
> propose storing the validated user mapping information into ForeignScan,
> not fdw_private.

postgres_fdw to fetches user mapping in some cases but never remembers it.
If what you are describing is a better way, it should have been implemented
before join pushdown was implemented. Refetching a user mapping is not that
expensive given that there is a high chance that it will be found in the
syscache, because it was accessed at the time of planning. Effect of plan
cache invalidation is probably worse than fetching the value from a sys
cache again.

> There is a discussion about using an ExtensibleNode [1] for that, but the
> proposed approach would make the FDW's job much simpler.
>
> I don't think the above change is sufficient to fix the second. The root
> reason for that is that since currently, we allow the user mapping OID
> (rel->umid) to be InvalidOid in two cases: (1) user mappings mean something
> to the FDW but it can't get any user mapping at planning time and (2) user
> mappings are meaningless to the FDW, we cannot distinguish these two cases.

The way to differentiate between these two is to look at the serverid. If
server id is invalid it's the case 1, otherwise 2. For a simple table,
there will always be a serverid associated with it. A user mapping will
always be associated with a simple table if there is one i.e. FDW requires
it. Albeit, there will be a case when FDW requires a user mapping but it's
not created, in which case the table will be useless for fetching remote
data anyway. I don't think we should be programming for that case.

> So, I'd like to introduce a new callback routine to specify that user
> mappings mean something to the FDW as proposed by Tom [2], and use that to
> reject the former case, which allows us to set the above uses_user_mapping
> flag appropriately, ie, set the flag to true only if user mapping changes
> require forcing a replan. This would change the postgres_fdw's behavior
> that it allows to prepare statements involving foreign tables without any
> user mappings as long as those aren't required at planning time, but I'm
> not sure that it's a good idea to keep that behavior because ISTM that such
> a behavior would make people sloppy about creating user mappings, which
> could lead to latent security problems [2].
>
> Attached is a proposed patch for that.
>
>
This routine is meaningless unless the core (or FDW) does not allow a user
mapping to be created for such FDWs. Without that, core code would get
confused as to what it should do when it sees a user mapping for an FDW
which says user mappings are meaningless. If we do that, we might not set
hasForeignJoin flag in create_foreignscan_plan() when the user mapping for
pushed down join is invalid. That will keep FDWs which do not use user
mappings out of plan cache invalidation.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-07-13 09:22:36 Constraint merge and not valid status
Previous Message Fabien COELHO 2016-07-13 08:39:34 Re: pgbench - compute & show latency consistently