RE: Re: User defined data types in Logical Replication

From: Huong Dangminh <huo-dangminh(at)ys(dot)jp(dot)nec(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Hiroshi Yanagisawa <hir-yanagisawa(at)ut(dot)jp(dot)nec(dot)com>
Subject: RE: Re: User defined data types in Logical Replication
Date: 2017-11-16 12:55:13
Message-ID: 75DB81BEEA95B445AE6D576A0A5C9E936A6C4B0A@BPXM05GP.gisp.nec.co.jp
Lists: pgsql-hackers

Sawada-san,

Thanks for your response.
# Sorry again that I could not reply to your Gmail address;
# security restrictions in my environment prevent it.

> >> We are getting the error below while trying to use Logical Replication
> >> with user-defined data types in a C program (when the elog function is called).
> >>
> >> ERROR: XX000: cache lookup failed for type XXXXX
> >>
> >
> > Sorry to keep bothering you on this topic, but am I missing something
> > here?
>
> No, but I'd suggest providing a procedure for reproducing it if possible,
> which would be helpful for the investigation.

Sorry, I will be careful next time.

> > I mean that if the type's OID on the PUBLICATION host does not exist
> > in the SUBSCRIPTION host's pg_type, an unintended error (the XX000
> > above) can be returned when elog or ereport is executed.
> >
> > In more detail, it happens in slot_store_error_callback when it tries to
> > call format_type_be(localtypoid) for errcontext.
> > slot_store_error_callback is set in the slot_store_cstrings and
> > slot_modify_cstrings functions and is also unset there, so the impact
> > is small, but it does happen.
> >
>
> I think I found the cause of this issue, and it is a bug. It can be
> reproduced, for example, if the input function of the data type calls
> elog() while changes are being applied in an environment where the OIDs
> of the data type on the publisher and subscriber differ. The cause is that
> we call format_type_be() with remotetypoid. If the OIDs of the data type on
> the publisher and subscriber differ, we search the syscache for an OID
> that doesn't exist on the subscriber.

Yes, I think so too.
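
To make the failure mode concrete, here is a toy sketch in plain C. It is
not PostgreSQL code: the ToyType struct, the toy_format_type function, and
the OID values are all hypothetical, chosen only to show why a lookup by the
publisher's OID can fail against the subscriber's catalog.

```c
#include <stddef.h>

/* Toy stand-in for a server's pg_type catalog: OID -> type name. */
typedef struct
{
    unsigned int oid;
    const char  *name;
} ToyType;

/* Hypothetical catalogs: the same type has a different OID on each side. */
static const ToyType publisher_types[]  = { { 16385, "mytype" } };
static const ToyType subscriber_types[] = { { 24601, "mytype" } };

/*
 * Look up a type name by OID in the given catalog.
 * Returns NULL when the OID is unknown, which corresponds to the
 * "cache lookup failed for type XXXXX" case in this toy model.
 */
static const char *
toy_format_type(const ToyType *catalog, size_t ntypes, unsigned int oid)
{
    for (size_t i = 0; i < ntypes; i++)
    {
        if (catalog[i].oid == oid)
            return catalog[i].name;
    }
    return NULL;
}
```

In this model, toy_format_type(subscriber_types, 1, 16385) returns NULL
because 16385 exists only in publisher_types, while the same call with the
subscriber's own OID 24601 finds "mytype".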

> As for the details of your patch, I don't think this direction is good.
> Since the subscriber already has a LogicalRepTyp cache entry for the type,
> we can report the error message using the data type name. So I think this
> issue can be fixed by using the remote type name obtained from the cache.

Thanks.
I had not realized that, with LogicalRepRelMapEntry, the remote type name is
already available there.
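
As I understand the suggestion, the error context would take the remote type
name from the subscriber-side cache instead of looking up the remote OID in
the local catalog. A toy sketch in plain C follows. It is not the patch and
not PostgreSQL code: ToyRemoteType, remote_type_cache, toy_remote_type_name,
the OID 16385, and the fallback text are hypothetical names and values used
only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Toy cache entry: what the publisher reported about one of its types. */
typedef struct
{
    unsigned int remote_oid;    /* OID of the type on the publisher */
    const char  *nspname;       /* schema name sent by the publisher */
    const char  *typname;       /* type name sent by the publisher */
} ToyRemoteType;

/* Hypothetical cache contents on the subscriber. */
static const ToyRemoteType remote_type_cache[] = {
    { 16385, "public", "mytype" },
};

/*
 * Format "schema.type" for a remote OID using only the cache, so no
 * lookup by remote OID against the local catalog is needed.
 * Falls back to a placeholder string when the OID is not cached.
 * Writes into buf and returns it.
 */
static const char *
toy_remote_type_name(unsigned int remote_oid, char *buf, size_t buflen)
{
    size_t n = sizeof(remote_type_cache) / sizeof(remote_type_cache[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (remote_type_cache[i].remote_oid == remote_oid)
        {
            snprintf(buf, buflen, "%s.%s",
                     remote_type_cache[i].nspname,
                     remote_type_cache[i].typname);
            return buf;
        }
    }
    snprintf(buf, buflen, "unrecognized type %u", remote_oid);
    return buf;
}
```

With a 64-byte buffer, toy_remote_type_name(16385, buf, sizeof buf) yields
"public.mytype", and an uncached OID yields the placeholder text rather than
an error.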

> Also, I'm confused about the errcontext message; currently we store the
> local data type OID corresponding to the remote data type name in the
> cache, and then we look up the local data type name by the local data type
> OID stored in the cache. So it means that both the local data type OID and
> the remote data type OID always imply the same data type. We use both
> data type OIDs for the log message in slot_store_error_callback, but I think
> what the function wants to do is to show the different type names if the
> table definitions on the two servers are different (e.g. sending jsonb column
> data to a text column). I think we should use the type of the local relation
> attribute rather than the remote one.
>
> The attached draft patch fixes this issue, at least in my environment.

It works well for me.

> Please review it.

I will review it soon.

---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/
