RE: [BUG]Update Toast data failure in logical replication

From: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [BUG]Update Toast data failure in logical replication
Date: 2022-02-09 05:38:08
Message-ID: TYCPR01MB6128A3E79D5A703DC3B73E62FB2E9@TYCPR01MB6128.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 8, 2022 3:18 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2022-02-07 08:44:00 +0530, Amit Kapila wrote:
> > Right, and it is getting changed. We are just printing the first 200
> > characters (by using SQL [1]) from the decoded tuple so what is shown
> > in the results is the initial 200 bytes.
>
> Ah, I knew I must have been missing something.
>
>
> > The complete decoded data after the patch is as follows:
>
> Hm. I think we should change the way the strings are shortened - otherwise we
> don't really verify much in that test. Perhaps we could just replace the long
> repetitive strings with something shorter in the output?
>
> E.g. using something like regexp_replace(data,
> '(1234567890|9876543210){200}', '\1{200}','g')
> inside the substr().
>
> Wonder if we should deduplicate the number of different toasted strings in the
> file to something that'd allow us to have a single "redact_toast" function or
> such. There's too many different ones to have a reasonbly simple redaction
> function right now. But that's perhaps better done separately.
>

I tried to make the output shorter using your suggestion like the following SQL,
please see the attached patch, which is based on v8 patch[1].

SELECT substr(regexp_replace(data, '(1234567890|9876543210){200}', '\1{200}','g'), 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');

Note that some strings are still longer than 200 characters even though they have
been shorter, so they can't be shown entirely.

e.g.
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'1234567890{200}' new-tuple: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[te

The entire string is:
table public.toasted_key: UPDATE: old-key: toasted_key[text]:'1234567890{200}' new-tuple: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'9876543210{200}'

Maybe it's better to change the substr length to 250 to show the entire string, or we
can do it as separate HEAD only improvement where we can deduplicate some of the
other long strings as well. Thoughts?

[1] https://www.postgresql.org/message-id/CAA4eK1L_Z_2LDwMNbGrwoO%2BFc-2Q04YORQSA9UfGUTMQpy2O1Q%40mail.gmail.com

Regards,
Tang

Attachment Content-Type Size
improve_toast_test.diff application/octet-stream 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2022-02-09 05:41:41 Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR
Previous Message Michael Paquier 2022-02-09 05:17:59 Re: make MaxBackends available in _PG_init