Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-10 09:54:48
Message-ID: CACawEhUu6S8E4Oo7+s5iaq=yLRZJb6uOZeEQSGJj-7NVkDzSaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit, all

>
> Some of the ideas I can think of are as follows:
>
> 1. Combine "SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS"
> and "SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS" such that after
> verifying updates and deletes of the first test, we can drop some of
> the columns on both publisher and subscriber, then use alter
> subscription ... refresh publication command and then do the steps of
> the second test. Note that we don't add tables after initial setup,
> only changing schema.
>

Done with an important caveat. I think this reorganization of the test
helped
us to find one edge case regarding dropped columns.

I realized that the dropped columns also get into the tuples_equal()
function. And,
the remote sends NULL to for the dropped columns(i.e., remoteslot), but
index_getnext_slot() (or table_scan_getnextslot) indeed fills the dropped
columns on the outslot. So, the dropped columns are not NULL in the outslot.

This triggers tuples_equal to fail. To fix that, I improved the tuples_equal
such that it skips the dropped columns.

I also spend quite a bit of time understanding how/why this impacts
HEAD. See steps below on HEAD, where REPLICA IDENTITY FULL
fails to replica the data properly:

-- pub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3
timestamptz);
ALTER TABLE test REPLICA IDENTITY FULL;
INSERT INTO test SELECT NULL, i, i, (i)::text, now() FROM
generate_series(0,1)i;
CREATE PUBLICATION pub FOR ALL TABLES;

-- sub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3
timestamptz);
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION pub;

-- show that before dropping the columns, the data in the source and
-- target are deleted properly
DELETE FROM test WHERE x = 0;

-- both on the source and target
SELECT count(*) FROM test WHERE x = 0;
┌───────┐
│ count │
├───────┤
│ 0 │
└───────┘
(1 row)

-- drop columns on both the the source
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- drop columns on both the the target
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- on the target
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;

-- after dropping the columns
DELETE FROM test WHERE x = 1;

-- source
SELECT count(*) FROM test WHERE x = 1;
┌───────┐
│ count │
├───────┤
│ 0 │
└───────┘
(1 row)

*-- target, OOPS wrong result!!!!*SELECT count(*) FROM test WHERE x = 1;

┌───────┐
│ count │
├───────┤
│ 1 │
└───────┘
(1 row)

Should we have a separate patch for the tuples_equal change so that we
might want to backport? Attached as v40_0001 on the patch.

Note that I need to have that commit as 0001 so that 0002 patch
passes the tests.

> 2. We can also combine "Some NULL values" and "PUBLICATION LACKS THE
> COLUMN ON THE SUBS INDEX" as both use the same schema. After the first
> test, we need to drop the existing index and create a new index on the
> subscriber node.
>

done

>
> 3. General comment
> +# updates 200 rows
> +$node_publisher->safe_psql('postgres',
> + "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");
>
> I think here you are updating 20 rows not 200. So, the comment seems
> wrong to me.
>

I think I have fixed that in an earlier version because I cannot see this
comment anymore.

>
> Please think more and see if we can combine some other tests like
> "Unique index that is not primary key or replica identity" and the
> test we will have after comment#2 above.
>
>
I'll look for more opportunities and reply to the thread. I wanted to send
this mail so that you can have a look at (1) earlier.

Thanks,
Onder

Attachment Content-Type Size
v40_0001_Skip-dropped_columns_for_tuples_equal.patch application/octet-stream 1.2 KB
v40_0002_use_index_on_subs_when_pub_rep_ident_full.patch application/octet-stream 55.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-03-10 09:59:12 Re: pgsql: Use ICU by default at initdb time.
Previous Message Peter Eisentraut 2023-03-10 09:54:14 Re: pgsql: Allow tailoring of ICU locales with custom rules