Re: Huge memory consumption on partitioned table with FKs

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, keisuke kuroda <keisuke(dot)kuroda(dot)3862(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, tatsuhito(dot)kasahara(dot)rd(at)hco(dot)ntt(dot)co(dot)jp
Subject: Re: Huge memory consumption on partitioned table with FKs
Date: 2021-03-08 05:27:39
Message-ID: CA+HiwqFbYQ+6xKyDJmb18+vHA1+KuF8VMXEtVO-Y68tTJow5CQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 5, 2021 at 6:00 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> > Updated patch attached.
>
> This claim seems false on its face:
>
> > All child constraints of a given foreign key constraint can use the
> > same RI query and the resulting plan, that is, no need to create as
> > many copies of the query and the plan as there are partitions, as
> > happens now due to the child constraint OID being used in the key
> > for ri_query_cache.
>
> What if the child tables don't have the same physical column numbers
> as the parent? The comment claiming that it's okay if riinfo->fk_attnums
> doesn't match seems quite off point, because the query plan is still
> going to need to use the correct column numbers. Even if column numbers
> are the same, the plan would also contain table and index OIDs that could
> only be right for one partition.
>
> I could imagine translating a parent plan to apply to a child instead of
> building it from scratch, but that would take a lot of code we don't have
> (there's no rewriteHandler infrastructure for plan nodes).
>
> Maybe I'm missing something, because I see that the cfbot claims
> this is passing, and I'd sure like to think that our test coverage
> is not so thin that it'd fail to detect probing the wrong partition
> for foreign key matches. But that's what it looks like this patch
> will do.

The quoted comment could have been written to be clearer about this,
but it's not talking about the table that is to be queried, but the
table whose RI trigger is being executed. In all the cases except one
(mentioned below), the table that is queried is the same irrespective
of which partition's trigger is being executed, so we're basically
creating the same plan separately for each partition.

create table foo (a int primary key) partition by range (a);
create table foo1 partition of foo for values from (minvalue) to (1);
create table foo2 partition of foo for values from (1) to (maxvalue);
create table bar (a int references foo) partition by range (a);
create table bar1 partition of bar for values from (minvalue) to (1);
create table bar2 partition of bar for values from (1) to (maxvalue);

insert into foo values (0), (1);
insert into bar values (0), (1);

For the 2nd insert statement, RI_FKey_check() issues the following
query irrespective of whether it is called from bar1's or bar2's
insert check RI trigger:

SELECT 1 FROM foo WHERE foo.a = $1;

Likewise for:

delete from foo;

ri_restrict() issues the following query irrespective of whether
called from foo1's or foo2's delete action trigger:

SELECT 1 FROM bar WHERE bar.a = $1;

The one case I found in which the queried table is the same as the
table whose trigger has been invoked is ri_Check_Pk_Match(), in which
case, it's actually wrong for partitions to share the plan, so the
patch was wrong in that case. Apparently, we didn't test that case
with partitioning, so I added one as follows:

+-- test that ri_Check_Pk_Match() scans the correct partition for a deferred
+-- ON DELETE/UPDATE NO ACTION constraint
+CREATE SCHEMA fkpart10
+ CREATE TABLE tbl1(f1 int PRIMARY KEY) PARTITION BY RANGE(f1)
+ CREATE TABLE tbl1_p1 PARTITION OF tbl1 FOR VALUES FROM (minvalue) TO (1)
+ CREATE TABLE tbl1_p2 PARTITION OF tbl1 FOR VALUES FROM (1) TO (maxvalue)
+ CREATE TABLE tbl2(f1 int REFERENCES tbl1 DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fkpart10.tbl1 VALUES (0), (1);
+INSERT INTO fkpart10.tbl2 VALUES (0), (1);
+BEGIN;
+DELETE FROM fkpart10.tbl1 WHERE f1 = 0;
+UPDATE fkpart10.tbl1 SET f1 = 2 WHERE f1 = 1;
+INSERT INTO fkpart10.tbl1 VALUES (0), (1);
+COMMIT;
+DROP SCHEMA fkpart10 CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to table fkpart10.tbl1
+drop cascades to table fkpart10.tbl2

With the v3 patch, the test case fails with the following error on COMMIT:

+ERROR: update or delete on table "tbl1_p2" violates foreign key
constraint "tbl2_f1_fkey2" on table "tbl2"
+DETAIL: Key (f1)=(1) is still referenced from table "tbl2".

That's because in ri_Check_Pk_Match(), partition tbl2 ends up using
the same cached plan as tbl1 and so scans tbl1 to check whether the
row deleted from tbl2 has reappeared, which is of course wrong. I
have fixed that by continuing to use the individual partition's
constraint's OID as the query key for this particular query.

I have also removed the somewhat confusing comment about fk_attnums.
The point it was trying to make is that fk_attnums being possibly
different among partitions does not make a difference to what any
given shareable query's text ultimately looks like. Those attribute
numbers are only used by the macros RIAttName(), RIAttType(),
RIAttCollation() when generating the query text and they'd all return
the same values no matter which partition's attribute numbers are
used.

Updated patch attached.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v4-0001-Share-SPI-plan-between-partitions-in-some-RI-trig.patch application/x-patch 9.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2021-03-08 05:30:40 Re: Asynchronous Append on postgres_fdw nodes.
Previous Message Amit Kapila 2021-03-08 05:19:45 Re: [HACKERS] logical decoding of two-phase transactions