Re: Referential Integrity Checks with Statement-level Triggers

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Referential Integrity Checks with Statement-level Triggers
Date: 2018-12-17 16:11:07
Message-ID: CAFj8pRCQe9=tX5GmC-S_Dmms9tdU4OWGD2xaHZFm-yaMS1zmoQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 17, 2018 at 15:32 Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
wrote:

> Back when Pg added statement-level triggers, I was interested in the
> potential promise of moving referential integrity checks to statement-level
> triggers.
>
> The initial conversation included Kevin Grittner's POC script (in SQL),
> which showed a potential 98% reduction in time spent doing RI checks.
> The original thread is here:
>
>
> https://www.postgresql.org/message-id/CACjxUsM4s9%3DCUmPU4YFOYiD5f%3D2ULVDBjuFSo20Twe7KbUe8Mw%40mail.gmail.com
>
> I dug around in the code, and was rather surprised at how close we already
> are to implementing this. The function RI_Initial_Check() already does a
> left-join query via SPI to look for any invalid data, so if we could just
> replace the near table with the transition table for inserted rows, we'd be
> home free. The function SPI_register_trigger_data() makes the transition
> tables visible to SPI, so I started to wonder why this hadn't been done
> already.
>
> I approached Kevin and Thomas Munro seeking feedback on my approach. I
> also made it into a session at the PgConf.ASIA un-conference, and then
> later with Michael Paquier at that same conference, and the coalesced
> feedback was this:
>
> - the overhead of registering the transition tables probably makes it
> unprofitable for single row inserts
> - the single row overhead is itself significant, so maybe the transition
> tables aren't worse
> - there has been talk of replacing transition tables with an in-memory
> data structure that would be closer to "free" from a startup perspective
> and might even coalesce the transition tables of multiple statements in the
> same transaction
> - because no declarative code changes are involved, it's trivial to switch
> from row-level to statement-level triggering via pg_upgrade
> - assuming that transition tables are an overhead that only pays off when
> > N rows have been updated, does it make sense to enforce RI with something
> that isn't actually a trigger?
> - there was also some mention that parallel query uses a queue mechanism
> that might be leveraged to do row-level triggers for updates of <= N rows
> and statement level for > N
>
> That's what I have so far. I'm going to be working on a POC patch so that
> I can benchmark a pure-statement-level solution, which if nothing else will
> let us know the approximate value of N.
>
> All suggestions are appreciated.
>
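The transition-table approach Corey describes can be sketched in plain SQL. This is only an illustrative sketch of the idea, not the internal C implementation in RI_Initial_Check(); the table and column names (orders, customers, customer_id) are invented for the example:

```sql
-- Illustrative sketch: an AFTER ... FOR EACH STATEMENT trigger that
-- validates all newly inserted rows with one left join against the
-- referenced table, instead of firing one check per row.
CREATE OR REPLACE FUNCTION check_orders_customer_fk() RETURNS trigger AS $$
BEGIN
    -- Any new row whose customer_id finds no match in customers violates RI.
    IF EXISTS (
        SELECT 1
        FROM new_rows n
        LEFT JOIN customers c ON c.id = n.customer_id
        WHERE n.customer_id IS NOT NULL
          AND c.id IS NULL
    ) THEN
        RAISE EXCEPTION 'insert on orders violates foreign key to customers';
    END IF;
    RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_fk_stmt
    AFTER INSERT ON orders
    REFERENCING NEW TABLE AS new_rows  -- transition table (PG 10+)
    FOR EACH STATEMENT
    EXECUTE PROCEDURE check_orders_customer_fk();
```

One bulk join per statement, rather than one index probe per inserted row, is where the large savings in Kevin's POC came from.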

This is great. I have thought about it a little, just theoretically.

A ROW trigger calls the RI check too often, and a statement trigger too
rarely. I think an ideal design could call the RI check every 10K rows or
so; it would be unfriendly if somebody ran a very long import and it failed
only at the end. I don't think there would be any performance difference if
the RI check is called per 1,000 or more rows.

Regards

Pavel
