GSoC 2017: Foreign Key Arrays

From: Mark Rofail <markm(dot)rofail(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alvaro Herrera <alvaro(dot)herrera(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: GSoC 2017: Foreign Key Arrays
Date: 2017-05-22 23:51:33
Message-ID: CAJvoCut7zELHnBSC8HrM6p-R6q-NiBN1STKhqnK5fPE-9=Gq3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear PostgreSQL hacker community,

I am working on Foreign Key Arrays as part of the Google Summer of Code
2017.

I will be logging my progress on this thread as I progress, twice a week
(Mondays and Fridays), so anyone who is willing to comment, please do.

*The Problem*
Foreign Key Arrays were introduced by Marco Nenciarini[1], however, the
proposed patch had some performance issues. Let's assume we have two
tables, table B has a foreign key array that references table A, any change
in table A (INSERT, UPDATE, DELETE) would trigger a referential
integrity check on table B. The current implementation uses sequential
scans to accomplish this. This limits the size of tables using Foreign Key
Arrays to ~100 records which is not practical in real life applications.

*The Proposed Solution*
Ultimately, as proposed by Tom Lane[2], we would like to replace the
sequential scan with a GIN-indexed scan which would greatly enhance the
performance.

To achieve this, introducing a number of new operators is required.
However, for the scope of the project, we will focus on the most basic case
where the Primary Keys are of pseudo-type anyelement and the Foreign Keys
are of pseudo-type anyarray, thus the main operator of concern will be
@>(anyarray,anyelement).

*Progress So Far*
The actual coding begins on 30th of May, till then I will use my time to
research, to settle the technical details of my plan.

- Collected resources about GIN indexing
- http://www.sigaev.ru/gin/README.txt
- https://wiki.postgresql.org/wiki/GIN_generalization
- src\backend\access\gin\README in the repo

- Cloned the git repo found @ https://github.com/postgres/postgres and
identified the main two files I will be concerned with. (I know I may need
to edit other files but these seem to where I will spend most of my summer)
- src/backend/commands/tablecmds.c
- src/backend/utils/ri_triggers.c

*I am yet to identify the files concerned with the GIN opclass. <-- if
anyone can help with this*

- read a little about op classes
- https://www.postgresql.org/docs/9.5/static/indexes-opclass.html
- explored the existing op classes in Postgres

*Next Step*
I still don't have a solid grasp of how I am going to approach creating an
operator, so I would like to experiment till the next report on creating a
very simple operator.

*I have attached the original proposal here.*

[1]
https://www.postgresql.org/message-id/1343842863.5162.4.camel%40greygoo.devise-it.lan
[2] https://www.postgresql.org/message-id/28389.1351094795@sss.pgh.pa.us

Best Regards,
Mark Rofail

Attachment Content-Type Size
GSoCproposal2017.pdf application/pdf 155.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2017-05-22 23:54:57 Re: PG10 Crash-safe and replicable Hash Indexes and UNIQUE
Previous Message Neha Khatri 2017-05-22 23:08:12 Improve logical decoding error message (was wal_level > WAL_LEVEL_LOGICAL)