How to customize postgres for sharing read-only tables in multiple data-dirs between servers

From: "Guttman, Maoz" <maoz(dot)guttman(at)intel(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "Guttman, Maoz" <maoz(dot)guttman(at)intel(dot)com>
Subject: How to customize postgres for sharing read-only tables in multiple data-dirs between servers
Date: 2021-02-16 13:49:02
Message-ID: SJ0PR11MB5199A76F19734527A57AB05488879@SJ0PR11MB5199.namprd11.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Problem statement:
I have to develop a solution in which a single source populates a table. Once the table is populated, it is considered as read-only and then we run many read-only queries on it.
Such read-only tables are generated by multiple simulation runs: each simulation populates an independent table, meaning there is no cross-write to the tables.
However, the read-only queries can be executed on a single or multiple tables.
In my environment I have plenty of machines to run the simulations and I can’t use these machines to have a postgres compute farm as a cloud solution. So I can’t use my machines to run endless postgres server jobs as the solution is intended for.

My idea is:
Stage 1: Ad-hoc server+client for populating a table: start a server+client on a local machine, populate the table and stop the server+client. The data-dir is hosted in a central file system (e.g. NFS).
Stage 2: Ad-hoc server+client for querying the now read-only table(s) from step 1: start a server+client on a local machine, run read-only queries and stop the server+client.
In order to implement stage 2 I will:
1. Create a new ad-hoc empty data-dir
2. Create a soft-link from each data-dirtable files (including its index files) that is needed for the query to the ad-hoc data-dir.
Note that files in multiple data-dirs can be linked to the ad-hoc data-dir.
3. Update postgress catalog tables in the ad-hoc data-dir according to above soft-links
4. To guarantee that there will be no modifications of read-only table files, I will implement a table-am (access methods) which registers ONLY the table-am callback functions that are relevant for running read-only queries.
Since it is possible to run multiple queries on each table, there can be multiple instances of client-server describes in stage 2 running simultaneously.

Any thoughts?
Can it work?
My concern is for the process described in stage2#4: can I truly rely on callback functions running read-only queries do not update behind the scene the read-only table files?
Any other suggestion to develop & maintain a sustainable solution?

Thanks,
Maoz
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2021-02-16 14:01:59 Re: A reloption for partitioned tables - parallel_workers
Previous Message Amit Langote 2021-02-16 13:42:57 Re: making update/delete of inheritance trees scale better