Re: Contributing some code

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Antonio Belloni <antonio(dot)belloni(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Contributing some code
Date: 2017-12-28 02:51:02
Message-ID: CAMsr+YHw1738jaAk+0KWVF-NX1h=40ZA+xsSdKb2Druy=voOsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28 December 2017 at 01:40, Antonio Belloni <antonio(dot)belloni(at)gmail(dot)com>
wrote:

> Hi,
>
> This is my first post on the list. My name is Antonio. I am a CS grad
> student and my field of study is about databases and information retrieval.
> To get some practical knowledge, I've been studying Postgresql codebase for
> a while.
>
> Now I would like to contribute with some code and I've chosen the
> following topic of the TODO list :
>
> Allow reporting of which objects are in which tablespaces
>
> This item is difficult because a tablespace can contain objects from
> multiple databases. There is a server-side function that returns the
> databases which use a specific tablespace, so this requires a tool that
> will call that function and connect to each database to find the objects in
> each database for that tablespace.
> The topic suggests to use the pg_tablespace_databases to discover which
> database is using a specific tablespace and then connect to each database
> and find the objects in the tablespaces.
> I checked the code of pg_tablespace_databases, defined in
> src/backend/utils/adt/misc.c, and see that it uses a much simpler approach
> : It just reads the tablespaces directories and return the name of the
> directories that represents databases OIDs.
> Although the function works as expected, I can see some issues not
> addressed in the code :
> - It does not check for permissions. Any user can execute it;- It does
> not check if the platform supports symlinks, which can cause an error
> because the function is trying to follow the links defined in
> base/pg_tblspc.
> I could use the same approach and write a function that goes down one more
> level in the directory structure and find the objects' OIDs inside each
> database directory, but I don't know if this is the better way to do that.
>
>

There's a bit of a trap hidden here. The names of relation extents look
like oids, possibly with an extent number for relations bigger than 1GB.
But they aren't. They're relfilenode numbers.

PostgreSQL maps relation oids to relfilenodes. By default on a new system,
relations will often have the same relfilenode as oid. That's a pity IMO;
it'd be way less confusing if we allocated relfilenodes from a wholly
different counter, because as it is, it gives people the false impression
they can expect the filename relfilenode to be the relation oid.

In fact, what happens (per my probably imperfect understanding) is that
PostgreSQL checks pg_class (via the relcache) for the oid of the table. It
then uses RelationIsMapped to see if it's a normal relation with the
filenode number in pg_class or not. If it's a normal (non-mapped) relation,
it uses the Relation's rd_node to find the relation's physical address
tablespace, dboid, and relfilenode. If it's a mapped relation, it instead
consults the relmapper to find the relation's storage; see
src/backend/utils/cache/relmapper.c .

See also src/backend/storage/smgr/README,

This means you can't determine relation oids from ondisk state without
scanning pg_class. And pg_class is per-database, not a shared relation, so
you must look at each db in turn, since Pg doesn't support cross-DB
queries. Logical decoding handles this with the RelidByRelfilenode
function, but there are issues there around making sure you have the right
snapshot etc.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-12-28 04:15:00 Re: pgsql: Add parallel-aware hash joins.
Previous Message Tsunakawa, Takayuki 2017-12-28 02:40:37 RE: [HACKERS] Transactions involving multiple postgres foreign servers