Quick Links

Re: Schema design: user account deletion vs. keeping family tree data

From:	pg254kl(at)georgiou(dot)vip
To:	Christoph Pieper <christoph(at)fecra(dot)de>,"pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject:	Re: Schema design: user account deletion vs. keeping family tree data
Date:	2025-11-24 21:51:35
Message-ID:	176402110004.8.5805411983290632546.1025420735@georgiou.vip
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Option B would be fine with me, unless there is good reason to normalize
it further. A query using recursive CTE would be able to find ancestors
and descendants neatly and efficiently.

I deal with some tables in the billions of rows, and with that hat on, I
would use int/bigint identity for the PKs instead of UUIDs (less
storage, smaller indices, faster joins). I would have a boolean
'active' column to handle soft deletes, along with created_at and
disabled_at timestamptz columns maintained by triggers. I would use
composite partitioning, first level partition by list on 'active', and
second level partition by range on the id PK with the range being a few
million. If for some reason you have to use UUIDs, use time-based
UUIDv7 (native on PostgreSQL v18) so you can range partition.

--
regards,
Kiriakos Georgiou

On 11/24/25 6:27 AM, Christoph Pieper - christoph at fecra.de wrote:
> Hi,
>
> I’m designing a schema for a family‑tree web app on PostgreSQL. Users
> register accounts and can create one or more family trees. Each tree
> consists of persons (the user themself, relatives, ancestors). Many
> persons in a tree will never have an account (e.g.
> great‑grandparents). Because of GDPR, when a user deletes their
> account we must remove/anonymise their user profile, but we want to
> keep the family tree data intact so that other users can still
> reference those ancestors.
>
> We expect hundreds of thousands to millions of persons and deep
> ancestry queries (N generations, inbreeding/relationship calculations).
> I’m hesitating between two schema designs:
>
> *Option A – Separate family_tree_node table*
>
> create table app_user (
> id uuid primary key,
> email text unique not null,
> created_at timestamptz not null default now()
> );
>
> create table person (
> id uuid primary key,
> created_by_user_id uuid references app_user(id) on delete set null,
> first_name text,
> last_name text,
> birth_date date
> -- more non-account-specific attributes may be added her in future!
> );
>
> create table family_tree (
> id uuid primary key,
> owner_user_id uuid not null references app_user(id) on delete cascade,
> created_at timestamptz not null default now()
> );
>
> create table family_tree_node (
> id uuid primary key,
> family_tree_id uuid not null references family_tree(id) on delete
> cascade,
> person_id uuid references person(id) on delete set null,
> father_node_id uuid references family_tree_node(id),
> mother_node_id uuid references family_tree_node(id)
> );
>
> create index on family_tree_node (family_tree_id);
> create index on family_tree_node (person_id);
> create index on family_tree_node (father_node_id);
> create index on family_tree_node (mother_node_id);
>
> Here family_tree_node is the structural graph for a specific tree. A
> node may point to a person, but can also exist without one (minimal
> data only). If a user/account is deleted, we only drop/anonymise data
> in app_user (and optionally created_by_user_id), while person and
> family_tree_node remain.
>
> *Option B – Use person directly as the graph node (soft delete)*
>
> create table app_user (
> id uuid primary key,
> email text unique not null,
> created_at timestamptz not null default now()
> );
>
> create table person (
> id uuid primary key,
> created_by_user_id uuid references app_user(id) on delete set null,
> first_name text,
> last_name text,
> birth_date date,
> father_id uuid references person(id),
> mother_id uuid references person(id),
> deleted_at timestamptz -- soft delete flag
> );
>
> create index on person (father_id);
> create index on person (mother_id);
> create index on person (deleted_at);
>
> In this model, the pedigree graph is just a person(father_id,
> mother_id). When a user deletes their account we never hard‑delete
> persons; instead we set deleted_at and/or anonymise some fields. All
> queries must filter on deleted_at is null to hide soft‑deleted persons.
>
> Question:
> From a PostgreSQL point of view (database best practices, data
> integrity, performance and long‑term maintainability at millions of
> rows), which approach would you prefer, or is there a better pattern
> for this kind of “account can be deleted, but genealogy should remain”
> use case?
>
> Regards and many thanks!
> Christoph
>
>
>
> --
> fecra company logo
>
> *Christoph Pieper*
>
> christoph(at)fecra(dot)de <mailto:christoph(at)fecra(dot)de>
>
> fecra GmbH, Strelitzer Str. 63 10115 Berlin, Deutschland
>
> www.fecra.de <https://www.fecra.de/> | HRB 268518 B
>

In response to

Schema design: user account deletion vs. keeping family tree data at 2025-11-24 11:27:56 from Christoph Pieper

Browse pgsql-general by date

	From	Date	Subject
Next Message	pg254kl	2025-11-24 22:30:14	Re: set role command
Previous Message	Ron Johnson	2025-11-24 20:25:53	Re: set role command