Re: Schema design: user account deletion vs. keeping family tree data

From: pg254kl(at)georgiou(dot)vip
To: Christoph Pieper <christoph(at)fecra(dot)de>,"pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Schema design: user account deletion vs. keeping family tree data
Date: 2025-11-24 21:51:35
Message-ID: 176402110004.8.5805411983290632546.1025420735@georgiou.vip
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-general

Option B would be fine with me, unless there is good reason to normalize
it further.  A query using recursive CTE would be able to find ancestors
and descendants neatly and efficiently.

I deal with some tables in the billions of rows, and with that hat on, I
would use int/bigint identity for the PKs instead of UUIDs (less
storage, smaller indices, faster joins).  I would have a boolean
'active' column to handle soft deletes, along with created_at and
disabled_at timestamptz columns maintained by triggers.  I would use
composite partitioning, first level partition by list on 'active', and
second level partition by range on the id PK with the range being a few
million.  If for some reason you have to use UUIDs, use time-based
UUIDv7 (native on PostgreSQL v18) so you can range partition.

--
regards,
Kiriakos Georgiou

On 11/24/25 6:27 AM, Christoph Pieper - christoph at fecra.de wrote:
> Hi,
>
> I’m designing a schema for a family‑tree web app on PostgreSQL. Users
> register accounts and can create one or more family trees. Each tree
> consists of persons (the user themself, relatives, ancestors). Many
> persons in a tree will never have an account (e.g.
> great‑grandparents). Because of GDPR, when a user deletes their
> account we must remove/anonymise their user profile, but we want to
> keep the family tree data intact so that other users can still
> reference those ancestors.
>
> We expect hundreds of thousands to millions of persons and deep
> ancestry queries (N generations, inbreeding/relationship calculations).
> I’m hesitating between two schema designs:
>
> *Option A – Separate family_tree_node table*
>
> create table app_user (
>   id          uuid primary key,
>   email       text unique not null,
>   created_at  timestamptz not null default now()
> );
>
> create table person (
>   id                 uuid primary key,
>   created_by_user_id uuid references app_user(id) on delete set null,
>   first_name         text,
>   last_name          text,
>   birth_date         date
>   -- more non-account-specific attributes may be added her in future!
> );
>
> create table family_tree (
>   id            uuid primary key,
>   owner_user_id uuid not null references app_user(id) on delete cascade,
>   created_at    timestamptz not null default now()
> );
>
> create table family_tree_node (
>   id              uuid primary key,
>   family_tree_id  uuid not null references family_tree(id) on delete
> cascade,
>   person_id       uuid references person(id) on delete set null,
>   father_node_id  uuid references family_tree_node(id),
>   mother_node_id  uuid references family_tree_node(id)
> );
>
> create index on family_tree_node (family_tree_id);
> create index on family_tree_node (person_id);
> create index on family_tree_node (father_node_id);
> create index on family_tree_node (mother_node_id);
>
> Here family_tree_node is the structural graph for a specific tree. A
> node may point to a person, but can also exist without one (minimal
> data only). If a user/account is deleted, we only drop/anonymise data
> in app_user (and optionally created_by_user_id), while person and
> family_tree_node remain.
>
> *Option B – Use person directly as the graph node (soft delete)*
>
> create table app_user (
>   id          uuid primary key,
>   email       text unique not null,
>   created_at  timestamptz not null default now()
> );
>
> create table person (
>   id                 uuid primary key,
>   created_by_user_id uuid references app_user(id) on delete set null,
>   first_name         text,
>   last_name          text,
>   birth_date         date,
>   father_id          uuid references person(id),
>   mother_id          uuid references person(id),
>   deleted_at         timestamptz     -- soft delete flag
> );
>
> create index on person (father_id);
> create index on person (mother_id);
> create index on person (deleted_at);
>
> In this model, the pedigree graph is just a person(father_id,
> mother_id). When a user deletes their account we never hard‑delete
> persons; instead we set deleted_at and/or anonymise some fields. All
> queries must filter on deleted_at is null to hide soft‑deleted persons.
>
> Question:
> From a PostgreSQL point of view (database best practices, data
> integrity, performance and long‑term maintainability at millions of
> rows), which approach would you prefer, or is there a better pattern
> for this kind of “account can be deleted, but genealogy should remain”
> use case?
>
> Regards and many thanks!
> Christoph
>
>
>
> --
> fecra company logo
>
> *Christoph Pieper*
>
> christoph(at)fecra(dot)de <mailto:christoph(at)fecra(dot)de>
>
> fecra GmbH, Strelitzer Str. 63 10115 Berlin, Deutschland
>
> www.fecra.de <https://www.fecra.de/>  | HRB 268518 B
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message pg254kl 2025-11-24 22:30:14 Re: set role command
Previous Message Ron Johnson 2025-11-24 20:25:53 Re: set role command