Re: pgsql: Doc: Explain about Column List feature.

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Amit Kapila <akapila(at)postgresql(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: pgsql: Doc: Explain about Column List feature.
Date: 2022-09-13 12:11:38
Message-ID: 20220913121138.yn7ekkfysxzhkm2u@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On 2022-Sep-07, Amit Kapila wrote:

> Doc: Explain about Column List feature.
>
> Add a new logical replication section for "Column Lists" (analogous to the
> Row Filters page). This explains how the feature can be used and the
> caveats in it.
>
> Author: Peter Smith
> Reviewed-by: Shi yu, Vignesh C, Erik Rijkers, Amit Kapila
> Backpatch-through: 15, where it was introduced
> Discussion: https://postgr.es/m/CAHut+PvOuc9=_4TbASc5=VUqh16UWtFO3GzcKQK_5m1hrW3vqg@mail.gmail.com

Hi

I just read these docs and noticed that it mentions that column lists
can be used for security. As far as I remember, this is wrong: it is
the subscriber that builds the query to read column data during initial
sync, and the publisher doesn't forbid to read columns not in it, so it
is entirely possible for a malicious subscriber to read columns other
than those published. I'm pretty sure we discussed this at some point
during development of the feature.

So I suggest to mention this point explicitly in its own paragraph, to
avoid giving a false sense of security.

While going over this text I found some additional things that could
--in my opinion-- stand some improvement:

* It feels better to start the section saying that a list can be
specified; subscriber must have all those columns; omitting list
means to publish everything. That leads to shorter text (no need to
say "you need to have them all, oh wait you might only have a few").

* there's no reason to explain the syntax in vague terms and refer the
reader to the reference page.

* The first few <sect2> seem to give no useful structure, and instead
cause the text to become disorganized. I propose to remove them, and
instead mix the paragraphs in them so that we explain the rules and
the behavior, and lastly the effect on specific commands.

The attached patch effects those changes.

One more thing. There's a sect2 about combining column list. Part of it
seems pretty judgmental and I see no reason to have it in there; I
propose to remove it (it's not in this patch). I think we should just
say it doesn't work at present, here's how to work around it, and
perhaps even say that we may lift the restriction in the future. The
paragraph that starts with "Background:" is IMO out of place, and it
repeats the mistake that column lists are for security.

Lastly: In the create-publication reference page I think it would be
better to reword the Parameters section a bit. It mentions
FOR TABLE as a parameter, but the parameter is actually
<replaceable>table_name</replaceable>; and the row-filter and
column-list explanations are also put in there when they should be in
their own <replaceable>expression</> and <replaceable>column_name</>
varlistentries. I think splitting things that way would result in a
clearer explanation.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

Attachment Content-Type Size
column-list-wording.patch text/x-diff 5.6 KB

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Eisentraut 2022-09-13 12:39:27 pgsql: Make locale option behavior more consistent
Previous Message Alvaro Herrera 2022-09-13 10:07:40 Re: pgsql: Prefetch data referenced by the WAL, take II.

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2022-09-13 12:40:47 Re: Tuples inserted and deleted by the same transaction
Previous Message kuroda.hayato@fujitsu.com 2022-09-13 12:05:32 RE: Perform streaming logical transactions by background workers and parallel apply