Re: Alias collision in `refresh materialized view concurrently`

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Bernd Helmle <mailings(at)oopsware(dot)de>, Mathis Rudolf <mathis(dot)rudolf(at)credativ(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Alias collision in `refresh materialized view concurrently`
Date: 2021-06-01 07:43:44
Message-ID: CALj2ACXnoc8zaJv88GOsacO4cbcgY7x9W2sVhHRuWaVAvLPazw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 1, 2021 at 7:11 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, May 21, 2021 at 03:56:31PM +0530, Bharath Rupireddy wrote:
> > On Fri, May 21, 2021 at 6:08 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >> I am not sure that I see the point of using a random() number here
> >> while the backend ID, or just the PID, would easily provide enough
> >> entropy for this internal alias. I agree that "mv" is a bad choice
> >> for this alias name. One thing that comes in mind here is to use an
> >> alias similar to what we do for dropped attributes, say
> >> ........pg.matview.%d........ where %d is the PID. This will very
> >> unlikely cause conflicts.
> >
> > I agree that backend ID and/or PID is enough. I'm not fully convinced
> > with using random(). To make it more concrete, how about something
> > like pg.matview.%d.%d (MyBackendId, MyProcPid)? If the user still sees
> > some collisions, then IMHO, it's better to ensure that this kind of
> > table/alias names are not generated outside of the server.
>
> There is no need to have the PID if MyBackendId is enough, so after
> considering it I would just choose something like what I quoted above.
> Don't we need also to worry about the queries using newdata, newdata2
> and diff as aliases? Would you like to implement a patch doing
> something like that?

Sure. PSA v2 patch. We can't have "." as separator in the alias names,
so I used "_" instead.

I used MyProcPid which seems more random than MyBackendId (which is
just a number like 1,2,3...). Even with this, someone could argue that
they can look at the backend PID, use it in the materialized view
names just to trick the server. I'm not sure if anyone would want to
do this.

I used the existing function make_temptable_name_n to prepare the
alias names. The advantage of this is that the code looks cleaner, but
it leaks memory, 1KB string for each call of the function. This is
also true with the existing usage of the function. Now, we will have 5
make_temptable_name_n function calls leaking 5KB memory. And we also
have quote_qualified_identifier leaking memory, 2 function calls, 2KB.
So, in total, these two functions will leak 7KB of memory (with the
patch).

Shall I pfree the memory for all the strings returned by the functions
make_temptable_name_n and quote_qualified_identifier? The problem is
that pfree isn't cheaper.
Or shall we leave it as is so that the memory will be freed up by the context?

Note I have not added tests for this, as the code is covered by the
existing tests.

With Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v2-0001-Avoid-alias-name-collisions-in-REFRESH-MAT-VIEW.patch application/octet-stream 5.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-06-01 07:53:41 Re: Skipping logical replication transactions on subscriber side
Previous Message Michael Paquier 2021-06-01 07:34:44 Re: Multiple hosts in connection string failed to failover in non-hot standby mode