Re: [PATCH] - Provide robust alternatives for replace_string

From: Asim Praveen <pasim(at)vmware(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Georgios <gkokolatos(at)protonmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] - Provide robust alternatives for replace_string
Date: 2020-08-04 09:22:40
Message-ID: 9BA7ADA4-5113-4EA3-8D35-D6A21595F170@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 03-Aug-2020, at 8:36 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
> On 2020-Aug-03, Asim Praveen wrote:
>
>> Thank you Alvaro for reviewing the patch!
>>
>>> On 01-Aug-2020, at 7:22 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>>>
>>> What happens if a replacement string happens to be split in the middle
>>> by the fgets buffering? I think it'll fail to be replaced. This
>>> applies to both versions.
>>
>> Can a string to be replaced be split across multiple lines in the source file? If I understand correctly, fgets reads one line from input file at a time. If I do not, in the worst case, we will get an un-replaced string in the output, such as “(at)abs_dir@“ and it should be easily detected by a failing diff.
>
> I meant what if the line is longer than 1023 chars and the replace
> marker starts at byte 1021, for example. Then the first fgets would get
> "@ab" and the second fgets would get "s_dir@" and none would see it as
> replaceable.

Thanks for the patient explanation, I had missed the obvious. To keep the code simple, I’m in favour of relying on the diff of a failing test to catch the split-replacement string problem.

>
>>> In the stringinfo version it seemed to me that using pnstrdup is
>>> possible to avoid copying trailing bytes.
>>
>> That’s a good suggestion. Using pnstrdup would look like this:
>>
>> --- a/src/test/regress/pg_regress.c
>> +++ b/src/test/regress/pg_regress.c
>> @@ -465,7 +465,7 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme
>>
>> while ((ptr = strstr(string->data, replace)) != NULL)
>> {
>> - char *dup = pg_strdup(string->data);
>> + char *dup = pnstrdup(string->data, string->maxlen);
>
> I was thinking pnstrdup(string->data, ptr - string->data) to avoid
> copying the chars beyond ptr.
>

In fact, what we need in the dup are chars beyond ptr. Copying of characters prefixing the string to be replaced can be avoided, like so:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -465,12 +465,12 @@ replace_stringInfo(StringInfo string, const char *replace, const char *replaceme

while ((ptr = strstr(string->data, replace)) != NULL)
{
- char *dup = pg_strdup(string->data);
+ char *suffix = pnstrdup(ptr + strlen(replace), string->maxlen);
size_t pos = ptr - string->data;

string->len = pos;
appendStringInfoString(string, replacement);
- appendStringInfoString(string, dup + pos + strlen(replace));
+ appendStringInfoString(string, suffix);

- free(dup);
+ free(suffix);
}
}

Asim

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Kellerer 2020-08-04 09:24:11 Re: EDB builds Postgres 13 with an obsolete ICU version
Previous Message David Rowley 2020-08-04 09:06:16 Re: [PATCH v1] elog.c: Remove special case which avoided %*s format strings..