Re: pgindent (was Re: [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)

From: Piotr Stefaniak <postgres(at)piotr-stefaniak(dot)me>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgindent (was Re: [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)
Date: 2017-05-21 05:36:31
Message-ID: VI1PR03MB119920FF3A912DEA757E2270F2FB0@VI1PR03MB1199.eurprd03.prod.outlook.com
Lists: pgsql-committers pgsql-hackers

On 2017-05-21 03:00, Tom Lane wrote:
> I wrote:
>> Also, I found two places where an overlength comment line is simply busted
>> altogether --- notice that a character is missing at the split point:
>
> I found the cause of that: you need to apply this patch:
>
> --- freebsd_indent/pr_comment.c~ 2017-05-17 14:59:31.548442801 -0400
> +++ freebsd_indent/pr_comment.c 2017-05-20 20:51:16.447332977 -0400
> @@ -344,8 +353,8 @@ pr_comment(void)
> {
> int len = strlen(t_ptr);
>
> - CHECK_SIZE_COM(len);
> - memmove(e_com, t_ptr, len);
> + CHECK_SIZE_COM(len + 1);
> + memmove(e_com, t_ptr, len + 1);
> last_bl = strpbrk(e_com, " \t");
> e_com += len;
> }
>
> As the code stands, the strpbrk call is being applied to a
> not-null-terminated string and therefore is sometimes producing an
> insane value of last_bl, messing up decisions later in the comment.
> Having the memmove include the trailing \0 resolves that.

I have been analyzing this and came to different conclusions. Foremost,
a strpbrk() call like that finds the first occurrence of either a space or
a tab, but last_bl means "last blank": it marks where to wrap a comment
line if it turns out to be too long. The previous coding moved the
character sequence byte by byte, updating last_bl every time it copied
one of those two characters. I've rewritten that part as:
     CHECK_SIZE_COM(len);
     memmove(e_com, t_ptr, len);
-    last_bl = strpbrk(e_com, " \t");
     e_com += len;
+    last_bl = NULL;
+    for (t_ptr = e_com - 1; t_ptr > e_com - len; t_ptr--)
+        if (*t_ptr == ' ' || *t_ptr == '\t') {
+            last_bl = t_ptr;
+            break;
+        }
 }
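
To make the first-versus-last distinction concrete, here is a minimal standalone sketch. It is not taken from the indent sources: last_blank(), first_blank() and demo() are hypothetical names used only for illustration. Note one difference from the loop above, whose condition (t_ptr > e_com - len) stops before looking at the first copied byte; the sketch examines the whole range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of indent): return a pointer to the last
 * space or tab within the first len bytes of buf, or NULL if there is none.
 * The buffer does not need to be NUL-terminated.
 */
static const char *
last_blank(const char *buf, size_t len)
{
    const char *p;

    /* Scan backward from one past the end of the range. */
    for (p = buf + len; p > buf; p--)
        if (p[-1] == ' ' || p[-1] == '\t')
            return p - 1;
    return NULL;
}

/* Return the first space or tab in a NUL-terminated string, as strpbrk() does. */
static const char *
first_blank(const char *s)
{
    return strpbrk(s, " \t");
}

/* The two searches disagree as soon as the text holds more than one blank. */
static void
demo(void)
{
    const char *text = "one two three";

    assert(first_blank(text) == text + 3);              /* blank after "one" */
    assert(last_blank(text, strlen(text)) == text + 7); /* blank after "two" */
}
```

With a single blank in the copied text both functions return the same pointer, which is why the difference only shows up when more than one character is moved.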

But then I also started to wonder whether there is any case where there is
more than one character to copy, and I haven't found one yet. It looks like
    } while (!memchr("*\n\r\b\t", *buf_ptr, 6) &&
        (now_col <= adj_max_col || !last_bl));
guarantees that if we're past adj_max_col, it'll only be one non-space
character. But I'm not sure yet.
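
One detail of that loop condition is easy to miss: the string literal has five visible characters, so a count of 6 makes memchr() examine the terminating NUL as well. The sketch below isolates that test; is_stop_char() is a hypothetical name, not a function in indent.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: nonzero if c is one of '*', '\n', '\r', '\b', '\t'
 * or '\0', i.e. a character for which the memchr() test in the loop
 * condition succeeds and the do/while loop ends.
 */
static int
is_stop_char(int c)
{
    /* 5 characters in the literal + its terminating NUL = 6 bytes. */
    return memchr("*\n\r\b\t", c, 6) != NULL;
}
```

Under this test a tab ends the loop but a plain space does not, so spaces are still consumed inside the loop body.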
