Re: COPY-able csv log outputs

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY-able csv log outputs
Date: 2007-06-06 19:14:19
Message-ID: 4667078B.6020705@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

Greg Smith wrote:
> The attached patch fixes all the issues I found in the original
> version of this code and completes the review I wanted to do. Someone
> else will need to take this from here. As I already mentioned, I
> can't comment on the quality of the piping implementation used to add
> this feature other than to say it worked for me.

The code below strikes me as being seriously broken.

The comment says that it doubles all single quotes, double quotes and
backslashes. It doesn't (backslashes are not doubled) and in any case
doing that would simply be wrong (with or without backslashes). The ONLY
things that should be doubled are double quotes. Period.

That part is easy to fix.

But here's my question: why are we worrying at all about things like the
encoding? Especially, why are we worrying about the *client* encoding
for a server log file? We surely aren't going to switch encodings in the
middle of a log file! ISTM that this routine should simply copy the
input, byte for byte, apart from doubling the dquotes. Does that make
sense? I assume that this routine has been copied from somewhere else
without sufficient regard to the context (or the logic).

The code also illustrates something else that annoyed me elsewhere in
the patch and that I have eliminated, namely use of a macro placed in
the header file and then used in exactly one place in the code. The
macro didn't make anything clearer - quite the reverse. ISTM that a
macro used only in one file should be defined in that file, generally,
and if it's used only once is probably not much use anyway.

cheers

andrew

> +
> + /*
> + * Escapes special characters in the string to conform
> + * with the csv type output.
> + * Replaces " with "", ' with '' and \ with \\.
> + */
> + static size_t
> + escape_string_literal(char *to, const char *from)
> + {
> + const char *source = from;
> + char *target = to;
> + size_t remaining = 0;
> + int client_encoding = 0;
> +
> + if (from == NULL)
> + return remaining;
> +
> + remaining = strlen(from);
> + client_encoding = pg_get_client_encoding();
> +
> + while (remaining > 0 && *source != '\0')
> + {
> + char c = *source;
> + int len;
> + int i;
> +
> + /* Fast path for plain ASCII */
> + if (!IS_HIGHBIT_SET(c))
> + {
> + /* Apply quoting if needed */
> + if (CSV_STR_DOUBLE(c, false))
> + *target++ = c;
> + /* Copy the character */
> + *target++ = c;
> + source++;
> + remaining--;
> + continue;
> + }
> +
> + /* Slow path for possible multibyte characters */
> + len = pg_encoding_mblen(client_encoding, source);
> +
> + /* Copy the character */
> + for (i = 0; i < len; i++)
> + {
> + if (remaining == 0 || *source == '\0')
> + break;
> + *target++ = *source++;
> + remaining--;
> + }
> +
> + /*
> + * If we hit premature end of string (ie, incomplete multibyte
> + * character), try to pad out to the correct length with spaces.
> + * We may not be able to pad completely, but we will always be
> + * able to insert at least one pad space (since we'd not have
> + * quoted a multibyte character). This should be enough to make
> + * a string that the server will error out on.
> + */
> + if (i < len)
> + {
> + for (; i < len; i++)
> + {
> + if (((size_t) (target - to)) / 2 >= strlen(from))
> + break;
> + *target++ = ' ';
> + }
> + break;
> + }
> + }
> +
> + /* Write the terminating NUL character. */
> + *target = '\0';
> +
> + return target - to;
> + }
>
>

In response to

Browse pgsql-patches by date

  From Date Subject
Next Message Andrew Dunstan 2007-06-06 20:03:05 WIP csv logs
Previous Message Hannes Eder 2007-06-06 19:12:53 Re: msvc, build and install with cygwin in the PATH