Re: [Proposal] Adding TRIM_SPACE option to COPY

From: 河田達也 <kawatatatsuya0913(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Proposal] Adding TRIM_SPACE option to COPY
Date: 2025-11-26 16:00:41
Message-ID: CAHza6qfffimVp=+SOr533JHXMazNE6F6UFhxE0HjzQu9JE5-EA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

Thank you for the feedback and for taking the time to review this patch.

## Performance Measurement Results
> I also agree that this feature probably won't add noticeable overhead to
> COPY when it isn't used, but it would still be good to measure
> the performance impact of the patch.
I conducted simple performance measurements with 5 million rows to assess
the overhead
when TRIM_SPACE is not used:

**BEFORE patch (master branch):**
Time: 5565.584 ms (00:05.566)
Time: 5626.593 ms (00:05.627)

**AFTER patch (TRIM_SPACE disabled):**
Time: 5840.472 ms (00:05.840)
Time: 5523.806 ms (00:05.524)

The overhead appears to be approximately 1.5%, which seems to be within
an acceptable range for typical COPY operations.

## Regarding COPY PROGRAM as an Alternative

I understand Tom's concern about adding features to core that could be
implemented externally.

> As for TRIM_SPACE, it seems possible to implement it as an external module
> and call it via COPY PROGRAM. Is this true?
You're right that TRIM_SPACE can technically be
achieved using COPY PROGRAM with an external script. I'd like to share
some observations about this approach:

**Example implementation with COPY PROGRAM:**

```python
#!/usr/bin/env python3
import sys
import csv

reader = csv.reader(sys.stdin)
writer = csv.writer(sys.stdout)

for row in reader:
trimmed_row = [field.strip() for field in row]
writer.writerow(trimmed_row)
```

```sql
COPY users FROM PROGRAM 'python3 /tmp/trim_csv.py'
WITH (FORMAT csv, HEADER true);
```

**Some considerations:**

While the COPY PROGRAM approach works, it does have some practical
limitations:
it requires elevated privileges (`pg_execute_server_program`), may not be
available on all platforms (particularly Windows), and requires users to
handle parsing correctly. That said, I understand the concern about
where to draw the line for features in core, especially given that options
like FORCE_NULL follow a similar pattern.

## My Thoughts

> So the question is which ones belong in core. On second thought,
> perhaps features that are difficult or impossible to implement outside
the core
> are the ones that can be considered for COPY itself. Otherwise, it might
be
> better to avoid expanding COPY unnecessarily. Anyway I'd like to hear more
> opinons on this.
I can see the validity of the concern about feature creep.
I'm open to the community's perspective on whether this
belongs in core.

If the general feeling is that this functionality should remain external,
I'm happy to withdraw the patch.

Thank you again for your time and valuable feedback. This discussion has
been
very educational for me.

Best regards,
Tatsuya Kawata

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-11-26 16:02:04 Re: [buildfarm related] Machines gcc experimental failed test_lfind
Previous Message Tom Lane 2025-11-26 15:51:12 Re: [buildfarm related] Machines gcc experimental failed test_lfind