| From: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
|---|---|
| To: | Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> |
| Cc: | Florents Tselai <florents(dot)tselai(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add LIMIT option to COPY FROM |
| Date: | 2026-02-04 01:46:22 |
| Message-ID: | CAKFQuwbKtCOPSY3XRzTighucPXzat6qM9g937KPoOCYP=10rww@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Feb 3, 2026 at 6:07 PM Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> wrote:
> > Would want it paired with offset for this use case.
>
> COPY FROM already accepts HEADER <integer> to skip N leading lines, so
> you can combine it with LIMIT to get OFFSET + LIMIT semantics today:
> =# COPY t FROM 'file.csv' (HEADER 100, LIMIT 50);
>
Indirect, but fair, assuming the limit is indeed pre-parsed lines. You
need OFFSET to match post-processed counted lines.
> >> Design:
> >> - The LIMIT count applies after WHERE filtering and ON_ERROR skipping,
> >> so it represents the actual number of rows inserted.
> >
> >
> > Not sure about that choice. I’d go with pre-eval or implement both and
> default to pre-eval.
>
> It is consistent with SQL semantics — SELECT ... WHERE ... LIMIT N
> counts rows that pass the filter, not rows scanned.
Ok, I suppose I'd name it "stop N" instead of "limit N" to avoid this.
Pre-eval behavior
> is already achievable externally (head -n), while post-eval can only
> be done server-side, which makes it the more valuable choice for a
> built-in option.
>
Not seeing the need for either-or; and everywhere else you want to support
this feature by assuming that external tools aren't available.
So; HEADER+STOP, HEADER?+OFFSET+LIMIT
> This is not a novel feature either. Oracle SQL*Loader provides LOAD,
> and SQL Server BULK INSERT provides LASTROW. The absence of LIMIT in
> COPY FROM is arguably a gap relative to other RDBMSes.
>
>
There'd be a lot less friction of this sort if someone just bites the
bullet and devises a core-managed ETL tool instead of attaching pieces
one-by-one into COPY. Or at least maybe we get a fast-path version to
handle typical dump-restore commands and then branch to the ETL path if the
command options indicate doing so is needed. There is apparently too much
demand for this stuff for a nonproliferation agreement.
David J.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | zengman | 2026-02-04 01:50:52 | Re: Remove unused isCommit parameter from AtEOXact_LocalBuffers |
| Previous Message | Shinya Kato | 2026-02-04 01:06:28 | Re: Add LIMIT option to COPY FROM |