[POC] FETCH limited by bytes.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Cc: mkellycs(at)gmail(dot)com, ashutosh(dot)bapat(at)enterprisedb(dot)com
Subject: [POC] FETCH limited by bytes.
Date: 2015-01-22 10:27:39
Message-ID: 20150122.192739.164180273.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.

Is such a feature and syntax could be allowed to be added?

==

Postgres_fdw fetches tuples from remote servers using cursor. The
transfer gets faster as the number of fetch decreases. On the
other hand buffer size for the fetched tuples widely varies
according to their average length. 100 tuples per fetch is quite
small for short tuples but larger fetch size will easily cause
memory exhaustion. However, there's no way to know it in advance.

One means to settle the contradiction would be a FETCH which
sends result limiting by size, not the number of tuples. So I'd
like to propose this.

This patch is a POC for the feature. For exapmle,

FETCH 10000 LIMIT 1000000 FROM c1;

This FETCH retrieves up to 10000 tuples but cut out just after
the total tuple length exceeds 1MB. (It does not literally
"LIMIT" in that sense)

The syntax added by this patch is described as following.

FETCH [FORWARD|BACKWARD] <ALL|SignedIconst> LIMIT Iconst [FROM|IN] curname

The "data size" to be compared with the LIMIT size is the
summation of the result of the following expression. The
appropriateness of it should be arguable.

[if tupleslot has tts_tuple]
HEAPTUPLESIZE + slot->tts_tuple->t_len
[else]
HEAPTUPLESIZE +
heap_compute_data_size(slot->tts_tupleDescriptor,
slot->tts_values,
slot->tts_isnull);

========================

This patch does following changes,

- This patch adds the parameter "size" to following functions
(standard_)ExecutorRun / ExecutePlan / RunFromStore
PortalRun / PortalRunSelect / PortalRunFetch / DoPortalRunFetch

- The core is in StandardExecutorRun and RunFromStore. Simplly
sum up the sent tuple length and compare against the given
limit.

- struct FetchStmt and EState has new member.

- The modifications in gram.y affects on ecpg parser. I think I
could fix them but with no confidence :(

- Modified the corespondence parts of the changes above in
auto_explain and pg_stat_statments only in parameter list.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Size-limitation-feature-of-FETCH-v0.patch text/x-patch 31.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2015-01-22 10:57:49 Re: Parallel Seq Scan
Previous Message Andres Freund 2015-01-22 07:32:48 Re: Better way of dealing with pgstat wait timeout during buildfarm runs?