Re: Fetch zero result rows when executing a query?

From: Shay Rojansky <roji(at)roji(dot)org>
To: pgsql-hackers(at)postgresql(dot)org, David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Subject: Re: Fetch zero result rows when executing a query?
Date: 2015-02-08 08:56:17
Message-ID: CADT4RqBdKTWR8gmpO509Bwonhxfu3RU=Kp1aRz39kwkR8BkFow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

First a general comment:

> Then the driver writers that need these special API behaviors are
> reasonably expected to contribute to adding them to backend products that
> do not already have them. The database developers are not going to take
on
> responsibility for the API decisions of others; and features deemed (or
> that in reality are) of marginal usefulness are likely to be omitted -
> intentionally or otherwise - from the official (in this case libpq)
> protocol.

I absolutely agree with you there, I'm not trying to get anybody to
implement
something I need (i.e. fetch 0 rows). This is more of a general discussion
as
to whether that feature *makes sense* to you as a protocol feature (which
doesn't
seem to be the case, as some of you guys want to deprecate the whole
max_rows
thing).

>> I'll clarify just a little... I am indeed talking about the PostgreSQL
>> network protocol, and not about query optimization (with LIMIT or
omitting
>> RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery
>> through which the user indicates they're not interested in any result
rows
>> whether those exist or not.

> ExecuteNonQuery returns an integer while row-returning queries do not.
> I'd argue that the API states that the user is declaring that the query
> they are executing does not return any actual rows - just a count of
> affected rows - not that they do not care to see what rows are returned.

That's true. IMHO the count of affected rows isn't relevant to this
discussion
so I didn't mention it.

>> For the situation where a user does ExecuteNonQuery but the query returns
>> result rows, the driver can save the needless network transfers. We can
>> definitely say it's the user's fault for providing a query with a
resultset
>> to ExecuteNonQuery, but we *do* have the user's clear intention that no
>> rows be fetched so why not act on it. I agree this isn't a terribly
>> important optimization, the trigger for this question was first and
>> foremost curiosity: it seems strange the protocol allows you to specify
>> max_rows for any value other than 0.

> Yes, it does seem strange and, like Marko said, ideally would be
> deprecated. The fact that it cannot handle "zero rows" seems like an
> unnecessary limitation and I cannot image that any values other than 0 and
> all would be of practical usage. In the case of zero returning instead
the
> number of rows would be more useful than simply refusing to return
anything
> so even if something like this is needed the current implementation is
> flawed.

Just to be precise: what is strange to me is that the max_rows feature
exists
but has no 0 value. You and Marko are arguing that the whole feature should
be
deprecated (i.e. always return all rows).

>> Here's a possible believable use-case which doesn't involve user neglect:
>> imagine some server-side function which has side-effects and also returns
>> some rows. In some situations the user is interested in the result rows,
>> but in others they only want the side-effect. The user would probably
have
>> no control over the function, and their only way to *not* transfer the
>> result rows would be with a mechanism such as max_rows.

> Functions always return rows and so should not be executed using
> "ExecuteNonQuery". In most cases action-oriented functions return a
single
> result-status row so ignoring that row, while likely not advisable, is not
> exactly expensive.

Your description of functions doesn't hold for all functions, this is why I
tried to provide a usecase. It is possible for some function to both have a
side-effect (i.e. modify some table) *and* return a large number of rows. It
may be legitimate for a user to want to have the side-effect but not care
about the rows. Ignoring one row isn't expensive, ignoring many could be.

> The basic question here becomes - the executor already must generate, in
> memory, all of the rows so is there a way to properly interact with the
> server where you can request the number of rows that were generated but
not
> be obligated to actually pull them down to the client. This doesn't seem
> like an unreasonable request but assuming that it is not currently
possible
> (of which I have little clue) then the question becomes who cares enough
to
> design and implement such a protocol enhancement.

OK.

>> More to the point, doesn't max_rows=1 have exactly the same dangers as
>> LIMIT 1? The two seem to be identical, except that one is expressed in
the
>> SQL query and the other at the network protocol level.

> The planner does not have access to network protocol level? options while
> it does know about LIMIT.

That's an internal PostgreSQL matter (which granted, may impact efficiency).
My comment about max_rows being equivalent to LIMIT was meant to address
Marko's
argument that max_rows is dangerous because any row might come out and tests
may pass accidentally (but that holds for LIMIT 1 as well, doesn't it).

> Expecting users to use an API without knowledge or control of the SQL that
> is being executed seems like a stretch to me. Expecting the driver to
> simply provide an easy way to access data from the common SQL idioms a
user
> might use seems like a reasonable goal and puts leaves the smarts in the
> purvue of the planner. The optimization you require doesn't seem
> unreasonable but also doesn't seem especially compelling - nor matter how
> many people might be using ADO.NET (which provides no indication that they
> are trying to use APIs that are incompatible with the queries that they
are
> sending.

Fair enough, I don't disagree with the above. The idea is less "no
knowledge or
control of the SQL", but rather a complicated prepared statement that is
executed
in some places to fetch all rows, and in others to fetch only 1 (maybe this
is the
most compelling usecase).

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Tiikkaja 2015-02-08 09:17:27 Re: Fetch zero result rows when executing a query?
Previous Message Michael Paquier 2015-02-08 05:54:16 Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code