Re: RFC: Async query processing

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Weimer <fweimer(at)redhat(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: Async query processing
Date: 2014-01-03 17:06:11
Message-ID: CAGTBQpahBeCjUnrA=vj_8w=fGSKejGcUMitArZa+uuFKJ3LHOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
>> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer <fweimer(at)redhat(dot)com> wrote:
>>> Loading data into the database isn't such an uncommon task. Not everything
>>> is OLTP.
>
>> Truly, but a sustained insert stream of 10 Mbps is certainly way
>> beyond common non-OLTP loads. This is far more specific than non-OLTP.
>
> I think Florian has a good point there, and the reason is this: what
> you are talking about will be of exactly zero use to applications that
> want to see the results of one query before launching the next. Which
> eliminates a whole lot of apps. I suspect that almost the *only*
> common use case in which a stream of queries can be launched without
> feedback is going to be bulk data loading. It's not clear at all
> that pipelining the PQexec code path is the way to better performance
> for that --- why not use COPY, instead?

You're forgetting ORM workloads.

ORMs can usually plan the inserts to be in a sequence that both don't
require feedback (except the knowledge that they were successful), and
that do not violate constraints.

Flushing a whole object hierarchy for instance, can be done without
feedback. Not even serial columns need feedback, since many ORMs
(SQLAlchemy, Hibernate) support allocation of ID sequences in batches
(by issuing a proper select nextval).

I agree, that with the proposed API, it's too error prone to be
useful. But I also think, if the API is simple and fool-proof enough,
it could be "build them and they will come". I know I'll be happy to
implement support for SQLAlchemy (since it will benefit me), if the
API resembles the proposition below (at least in simplicity).

Per-query expectations could be such a thing. And it can even work with PQexec:

PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);");
--read--
PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);");
--read--
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
... 9 times...
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC);
do {
// do something useful
} while (PQflush());

Here, the PQASYNC flag would temporarily switch to non-blocking I/O,
and buffer what cannot be sent. PQASNC_CORK, would only buffer (only
send if the buffer is full). After any ASYNC call, PQflush would be
necessary (to flush the send queue and to consume the expected
responses), but I can imagine any synchronous call (PQexec,
PQsendQuery or whatever) could detect a non-empty buffer and just
blockingly flush right there.

This can benefit many useful patterns. ORM flush, is one, if there can
be preallocation of IDs (which I know at least SQLAlchemy and
Hibernate both support).

Execute-many of prepared statements is another one, quite common.

I'm not sure what would happen if one of the queries returned an
error. If in a transaction, all the following queries would error out
I'd imagine. If not, they would simply be executed blindly.. am I
correct?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2014-01-03 17:22:46 Re: RFC: Async query processing
Previous Message Merlin Moncure 2014-01-03 16:31:39 Re: RFC: Async query processing