Re: RFC: Async query processing

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Weimer <fweimer(at)redhat(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: Async query processing
Date: 2014-01-03 17:22:46
Message-ID: CAHyXU0yZ6+uy1b+pw_tZg5TbBxq2S+nNPb1RRgBRY+LqCTFuew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 3, 2014 at 11:06 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
>>> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer <fweimer(at)redhat(dot)com> wrote:
>>>> Loading data into the database isn't such an uncommon task. Not everything
>>>> is OLTP.
>>
>>> Truly, but a sustained insert stream of 10 Mbps is certainly way
>>> beyond common non-OLTP loads. This is far more specific than non-OLTP.
>>
>> I think Florian has a good point there, and the reason is this: what
>> you are talking about will be of exactly zero use to applications that
>> want to see the results of one query before launching the next. Which
>> eliminates a whole lot of apps. I suspect that almost the *only*
>> common use case in which a stream of queries can be launched without
>> feedback is going to be bulk data loading. It's not clear at all
>> that pipelining the PQexec code path is the way to better performance
>> for that --- why not use COPY, instead?
>
> You're forgetting ORM workloads.
>
> ORMs can usually plan the inserts to be in a sequence that both don't
> require feedback (except the knowledge that they were successful), and
> that do not violate constraints.
>
> Flushing a whole object hierarchy for instance, can be done without
> feedback. Not even serial columns need feedback, since many ORMs
> (SQLAlchemy, Hibernate) support allocation of ID sequences in batches
> (by issuing a proper select nextval).
>
> I agree, that with the proposed API, it's too error prone to be
> useful. But I also think, if the API is simple and fool-proof enough,
> it could be "build them and they will come". I know I'll be happy to
> implement support for SQLAlchemy (since it will benefit me), if the
> API resembles the proposition below (at least in simplicity).
>
> Per-query expectations could be such a thing. And it can even work with PQexec:
>
> PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);");
> --read--
> PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);");
> --read--
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> ... 9 times...
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC);
> do {
> // do something useful
> } while (PQflush());
>
> Here, the PQASYNC flag would temporarily switch to non-blocking I/O,
> and buffer what cannot be sent. PQASNC_CORK, would only buffer (only
> send if the buffer is full). After any ASYNC call, PQflush would be
> necessary (to flush the send queue and to consume the expected
> responses), but I can imagine any synchronous call (PQexec,
> PQsendQuery or whatever) could detect a non-empty buffer and just
> blockingly flush right there.
>
> This can benefit many useful patterns. ORM flush, is one, if there can
> be preallocation of IDs (which I know at least SQLAlchemy and
> Hibernate both support).
>
> Execute-many of prepared statements is another one, quite common.
>
> I'm not sure what would happen if one of the queries returned an
> error. If in a transaction, all the following queries would error out
> I'd imagine. If not, they would simply be executed blindly.. am I
> correct?

Long term, I'd rather see an optimized 'ORM flush' assemble the data
into a structured data set (perhaps a JSON document) and pass it to
some receiving routine that decomposed it into records. This is a
better way to so things on so many levels. Maybe I'm an old cranky
guy yelling at pigeons, but I don't think the current approach that
many ORMs take is going to withstand the test of time.

merlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2014-01-03 17:53:49 Re: ISN extension bug? (with patch)
Previous Message Claudio Freire 2014-01-03 17:06:11 Re: RFC: Async query processing