Re: Async IO HTTP server frontend for PostgreSQL

From: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
To: Allan Kamau <kamauallan(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Async IO HTTP server frontend for PostgreSQL
Date: 2014-09-11 04:54:16
Message-ID: CAAfz9KP+G4KR6GQp2WTwj1nQGZk8gZke1hLFm9Pp_LxJ6=1gYg@mail.gmail.com
Lists: pgsql-general

Hello, Allan

2014-09-11 0:29 GMT+04:00 Allan Kamau <kamauallan(at)gmail(dot)com>:

> Dear Dmitriy,
>
> To add on to David's suggestions, data caching is a difficult task to
> undertake. Consider the case where your data may not all fit into memory:
> when you cache these data outside PostgreSQL, you would need to handle
> memory management, concurrent population of the cache, and keeping the
> cached data fresh as the underlying data changes. These are not trivial
> tasks, and the database community has spent years constructing and
> improving algorithms to do this on behalf of the front-end database
> application developer. Also, each time a TCP connection is created,
> additional compute resources are consumed by the OS as well as by the
> database management server software.
>
Memory is a limited resource, hence it's possible to hit "out of memory"
anywhere. By caching in this case I mean spilling the data retrieved by the
HTTP server from the database server to disk during "socket read-ready"
event dispatching, if the retrieved data cannot yet be sent to the HTTP
client (its socket is not write-ready). When the socket to the HTTP client
becomes write-ready, the data will be streamed from this cache. I believe
it's not hard to implement. And note, there is no need to make such a cache
shared: it's just temporary files whose lifetime is that of the HTTP
request being dispatched.
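
A minimal sketch of this spill-to-disk idea, assuming a poll()-based event
loop and libpq in single-row mode; struct request, on_db_readable() and
on_client_writable() are hypothetical names for illustration, not part of
any library:

/* Sketch: spill rows to a per-request temp file while the HTTP client
 * socket is not write-ready; stream the file back once it is.
 * Assumes one text column per row; error handling elided for brevity. */
#include <libpq-fe.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

struct request {
    PGconn *db;        /* backend connection driving this request */
    int     client_fd; /* HTTP client socket (non-blocking)       */
    FILE   *spill;     /* temp file; lives only for this request  */
};

/* Called when poll() reports the database socket read-ready. */
static void on_db_readable(struct request *r, int client_writable)
{
    PQconsumeInput(r->db);
    while (!PQisBusy(r->db)) {
        PGresult *res = PQgetResult(r->db);
        if (!res)
            break;                        /* query fully consumed */
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE) {
            const char *row = PQgetvalue(res, 0, 0);
            if (client_writable && !r->spill)
                send(r->client_fd, row, strlen(row), 0);
            else {
                if (!r->spill)
                    r->spill = tmpfile(); /* unnamed, auto-deleted */
                fwrite(row, 1, strlen(row), r->spill);
            }
        }
        PQclear(res);
    }
}

/* Called when poll() reports the client socket write-ready. */
static void on_client_writable(struct request *r)
{
    char buf[8192];
    size_t n;
    if (!r->spill)
        return;
    rewind(r->spill);  /* naive: real code tracks a read offset and
                          handles short writes on the send() below */
    while ((n = fread(buf, 1, sizeof buf, r->spill)) > 0)
        send(r->client_fd, buf, n, 0);
    fclose(r->spill);
    r->spill = NULL;
}

An unnamed tmpfile() is deleted automatically on fclose(), which matches
the lifetime argument above: nothing is shared, so there is nothing to
invalidate.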

> A simpler way would be to use connection pooling, where a thread of your
> running application "borrows" a connection from a pool of open connections,
> executes the SQL command, then returns the connection immediately on
> completion. This requires few concurrent connections (depending on
> configuration) and lets the database do the caching of the data for you.
> For effective database data caching you may need to adjust the PostgreSQL
> configuration (postgresql.conf) as well as the operating system resource
> settings. This way the response time of your client application will
> degrade gracefully as concurrent client requests increase.
>
In practice, this approach works. How well? It works. But with
thread/connection pools, your site can be blocked by an attacker (a
"cool hacker") who initiates many slow concurrent connections. Threads
are a good fit only when you're doing a lot of CPU work, rather than
communication work.
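
For reference, the borrow/return pattern described above fits in a few
dozen lines; a sketch assuming pthreads and blocking libpq, with POOL_SIZE
and the conninfo as placeholder values:

/* Sketch of a fixed-size libpq connection pool (pthreads).
 * Error handling elided; check PQstatus() after connecting in real code. */
#include <libpq-fe.h>
#include <pthread.h>

#define POOL_SIZE 8    /* placeholder */

static PGconn         *pool[POOL_SIZE];
static int             in_use[POOL_SIZE];
static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  freed = PTHREAD_COND_INITIALIZER;

void pool_init(const char *conninfo)
{
    for (int i = 0; i < POOL_SIZE; i++)
        pool[i] = PQconnectdb(conninfo);
}

PGconn *pool_borrow(void)
{
    pthread_mutex_lock(&lock);
    for (;;) {
        for (int i = 0; i < POOL_SIZE; i++) {
            if (!in_use[i]) {
                in_use[i] = 1;
                pthread_mutex_unlock(&lock);
                return pool[i];
            }
        }
        /* All connections busy: park this thread until one is returned. */
        pthread_cond_wait(&freed, &lock);
    }
}

void pool_return(PGconn *conn)
{
    pthread_mutex_lock(&lock);
    for (int i = 0; i < POOL_SIZE; i++)
        if (pool[i] == conn)
            in_use[i] = 0;
    pthread_cond_signal(&freed);
    pthread_mutex_unlock(&lock);
}

The pthread_cond_wait() call is exactly where the slow-client attack
bites: once all connections are held by slow requests, every further
request blocks there.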

> For a small number of concurrent connections, the speed advantage a
> direct "streaming" solution may have over the traditional connection
> pooling solution may hardly be noticeable to the end user. The easier
> way to improve response time is to look into PostgreSQL performance
> tuning as well as investing in faster hardware (mainly the disk
> subsystem and more RAM).
>
Yes, I agree. Good and fast hardware is always good :-). But any hardware
has limited resources anyway. And I believe that for many tasks an async
solution will help to leverage it to the max.
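
Concretely, "async" here means libpq's non-blocking API plus single-row
mode, so rows can be pushed to clients as they arrive instead of being
buffered whole. A minimal single-connection sketch (a real server would
multiplex many such state machines in one event loop); the query and the
empty conninfo (settings taken from the environment) are placeholders:

/* Sketch: stream a query's rows as they arrive, without blocking. */
#include <libpq-fe.h>
#include <poll.h>
#include <stdio.h>

int main(void)
{
    PGconn *db = PQconnectdb("");  /* conninfo from PG* environment vars */
    if (PQstatus(db) != CONNECTION_OK)
        return 1;

    PQsendQuery(db, "SELECT generate_series(1, 1000000)");
    PQsetSingleRowMode(db);        /* deliver rows one PGresult at a time */

    struct pollfd pfd = { .fd = PQsocket(db), .events = POLLIN };
    for (;;) {
        poll(&pfd, 1, -1);         /* sleep until the DB socket is ready */
        PQconsumeInput(db);
        while (!PQisBusy(db)) {    /* drain only what will not block */
            PGresult *res = PQgetResult(db);
            if (!res) {            /* query finished */
                PQfinish(db);
                return 0;
            }
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                printf("%s\n", PQgetvalue(res, 0, 0)); /* push to client */
            PQclear(res);
        }
    }
}

Per in-flight request this costs a socket and a small state struct rather
than a thread, though, as Steve notes below, each request still pins a
backend for as long as its client is slow; hence the spill cache above.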

> Regards,
>
> Allan.
>
>
>
>
> On Wed, Sep 10, 2014 at 8:25 PM, Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
> wrote:
>
>> Hello, Steve
>>
>> 2014-09-10 21:08 GMT+04:00 Steve Atkins <steve(at)blighty(dot)com>:
>>
>>>
>>> On Sep 10, 2014, at 12:16 AM, Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
>>> wrote:
>>>
>>> > Hello, David
>>> >
>>> > 2014-09-10 4:31 GMT+04:00 David Boreham <david_list(at)boreham(dot)org>:
>>> > > Hi Dmitriy, are you able to say a little about what's driving your
>>> > > quest for async http-to-pg? I'm curious as to the motivations, and
>>> > > whether they match up with some of my own reasons for wanting to
>>> > > use low-thread-count solutions.
>>> > For many web projects I consider Postgres as a development platform.
>>> > Thus, I prefer to keep the business logic (data integrity trigger
>>> > functions and API functions) in the database. Because of the nature
>>> > of the Web, many concurrent clients can request a site, and I want
>>> > to serve the maximum possible number of them with minimal overhead.
>>> > Also, I want to avoid complex solutions. So, I believe that with an
>>> > asynchronous solution it's possible to *stream* the data from the
>>> > database to the maximum number of clients (which may well request my
>>> > site over slow connections).
>>>
>>> That's going to require you to have one database connection open for
>>> each client. If the client is over a slow connection it'll keep the
>>> database connection open far longer than is needed (compared to the
>>> usual "pull data from the database as fast as the disks will go, then
>>> spoonfeed it out to the slow client" approach). Requiring a live
>>> database backend for every open client connection doesn't seem like a
>>> good idea if you're supporting many slow concurrent clients.
>>>
>> Good point. Thus, some caching on the HTTP server side should be
>> implemented then.
>>
>>
>> --
>> // Dmitriy.
>>
>>
>

--
// Dmitriy.
