Re: Is there a cache consistent interface to tables ?

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Garym <garym(at)oedata(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Is there a cache consistent interface to tables ?
Date: 2018-02-09 14:40:11
Message-ID: CAMsr+YEB3OCo0N9JtkX_4QTFy80p0v0x219AErEG7_6E3we_dQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9 February 2018 at 15:56, Garym <garym(at)oedata(dot)com> wrote:

> Hi,
> This is an odd request for help. I'm looking to expose an interface so an
> external app can insert to a table while maintaining cache consistency and
> inserts be promoted via wal.
>
> I need to support about 100k+ inserts/sec from a sensor data stream. It
> simply won't work using sql queries. If the call overhead is too high for
> single calls, multiple records per call is better. The data must be
> available for selects in 500ms. I current only have 24gb ram for pg, but
> production will be 56gb.
>
> I'm taking this approach because pgpool2 chokes, delaying past
> requirements. I initially wanted to use wal, but masters don't want wal in
> feeds and slaves have unpredictable delays of seconds before provisioning
> occurs.
>
>
So you're looking to use Pg in a near-realtime application?

Expect to do some work on latency spikes - scheduling issues, checkpoints,
etc. I strongly advise you to spend some quality time getting faimiliar
with perf, DTrace, systemtap, Linux eBPF tracing (
http://www.brendangregg.com/ebpf.html), or the like. Tuning of kernel
options related to I/O and writeback is likely to be needed, also scheduler
and memory settings.

How sure are you that it's not viable for SQL queries? And, if not, what
makes you think that a lower level interface will help you? Has profiling
and tracing/timing shown that significant time/delays are arising from
layers you can bypass in a sensible way?

You definitely *can* use the heapam and indexam at a lower level to form
tuples and insert into tables, then update the indexes. See genam.c for one
example, but it's optimised for ease of use more than tight performance
AFAIK. You're looking for heap_open, heap_form_tuple, heap_insert, etc.
Beware of index maintenance.

You can probably gain a fair bit with some caching of all the type and
relation oids etc, but of course you must ensure you subscribe to the
necessary relcache/syscache invalidations and act on them appropriately.
See inval.[ch] .

You'll definitely want to batch into txns and use async commit. But beware
the data durability implications.

BDR and pglogical do some of this, you can take a look at them for some
ideas/examples.

Make sure you have a buffering layer that can accumulate rows if there's a
DB failure/outage etc. Otherwise you can never, ever, ever upgrade,
diagnostics and maintenance are harder, etc. Don't fire-and-forget. It can
be a simple producer/consumer that writes sequentially to a collection of
buffer files or whatever.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-02-09 14:42:41 Re: CALL stmt, ERROR: unrecognized node type: 113 bug
Previous Message Robert Haas 2018-02-09 14:36:25 Re: [HACKERS] MERGE SQL Statement for PG11