Re: Can lots of small writes badly hamper reads from other tables?

From: Dave Crooke <dcrooke(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Can lots of small writes badly hamper reads from other tables?
Date: 2012-01-24 21:36:36
Message-ID: CALi4UpgSEhxYkgLEFqxS+ZgtHAibin7siR2re==EVbqjzYT45A@mail.gmail.com
Lists: pgsql-performance

Hi guys

Thanks for the quick followups, folks .... please note I am deliberately
running a setup without commit guarantees, so a lot of the conventional
advice about not doing small writes isn't applicable, but I do want to
understand more about how this affects PG internals even if the I/O is
smoothed out.

By "commit consistency off" I am referring to the setting
"synchronous_commit = off" in postgresql.conf .... IIRC this should mean
semantically that a DB crash may lose data that was reported back to the
app as successfully committed, but will result in a consistent state on
reboot and recovery. In this case IIUC the "120 commits per second per
drive" limit does not apply, and I hope the advice about testing fsync is
similarly not applicable to my case. Also, IIUC that settings like
commit_siblings and commit_delay should be ignored by PG in my case.
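
As an aside, since synchronous_commit is a per-session GUC it can also be
set from the app rather than globally in postgresql.conf - a minimal JDBC
sketch (the connection URL and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Sketch only: async commit confined to the writer sessions, so other
    // sessions keep full commit guarantees. URL/credentials are made up.
    Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/perfdb", "app", "secret");
    Statement st = conn.createStatement();
    st.execute("SET synchronous_commit = off");
    st.close();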

I would be interested in learning what the **in-memory** constraints and
costs are on the PG server side of doing a lot of small commits when sync
writes are *off*, e.g. the implications for the locking system, and whether
this can affect the long queries on the other tables apart from general
resource contention.

The pattern of lots of tiny transactions is semantically correct for the
app, and I am using a JDBC prepared statement on the Java side, which I
believe the PG driver will turn into a server-side pre-compiled statement
after enough uses (it does NOT do so on the first few hits). This should
in theory be even cheaper than a multi-row INSERT ... VALUES, which is all
text and has to be parsed.
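
For illustration, the pattern looks roughly like this - a sketch, not our
actual code; the column names and sample values are made up, and
setPrepareThreshold is the pgjdbc knob behind the "first few hits"
behaviour (default 5):

    import java.sql.PreparedStatement;
    import java.sql.Timestamp;
    import org.postgresql.PGStatement;

    // Sketch: one long-lived PreparedStatement per writer thread, reused
    // for every row ("conn" is an open Connection as in the sketch above).
    PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO perf_raw_2012_01_24 (item_id, ts, value) VALUES (?, ?, ?)");
    // pgjdbc promotes this to a named server-side prepared statement only
    // after prepareThreshold executions (default 5); force it immediately:
    ((PGStatement) ps).setPrepareThreshold(1);

    ps.setLong(1, 42L);                       // illustrative values
    ps.setTimestamp(2, new Timestamp(System.currentTimeMillis()));
    ps.setDouble(3, 1.23);
    ps.executeUpdate();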

However, if necessary for performance I can bundle the inserts into
slightly larger transactions. Cases where writes fail are due only to
occasional duplicates (same primary key) coming from upstream and are
pretty rare, and in practice losing a batch of, say, 100 of these records
occasionally is not a big deal in my world (ignoring the sound of cringing
DBAs :), so I could afford to bundle the inserts into transactions and then
just drop a whole bundle if any single write has a primary key collision.
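
A minimal JDBC sketch of that bundling (the Record class, the bundle
collection, and the column names are all hypothetical):

    import java.sql.BatchUpdateException;
    import java.sql.PreparedStatement;

    // Sketch of the drop-the-bundle strategy: one commit per ~100 rows
    // instead of one per row; a PK collision anywhere in the batch causes
    // the whole bundle to be rolled back and discarded.
    conn.setAutoCommit(false);
    PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO perf_raw_2012_01_24 (item_id, ts, value) VALUES (?, ?, ?)");
    try {
        for (Record r : bundle) {            // bundle = ~100 queued records
            ps.setLong(1, r.itemId);
            ps.setTimestamp(2, r.ts);
            ps.setDouble(3, r.value);
            ps.addBatch();
        }
        ps.executeBatch();
        conn.commit();
    } catch (BatchUpdateException e) {
        conn.rollback();                     // duplicate key => drop whole bundle
    }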

Storage setup varies by customer, but a typical arrangement is to take RAID
groups of about 5-10TB each (net) from something like an EMC Clariion and
slice each group into 1TB LUNs, which become VMware datastores written
simultaneously from multiple hosts. A mid-size Clariion would host perhaps
50-100 of these small LUNs, and a customer running a high-performance
environment might have Fibre Channel disks and RAID-10, but SATA and
RAID-5/6 would also be normal, albeit with a substantial write-back cache
(maybe 1GB; IIRC a current Clariion SP has 4GB total). Each file on the
datastore corresponds to a virtual disk on a VM, and the datastore is
formatted with VMFS (a concurrent-writer filesystem that uses SCSI locking
to control access to block allocation and directory entries).

The other type of VMware datastore works at the filesystem layer - instead
of a shared SAN with iSCSI / FC-AL, the VMware hosts are all pointed at a
shared NFS server directory. NetApp is the popular back end for this
configuration.

On top of this virtualization, I have PG laid out on two virtual disks -
WAL and log files are on the main system partition, index and table data on
a second partition. Both formatted with ext3fs.

One of my larger customers had his SAN guy complain to him that our app was
writing more data to the NetApp it was on than every other app combined, so
I am mindful of the volume being more than some of these systems were
planned for :)

Cheers
Dave

On Tue, Jan 24, 2012 at 3:09 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> On 24 January 2012, 21:16, Dave Crooke wrote:
> > Hi folks
> >
> > This could be a sheer volume issue, but I thought I would ask the wisdom
> > of this forum as to next investigative steps.
> >
> > ----
> >
> > We use PostgreSQL 8.4.4, which is bundled with our application as a
> > VMware virtual appliance. The bulk of the app's database activity is
> > recording performance data points which arrive in fairly large sustained
> > bursts of perhaps 10,000 rows a minute at a medium-sized customer, each
> > of which is a logically separate item committed as an individual
> > transaction (JDBC auto-commit mode). Our offshore QA team was assigned
> > to track an intermittent issue with the speed of some large queries on
> > other tables, and they believe, based on correlation, that the two
> > activities may be contending.
> >
> > The large query is coming off of different tables from the ones being
> > written to ... the raw data goes into a table named by day (partitioning
> > is all within the app, not PG), e.g. PERF_RAW_2012_01_24, and then there
> > are a bunch of rollup statements which run hourly to do the aggregations,
> > e.g.
>
> Each storage device has some basic I/O limits - sequential speed
> (read/write) and the maximum number of I/O operations it can handle. For
> example, a 7.2k drive can do up to 160MB/s of sequential reads/writes, but
> not more than 120 I/O ops per second. Similarly for other devices - 15k
> drives can do up to 250 I/Os. SSDs can handle many more I/Os; an Intel
> 320, for example, can handle about 8k.
>
> I have no idea what kind of storage device you're using and what amount of
> sequential and random operations it can handle. But my guess is that
> you're hitting the limit of random I/Os - each commit requires an fsync,
> and you're doing 10,000 of them per minute, i.e. about 160 per second. If
> the queries need to read data from the drive (e.g. randomly), this just
> adds more I/Os.
>
> > Is there any tweaking we should do on the PG settings, or on the pattern
> > in which the app is writing - we currently use 10 writer threads on the
> > Java side and they keep PG going pretty well.
>
> The first thing you should do is group the inserts into one transaction.
> That'll lower the number of I/Os the database needs to do. Besides that,
> you can move the WAL to a separate (physical) device, thus spreading the
> I/Os across more drives.
>
> > I considered bundling the writes into larger transactions - will that
> > really help much with commit consistency off?
>
> What do you mean by "commit consistency off"?
>
> > Is there some specific "usual suspect" stuff I should look at on the PG
> > side to look for efficiency issues, such as index lock contention or a
> > poor buffer cache hit ratio? Will doing EXPLAIN ANALYSE on the big query
> > be informative, and if so, does it need to be done while the write load
> > is applied?
>
> The first thing you should do is gather some basic I/O stats.
>
> Run pg_test_fsync (a contrib module) to see how many fsync operations the
> I/O subsystem can handle (if it reports more than 500, use "-o" to get it
> running for a longer time).
>
> Then gather "vmstat 1" and "iostat -x 1" for a few seconds while the
> workload (inserts and queries) is actually running. That should tell you
> how the drives are actually utilized.
>
> Post these results to this list.
>
> > Relevant schema and config attached, all comments and advice welcome,
> > including general tuning tips and rationale for moving to PG 9.x .... I'm
> > well aware this isn't the acme of PG tuning :)
>
> There's a nice page about tuning at the wiki:
>
> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>
> I'd recommend significantly increasing the number of checkpoint segments,
> e.g. to 64 (1GB) and setting completion target to 0.9. This usually helps
> write-heavy workloads. And enable log_checkpoints.
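>
> As a sketch, those settings in postgresql.conf (8.4 parameter names):
>
>     checkpoint_segments = 64              # 64 x 16MB segments, ~1GB of WAL
>     checkpoint_completion_target = 0.9    # spread checkpoint writes out
>     log_checkpoints = on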
>
> Tomas
>
>
