Re: Can lots of small writes badly hamper reads from other tables?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Dave Crooke <dcrooke(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Can lots of small writes badly hamper reads from other tables?
Date: 2012-01-25 16:19:21
Message-ID: CAMkU=1wHnkOd0hnE7ukA32GGSBV07LV1LSN1scK=LvivELh+qg@mail.gmail.com
Lists: pgsql-performance

On Tue, Jan 24, 2012 at 12:16 PM, Dave Crooke <dcrooke(at)gmail(dot)com> wrote:
> Hi folks
>
> This could be a sheer volume issue, but I thought I would ask the wisdom of
> this forum as to next investigative steps.

To answer the question in your subject, yes. If the disk head is
positioned to write in one place, it can't be reading from some other
place. The various levels of caches and re-ordering and other tricks
can improve the situation, but they have a finite capacity to do so.

> We use PostgreSQL 8.4.4 which is bundled with our application as a VMware
> virtual appliance. The bulk of the app's database activity is recording
> performance data points which arrive in fairly large sustained bursts of
> perhaps 10,000 rows a minute at a medium-sized customer, each of which is a
> logically separate item committed as an individual transaction
> (JDBC auto-commit mode). Our offshore QA team was assigned to track an
> intermittent issue with speed of some large queries on other tables, and
> they believe, based on correlation, that the two activities may be contending.
>
> The large query is coming off of different tables from the ones being
> written to ... the raw data goes into a table named by day (partitioning is
> all within the app, not PG) e.g. PERF_RAW_2012_01_24 and then there are a
> bunch of rollup statements which run hourly to do the aggregations, e.g.

In your attached schema there are two perf_raw tables, and they have
different sets of indexes on them.
Which set is in operation during the inserts?
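
Something like this against the catalogs should show what is actually
defined on the current day's partition (untested sketch; table name
taken from your example below):

    -- list the indexes present on today's raw partition
    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'perf_raw_2012_01_24';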

> insert into PERF_HOURLY_2012_01_24 select key_columns, avg(data), now() from
> perf_raw_2012_01_24 where time_stamp between (now() - interval '1 hour') and
> now() group by key_columns
>
> The big queries are hitting several of the PERF_HOURLY tables and pulling a
> few dozen rows from each.

How big are those big queries, really? A few dozen tables times
a few dozen rows?

...
>
> Is there any tweaking we should do on the PG settings, or on the pattern in
> which the app is writing - we currently use 10 writer threads on the Java
> side and they keep PG going pretty good.

Do you need 10 writer threads? What happens if you use fewer?

>
> I considered bundling the writes into larger transactions; will that really
> help much with commit consistency off?

With synchronous_commit=off, I wouldn't expect the transaction
structure to make much difference. Especially not if the target of
the mass inserts is indexed.
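
If you want to measure it anyway rather than guess, the comparison is
cheap to set up. A rough sketch of what I mean by batching (the column
list is a placeholder, the VALUES rows elided):

    -- per session; with this off, the commit itself is already cheap
    SET synchronous_commit = off;

    -- one transaction per batch instead of one per row
    BEGIN;
    INSERT INTO perf_raw_2012_01_24 (key_columns, data, time_stamp)
        VALUES (...), (...), (...);
    COMMIT;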

> Is there some specific "usual suspect" stuff I should look at on the PG side
> to look for efficiency issues such as index lock contention or a poor buffer
> cache hit ratio? Will doing EXPLAIN ANALYSE on the big query be informative,
> and if so, does it need to be done while the write load is applied?

EXPLAIN would probably help; EXPLAIN ANALYSE while the problem is in
action would help more.
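
That is, take one of the real big queries and run it while the write
burst is going (untested sketch; fill in the real query):

    EXPLAIN ANALYSE
    SELECT ... FROM perf_hourly_2012_01_24 WHERE ...;

On the buffer cache hit ratio question, pg_statio gives a crude
per-table read on it (the counters accumulate since the stats were
last reset, so watch the deltas during a burst):

    SELECT relname, heap_blks_read, heap_blks_hit,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 3)
             AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read DESC
    LIMIT 10;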

Even better would be to see where the queries are blocking during the
problem, but there is no easy way to get that in postgres. I'd strace
-ttt -T the query process (although the mere act of stracing it can
slow it down enough to relieve the bottleneck you are trying to
identify).
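
The closest thing you can get cheaply from inside the database is lock
waits, which won't show I/O stalls but is worth a quick look (column
names as of 8.4):

    SELECT procpid, waiting, now() - query_start AS running_for,
           current_query
    FROM pg_stat_activity
    WHERE waiting;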

>
> The other whacky idea I had was to have the writer threads pause or throttle
> themselves when a big query is happening (it's all in one JVM and we are
> using a connection pooler, so it's easy to intercept and track if needed);
> however, that strikes me as a rather ugly hack and I'd prefer to do something
> more robust and based on config tweaks that leverage existing resource
> management in PG.

Why not just always throttle them? If you slam the data in as fast as
possible during brief bursts, you are probably just setting yourself
up for this type of issue. (The brief bursts can be useful if they
make better use of cache, but then you have to accept that other
things will be disrupted during those bursts.)

Cheers,

Jeff
