Re: Guidance Requested - Bulk Inserting + Queries

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Benjamin Johnson <benjamin(dot)johnson(at)getcarbonblack(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Guidance Requested - Bulk Inserting + Queries
Date: 2011-12-22 08:04:30
Message-ID: CAMkU=1yfr+6dn3jxYtZxEAN+dMYxhbYodYqVZ6WgAi3QSUYz+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Dec 21, 2011 at 6:30 PM, Benjamin Johnson
<benjamin(dot)johnson(at)getcarbonblack(dot)com> wrote:
> Jeff,
>
> Sorry for the delayed response.  Please see (some) answers inline.
>
> On 12/1/2011 9:06 AM, Jeff Janes wrote:
>> On Wed, Nov 30, 2011 at 7:27 AM, Benjamin Johnson

>> Why shove it in as fast as you can?  If you want to both read and
>> write at the same time, then focusing first only on writing and
>> worrying about reading as an after thought seems like the wrong thing
>> to do.

> You're probably correct in that we need to think about the entire system
> as a whole.  We're concerned with getting the data
> from our host-based to our server where it is processed and stored.
> Because our system is essentially a giant logging service for
> your enterprise, most of the time we're collecting data and writing it.
> The main times it will be read is when some security incident
> occurs, but between those we expect it to be very write heavy.

OK, that does have an interesting flavor, in which the typical row
will be queried about read zero times, but you can't predict in
advance which ones are more likely to ever be queried.

Do you know how necessary all your indexes are for supporting the
queries? If the queries are relatively rare, maybe you could support
them simply with seq scans on unindexed tables/partitions, at least on
the leading edge partitions.

>> How many hours worth of data can be loaded into the new partition
>> before the performance knee hits?
> In simulations, if I try to simulate the amount of data a large customer
> would send, then it is just about an hour worth of data before the indexes
> get to be several gigabytes in size and performance really goes downhill
> -- the "knee" if you will.

So having hourly partitions with live indexes might be cutting it
pretty close. Once something pushes you over the edge into degraded
performance, you would never be able to recover.

>> After the knee, how does the random disk read activity you see compare
>> to the maximum random disk reads your IO system can support?  How many
>> COPYs were you doing at the same time?
> I don't have exact statistics, but we had 4 writer threads all doing
> copy into 4 tables as fast as they receive data.

Are they receiving data at the rate they would naturally? I.e. does
it take an hour to simulate an hour's worth of data?

If they go into different tables, then they are going into different
indices and so are all competing with each other for cache space for
the index leaf blocks
(rather than sharing that cache space as they might possibly if they
were going into the same table). So you run out of cache space and
your performance collapses at one forth the total size as if you made
them take turns. Of course if you make them take turns, you have to
either throttle or buffer their data retrieval. Also there is a
question of how often you would have to rotate turns, and how long it
would take to exchange out the buffers upon a turn rotation. (There
are stupid OS tricks you can pull outside of PG to help that process
along, but trying to coordinate that would be painful.)

> The system is very much NOT ideal -- Windows 7 Developer-Class
> Workstation with (one) 7200 RPM Harddrive.  I want to find bottlebecks
> in this
> system and then see what real servers can handle.  (We're a small
> company and only now are starting to be able to invest in dev/test servers.

I think you said that for loading into large-grained partitions with
live indexes, the bottleneck was the random reads needed to pull in
the leaf blocks. In that case, if you change to RAID with striping
you should be able to scale with the effective number of spindles,
provided you have enough parallel copies going on to keep each spindle
busy with its own random read. Of course those parallel copies would
make the RAM issues worse, but by saying large-grained partitions I
mean that you've already given up on the notion having the indices fit
in RAM, so at that point you might as well get the spindle-scaling.

...

>
> We managed to sort of get around the issue by having hourly tables
> inherit from our daily tables.  This makes our indexes smaller and the
> writes in our tests don't
> seem to hit this same limit (at least so far.)  I have a couple
> follow-up questions:
>
> 1) Would it be acceptable to have let's say 60 daily partitions and then
> each of those has 24 hourly partitions?

It sounds like each client gets their own hardware, but of each client
can have several thousand customers, how is that handled? All dumped
into one giant partitioned (on time) table, or does each customer get
their own table? 60*24*thousands would certainly add up! If it is
just 60*24, it will certainly slow down your queries (the ones not
using constraint exclusion anyway) some as it has to do a look up in
1440 btrees for each query, but if queries are fast enough then they
are fast enough. It should be pretty easy to test, if you know the
types of queries you will be seeing.

> Would it be better to after a
> day or two (so that data is now old and mostly unchanged) "rollup" the
> hourly tables into their respective daily table and then remove the
> hourly tables?

That would generate an awful lot of extraneous IO (mostly sequential
rather than random, so more efficient, but still IO) which is going to
compete with the rest of the IO going on, in order to solve a problem
that you don't yet know that you have.

>
> 2) Some of our indexes are on an identifier that is a hash of some event
> attributes, so it's basically a random BIGINT.  We believe part of the
> problem is that each row could be in an entirely different location in
> the index thus causing lots of seeking and thrashing.  Would doing
> something like having our index become a multi-column index by doing
> (event_timestamp, originally_index_column) be better so that they closer
> in proximity to other events coming in around the same time?  I have to
> admit that I don't really know how indexes are stored / paged.

What if you just drop this index but keep the others while loading?
If dropping just that index has a big effect, then changing it as you
describe would almost certainly help on the loading, but would the new
index still efficiently support the same queries that the old one did?
I.e. could all queries based on the hash code be reformulated to
query on both exact time stamp and the hash code? Otherwise you would
be throwing the baby out with the bath water.

>
> 3) Does anyone else have similar systems where they have a ton of data
> coming in that they also want to query?  Any tips you can provide or
> alternative designs?  Once the data is in, it will 99.9% of the time
> (100% of the time for some tables) be static.  Part of the issue is that
> the user wants to be able to search based on all sorts of attributes --
> this leads to lots of indexes and more disk/memory usage when writing.

Have you experimentally verified that all of the indexes really are
needed to get acceptable query performance? I tend to error on the
side of adding more indices just in case it might be useful, but you
already know you have a problem caused by index maintenance so
defaulting to not having them until you have proof that it is needed
might be better in that case.

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Rafael Martinez 2011-12-22 09:00:21 Re: Dramatic change in memory usage with version 9.1
Previous Message Benjamin Johnson 2011-12-22 02:30:44 Re: Guidance Requested - Bulk Inserting + Queries