Re: Guidance Requested - Bulk Inserting + Queries

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Benjamin Johnson <benjamin(dot)johnson(at)getcarbonblack(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Guidance Requested - Bulk Inserting + Queries
Date: 2011-12-01 15:06:42
Message-ID: CAMkU=1y0n1fShoW8CMgV+oPeMMM43W2vNirF3GrH_J2=3raLwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Nov 30, 2011 at 7:27 AM, Benjamin Johnson
<benjamin(dot)johnson(at)getcarbonblack(dot)com> wrote:
> Experts,
>
> Quick Summary: data can now be inserted very quickly via COPY + removing
> indexes, but is there a design or some tricks to still allow someone to
> query while the partition is still active and 'hot' ?
>
> - Postgres 9.1
> - Windows 7 (64-bit) , although this is just for the current test and
> could vary depending on situation
> - We have 4 main tables with daily partitions

How long are the daily partitions kept for?

> - Each table/partition has multiple indexes on it
> - Streaming logs from client machines into our server app which
> processes the logs and tries to shove all that data into these daily
> partitions as fast as it can.

Why shove it in as fast as you can? If you want to both read and
write at the same time, then focusing first only on writing and
worrying about reading as an after thought seems like the wrong thing
to do.

> - Using COPY and removed original primary key unique constraints to try
> to get it to be as fast as possible (some duplicates are possible)
> - Will remove duplicates in a later step (disregard for this post)
>
> We now found (thanks Andres and Snow-Man in #postgresql) that in our
> tests, after the indexes get too large performance drops signficantly
> and our system limps forward due to  disk reads (presumably for the
> indexes).

How many hours worth of data can be loaded into the new partition
before the performance knee hits?

After the knee, how does the random disk read activity you see compare
to the maximum random disk reads your IO system can support? How many
COPYs were you doing at the same time?

During this test, was there background select activity going on, or
was the system only used for COPY?

> If we remove the indexes, performance for our entire sample
> test is great and everything is written to postgresql very quickly.
> This allows us to shove lots and lots of data in (for production
> possibly 100 GB or a TB per day!)

How much do you need to shove in per day? If you need to insert it,
and index it, and run queries, and deal with maintenance of the older
partitions, then you will need a lot of spare capacity, relative to
just inserting, to do all of those things. Do you have windows where
there is less insert activity in which other things can get done?

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2011-12-01 18:01:57 Re: vacuum internals and performance affect
Previous Message Jesper Krogh 2011-12-01 06:11:40 Re: Problems with FTS