Guidance Requested - Bulk Inserting + Queries

From: Benjamin Johnson <benjamin(dot)johnson(at)getcarbonblack(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Guidance Requested - Bulk Inserting + Queries
Date: 2011-11-30 15:27:35
Message-ID: 4ED64B67.70401@getcarbonblack.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Experts,

Quick Summary: data can now be inserted very quickly via COPY + removing
indexes, but is there a design or some tricks to still allow someone to
query while the partition is still active and 'hot' ?

- Postgres 9.1
- Windows 7 (64-bit) , although this is just for the current test and
could vary depending on situation
- We have 4 main tables with daily partitions
- Each table/partition has multiple indexes on it
- Streaming logs from client machines into our server app which
processes the logs and tries to shove all that data into these daily
partitions as fast as it can.
- Using COPY and removed original primary key unique constraints to try
to get it to be as fast as possible (some duplicates are possible)
- Will remove duplicates in a later step (disregard for this post)

We now found (thanks Andres and Snow-Man in #postgresql) that in our
tests, after the indexes get too large performance drops signficantly
and our system limps forward due to disk reads (presumably for the
indexes). If we remove the indexes, performance for our entire sample
test is great and everything is written to postgresql very quickly.
This allows us to shove lots and lots of data in (for production
possibly 100 GB or a TB per day!)

My question is, what possible routes can I take where we can have both
fast inserts (with indexes removed until the end of the day), but still
allow a user to query against today's data? Is this even possible? One
idea would be possibly have hourly tables for today and as soon as we
can try to re-add indexes. Another possible solution might be to stream
the data to another "reader" postgres instance that has indexes,
although I'm not very versed in replication.

Any ideas would be greatly appreciated.

Thanks!

Ben

--
Benjamin Johnson
http://getcarbonblack.com/ | @getcarbonblack
cell: 312.933.3612 | @chicagoben

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Leonardo Francalanci 2011-11-30 16:17:33 Re: Guidance Requested - Bulk Inserting + Queries
Previous Message MirrorX 2011-11-29 17:12:13 vacuum internals and performance affect