
From: Alex Thurlow <alex(at)blastro(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Tuning resource parameters for a logging database.
Date: 2009-05-21 19:39:05
Message-ID: 4A15ADD9.6040600@blastro.com
Lists: pgsql-general

I should also specify how my inserts are happening, I guess. I'm actually
writing the logs to flat files and then loading them into the database on an
hourly basis using COPY, so I don't need to worry as much about insert speed
as I do about the reporting queries.
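
For what it's worth, the hourly load is basically just this (the path and the
table/column names are placeholders, not my real schema):

    -- load one hour's worth of flat-file log lines into the log table
    COPY log_lines (log_time, host, url, status)
        FROM '/var/log/app/last_hour.log';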

On 5/21/2009 2:36 PM, Scott Marlowe wrote:
> On Thu, May 21, 2009 at 1:13 PM, Alex Thurlow <alex(at)blastro(dot)com> wrote:
>
>> I have a postgresql database that I'm using for logging of data. There's
>> basically one table where each row is a line from my log files. It's
>> getting to a size where it's running very slow though. There are about 10
>> million log lines per day and I keep 30 days of data in it. All the columns
>> I filter on are indexed (mostly I just use date).
>>
>
> **DING DING DING** you've just said the magic phrase that says that
> partitioning would be a help.
>
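
For my own notes, I take this to mean the inheritance + CHECK constraint style
of partitioning, one child table per day -- roughly like this, with placeholder
table and column names:

    -- parent table stays (nearly) empty; queries go through it
    CREATE TABLE log_lines (
        log_time  timestamp NOT NULL,
        host      text,
        url       text,
        status    integer
    );

    -- one child table per day, with a CHECK constraint the planner can use
    CREATE TABLE log_lines_2009_05_21 (
        CHECK (log_time >= DATE '2009-05-21'
           AND log_time <  DATE '2009-05-22')
    ) INHERITS (log_lines);

    CREATE INDEX log_lines_2009_05_21_log_time_idx
        ON log_lines_2009_05_21 (log_time);

    -- the hourly COPY targets the current day's child table, and expiring
    -- a day becomes a cheap DROP TABLE instead of a huge DELETE
    DROP TABLE log_lines_2009_04_21;

With constraint_exclusion = on, a query filtered to one day should only ever
touch that day's table.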
>
>> And I tend to pull one
>> day of data at a time with grouped counts by 1 or 2 other columns. There
>> also tends to be only 1 or 2 of these large queries running at any given
>> time, so a lot of resources can be thrown at each one.
>>
>> I'm wondering what my resource parameters should be for optimal speed of the
>> selects on this database, since I haven't seen a good example where someone
>> has done anything like this.
>>
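
To be concrete, those big queries are basically "one day, grouped counts,"
something like this (column names are placeholders):

    SELECT host, status, count(*)
      FROM log_lines
     WHERE log_time >= DATE '2009-05-20'
       AND log_time <  DATE '2009-05-21'
     GROUP BY host, status;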
>
> With a logging database you're optimizing two often opposing workloads.
> Logging means lots of small inserts in a stream that HAVE to get
> processed and stored efficiently. That's often accomplished with minimal
> shared_buffers and work_mem, because there's no need for the overhead of
> large shared_buffers, and the insert queries for a logging db don't need
> much work_mem.
>
> With a reporting database you run queries that chew up tons of memory,
> both shared_buffers and work_mem, for efficient operation.
>
>
>> The machine is an 8-core Opteron (I know I won't really use those, but Dell
>> threw in the 2nd proc for free) with 8 GB RAM. The database is on a RAID 10
>> JFS partition.
>>
>
> Yeah, CPUs are cheap, so you might as well stock up on them. A reporting
> database can quickly go CPU bound if everything the users want to see
> fits in memory.
>
>
>> This is what I have in postgresql.conf right now:
>>
>> shared_buffers = 64MB
>>
>
> Small for reporting, just right for logging. I'd try something bigger,
> but not insanely huge. Let the OS do the caching of 90% of the data and
> let the db cache a good-sized working set. 256MB to 1GB is reasonable,
> based on benchmarks of your own queries.
>
>
>> work_mem = 128MB
>>
>
> Bigger than needed for logging, good for reporting. You can probably
> just leave it.
>
>
>> maintenance_work_mem = 256MB
>> max_fsm_pages = 614400
>>
>
> If you're not partitioning, then this needs to be big enough to cover at
> least a day's worth of dead rows.
>
> Look at lowering your random_page_cost, and at increasing
> default_statistics_target to somewhere between 100 and 1000, depending on
> your data and on EXPLAIN ANALYZE testing of your queries.
>
>
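
Pulling all of that together, here's roughly what I'm going to try in
postgresql.conf. These are just starting points to benchmark against my own
queries, not settings anyone has signed off on:

    shared_buffers = 512MB            # up from 64MB; benchmark in the 256MB - 1GB range
    work_mem = 128MB                  # leaving as is for the big reporting queries
    maintenance_work_mem = 256MB      # leaving as is
    constraint_exclusion = on         # needed if I go the partitioning route
    random_page_cost = 2.0            # down from the default 4.0, since most of the working set should be cached
    default_statistics_target = 100   # and test higher values with EXPLAIN ANALYZE

And if the partitioning works out, max_fsm_pages matters a lot less, since
expiring a day is a DROP TABLE rather than a mass DELETE that leaves dead rows
behind.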

--
Alex Thurlow
Blastro Networks

http://www.blastro.com
http://www.roxwel.com
http://www.yallwire.com
