Re: concurrent inserts into two separate tables are very slow

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Sergei Shelukhin" <realgeek(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: concurrent inserts into two separate tables are very slow
Date: 2008-01-07 23:08:46
Message-ID: dcc563d10801071508n7297eeeaw121ff16d3dc262cd@mail.gmail.com
Lists: pgsql-performance

On Jan 7, 2008 4:49 PM, Sergei Shelukhin <realgeek(at)gmail(dot)com> wrote:
>
> Scott Marlowe wrote:
> > On Jan 5, 2008 9:00 PM, Sergei Shelukhin <realgeek(at)gmail(dot)com> wrote:
> >
> >> Hi. Running postgres 8.2 on debian.
> >> I've noticed that concurrent inserts (archiving) of large batches of
> >> data into two completely unrelated tables are many times slower than
> >> the same inserts done in sequence.
> >> Is there any way to speed them up apart from buying faster HDs/
> >> changing RAID configuration?
> >>
> >
> > What method are you using to load these data? Got a short example
> > that illustrates what you're doing?
> >
> >
> The basic structure is as follows: there are several tables with
> transaction data that is stored for one month only.
> The data comes from several sources in different formats and is pushed
> in using a custom script.
> It gets the source data and puts it into a table it creates (the import
> table) with the same schema as the main table; then it deletes the
> month-old data from the main table. It also searches for duplicates in
> the main table using some specific criteria and deletes them too: to
> make use of indexes, a second temp table with a single id int column is
> created and populated with one INSERT ... SELECT query containing the
> transaction ids of rows duplicated between the main and import tables;
> after that, DELETE FROM pages WHERE id IN (SELECT id FROM 2nd-temp-table)
> is called. Then it inserts the remainder of the import table into the
> main table.
> There are several data load processes that function in the same manner
> with different target tables.
> When they run in sequence, they take about 20 minutes on average to
> complete. When they run in parallel, however, they can take up to 3
> hours... I was wondering if it's solely an HD bottleneck, given that
> there's plenty of CPU and RAM available and postgres is configured to
> use it.
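
For reference, here is a minimal sketch of the flow described above. The
main table ("pages") and its id column come from the description; the
import table name, the created_at timestamp column, the dup_ids temp
table, and the join condition are placeholders for whatever the real
script uses:

BEGIN;

-- Import table with the same schema as the main table (name assumed)
CREATE TABLE import_pages (LIKE pages INCLUDING DEFAULTS);
-- ... the source data is loaded into import_pages here ...

-- Delete the month-old data from the main table (created_at is assumed)
DELETE FROM pages WHERE created_at < now() - interval '1 month';

-- Second temp table holding the ids of main-table rows that duplicate
-- incoming rows, so the delete can use the indexes
CREATE TEMP TABLE dup_ids (id int);
INSERT INTO dup_ids (id)
SELECT p.id
  FROM pages p
  JOIN import_pages i ON i.id = p.id;  -- the real duplicate criteria go here

DELETE FROM pages WHERE id IN (SELECT id FROM dup_ids);

-- Insert the remainder of the import table into the main table
INSERT INTO pages SELECT * FROM import_pages;

COMMIT;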

Ahh, thanks for the more detailed explanation. Now I get what you're facing.

There are a few things you could do that would probably help, and doing
more than one of them could help even more.

1: Buy a decent battery-backed caching RAID controller. This will
smooth out writes a lot. If you can't afford that...
2: Build a nice big RAID-10 array, say 8 to 14 discs.
3: Put pg_xlog on a physically separate drive from the rest of the database.
4: Put each table being inserted into on its own physical hard drive
(see the tablespace sketch after this list).
5: Stop writing to multiple tables at once.
6: (Not recommended) run with fsync turned off.
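
For #4, tablespaces are the usual approach; the paths and table names
below are placeholders, and each location has to live on its own
physical drive and be owned by the postgres OS user:

CREATE TABLESPACE disk1_space LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE disk2_space LOCATION '/mnt/disk2/pgdata';

-- Move each heavily written table onto its own spindle
ALTER TABLE table_a SET TABLESPACE disk1_space;
ALTER TABLE table_b SET TABLESPACE disk2_space;

For #3, pg_xlog can't be moved with a tablespace; stop the server, move
the pg_xlog directory to the other drive, and symlink it back into the
data directory.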

Each of these things can help on its own. My personal preference for
heavily written databases is a good RAID controller with battery-backed
caching turned on and a lot of discs in RAID-10 or RAID-6 (depending on
the read-versus-write ratio and the need for storage space). RAID-10 is
normally better for performance; RAID-6 with large arrays is better for
maximizing your capacity while maintaining decent performance and
reliability. RAID-5 is right out.
