Re: concurrent inserts into two separate tables are very slow

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Sergei Shelukhin" <realgeek(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: concurrent inserts into two separate tables are very slow
Date: 2008-01-07 23:08:46
Message-ID: dcc563d10801071508n7297eeeaw121ff16d3dc262cd@mail.gmail.com
Lists: pgsql-performance

On Jan 7, 2008 4:49 PM, Sergei Shelukhin <realgeek(at)gmail(dot)com> wrote:
>
> Scott Marlowe wrote:
> > On Jan 5, 2008 9:00 PM, Sergei Shelukhin <realgeek(at)gmail(dot)com> wrote:
> >
> >> Hi. Running postgres 8.2 on debian.
> >> I've noticed that concurrent inserts (archiving) of large batches of
> >> data into two completely unrelated tables are many times slower than
> >> the same inserts done in sequence.
> >> Is there any way to speed them up apart from buying faster HDs/
> >> changing RAID configuration?
> >>
> >
> > What method are you using to load these data? Got a short example
> > that illustrates what you're doing?
> >
> >
> The basic structure is as follows: there are several tables with
> transaction data that is stored for one month only.
> The data comes from several sources in different formats and is pushed
> in using a custom script.
> It gets the source data and puts it into a table it creates (the import
> table) with the same schema as the main table; then it deletes the
> month-old data from the main table. It also searches for duplicates in
> the main table using some specific criteria and deletes them too: to
> make use of indexes, a second temp table with a single id int column is
> created and populated with one INSERT ... SELECT query containing the
> transaction ids of rows duplicated between the main and import tables;
> after that, DELETE FROM pages WHERE id IN (SELECT id FROM 2nd-temp-table)
> is called. Then it inserts the remainder of the import table into the
> main table.
> There are several data load processes that function in the same manner
> with different target tables.
> When they run in sequence, they take about 20 minutes on average to
> complete. When they run in parallel, however, they can take up to 3
> hours... I was wondering if it's solely an HD bottleneck, given that
> there's plenty of CPU and RAM available and postgres is configured to
> use it.
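
For reference, here is a minimal sketch of the flow described above. The
main table ("pages") and its id column come from the description; the
import table name, the created_at timestamp column, the dup_ids temp
table, and the join condition are placeholders for whatever the real
script uses:

BEGIN;

-- Import table with the same schema as the main table (name assumed)
CREATE TABLE import_pages (LIKE pages INCLUDING DEFAULTS);
-- ... the source data is loaded into import_pages here ...

-- Delete the month-old data from the main table (created_at is assumed)
DELETE FROM pages WHERE created_at < now() - interval '1 month';

-- Second temp table holding the ids of main-table rows that duplicate
-- incoming rows, so the delete can use the indexes
CREATE TEMP TABLE dup_ids (id int);
INSERT INTO dup_ids (id)
SELECT p.id
  FROM pages p
  JOIN import_pages i ON i.id = p.id;  -- the real duplicate criteria go here

DELETE FROM pages WHERE id IN (SELECT id FROM dup_ids);

-- Insert the remainder of the import table into the main table
INSERT INTO pages SELECT * FROM import_pages;

COMMIT;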

Ahh, thanks for the more detailed explanation. Now I get what you're facing.

There are a few things you could do that would probably help, and doing
more than one of them could help even more.

1: Buy a decent battery-backed caching RAID controller. This will
smooth out writes a lot. If you can't afford that...
2: Build a nice big RAID-10 array, say 8 to 14 discs.
3: Put pg_xlog on a physically separate drive from the rest of the database.
4: Put each table being inserted into on its own physical hard drive
(see the tablespace sketch after this list).
5: Stop writing to multiple tables at once.
6: (Not recommended) run with fsync turned off.
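
For #4, tablespaces are the usual approach; the paths and table names
below are placeholders, and each location has to live on its own
physical drive and be owned by the postgres OS user:

CREATE TABLESPACE disk1_space LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE disk2_space LOCATION '/mnt/disk2/pgdata';

-- Move each heavily written table onto its own spindle
ALTER TABLE table_a SET TABLESPACE disk1_space;
ALTER TABLE table_b SET TABLESPACE disk2_space;

For #3, pg_xlog can't be moved with a tablespace; stop the server, move
the pg_xlog directory to the other drive, and symlink it back into the
data directory.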

Each of these things can help on its own. My personal preference for
heavily written databases is a good RAID controller with battery-backed
caching turned on and a lot of discs in RAID-10 or RAID-6 (depending on
the read-versus-write ratio and the need for storage space). RAID-10 is
normally better for performance; RAID-6 with large arrays is better for
maximizing your capacity while maintaining decent performance and
reliability. RAID-5 is right out.
