Re: Performance while loading data and indexing

From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: Performance while loading data and indexing
Date: 2002-09-26 09:46:50
Message-ID: 3D9324E2.30195.137BF348@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers pgsql-performance

On 26 Sep 2002 at 19:05, Martijn van Oosterhout wrote:

> On Thu, Sep 26, 2002 at 02:05:44PM +0530, Shridhar Daithankar wrote:
> > 1) Database load time from flat file using copy is very high
> > 2) Creating index takes huge amount of time.
> > 3) Any suggsestions for runtime as data load and query will be going in
> > parallel.
>
> You're loading all the data in one copy. I find that INSERTs are mostly
> limited by indexes. While index lookups are cheap, they are not free and
> each index needs to be updated for each row.
>
> I fond using partial indexes to only index the rows you actually use can
> help with the loading. It's a bit obscure though.
>
> As for parallel loading, you'll be limited mostly by your I/O bandwidth.
> Have you measured it to take sure it's up to speed?

Well. It's like this, as of now.. CreateDB->create table->create index->Select.

So loading is not slowed by index. As of your hint of vmstat, will check it
out.
> So you're loading at a rate of 860KB per sec. That's not too fast. How many
> indexes are active at that time? Triggers and foreign keys also take their
> toll.

Nothing except the table where data os loaded..

> fsync IIRC only affects the WAL buffers now but it may be quite expensive,
> especially considering it's running on every transaction commit. Oh, your
> WAL files are on a seperate disk from the data?

No. Same RAID 5 disks..

> It shouldn't. Do you have an idea of what your CPU usage is? ps aux should
> give you a decent idea.

I guess we forgot to monitor system parameters. Next on my list is running
vmstat, top and tuning bdflush.

> Find the bottleneck: CPU, I/O or memory?

Understood..
>
> > Mysql is almost out because it's creating index for last 17 hours. I don't
> > think it will keep up with 5K inserts per sec. with index. SAP DB is under
> > evaluation too. But postgresql is most favourite as of now because it works. So
> > I need to come up with solutions to problems that will occur in near future..
> > ;-)
>
> 17 hours! Ouch. Either way, you should be able to do much better. Hope this
> helps,

Heh.. no wonder this evaluation is taking more than 2 weeks.. Mysql was running
out of disk space while creating index and crashin. An upgrade to mysql helped
there but no numbers as yet..

Thanks once again...
Bye
Shridhar

--
Boren's Laws: (1) When in charge, ponder. (2) When in trouble, delegate. (3)
When in doubt, mumble.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2002-09-26 09:48:06 Re: Performance while loading data and indexing
Previous Message Shridhar Daithankar 2002-09-26 09:35:40 Re: Performance while loading data and indexing

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Huxton 2002-09-26 09:48:06 Re: Performance while loading data and indexing
Previous Message Shridhar Daithankar 2002-09-26 09:35:40 Re: Performance while loading data and indexing

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Huxton 2002-09-26 09:48:06 Re: Performance while loading data and indexing
Previous Message Shridhar Daithankar 2002-09-26 09:35:40 Re: Performance while loading data and indexing