Re: Improving performance of merging data between tables

From: Andy Colson <andy(at)squeakycode(dot)net>
To: Pawel Veselov <pawel(dot)veselov(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Improving performance of merging data between tables
Date: 2014-12-29 17:00:00
Message-ID: 54A18890.5050307@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 12/28/2014 3:49 PM, Pawel Veselov wrote:
> Hi.
>
> I was wondering if anybody would have any ideas on how to improve
> certain operations that we are having.
>
<SNIP>
>
> Besides "can somebody please look at this and let me know if I'm doing
> something utterly stupid", here are my questions.
>
> 1) How do I find out what exactly is consuming the CPU in a PL/pgSQL
> function? All I see is that the calls to merge_all() function take long
> time, and the CPU is high while this is going on.
>
>

First, I'll admit I didn't read your entire post.

I can think of a couple methods:

1) try each of the statements in merge_all by hand with an "explain
analyze" in front to see which is slow. Look for things that hit big
tables without an index. Check that fk lookups are indexes.

2) try pg_stat_statements, setting "pg_stat_statements.track = all". see:
http://www.postgresql.org/docs/9.4/static/pgstatstatements.html

I have used this to profile some functions, and it worked pretty well.
Mostly I use it on a test box, but once ran it on the live, which was
scary, but worked great.

3) try auto-explain:
http://www.postgresql.org/docs/9.4/static/auto-explain.html

I've never used it, so don't know if it'll show each statement inside a
function. Dumps stuff to the log AFAIK, so you'll have to dig out the
info by hand.

> 2) Is there a better way to merge individual rows, except doing
> UPDATE/INSERT in a loop, and would that be CPU expensive?
>

Not that I know of. I use pretty much the same thing. Soon! we will
have merge/upsert support. Hopefully it'll be fast.

> 3) Is there a better way to merge whole tables? However, note that I
> need to translate primary keys from node main table into the common main
> table, as they are used as foreign keys, hence the loops. I suspect the
> looping is CPU intensive.

Avoiding loops and doing things as sets is the best way. If possible.
The only loop I saw was looping over the merge_xxx tables, which is
probably the only way.

If possible (if you haven't already) you could add and extra column to
your secondary table that you can set as the main table's key.

bulk insert into second;
update second set magic = (select key from main where ... );

Then, maybe, you can do two ops in batch:

update main (where key exists in main)
insert into main (where key not exists in main)

-Andy

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andy Colson 2014-12-29 17:03:52 Re: pg_base_backup limit bandwidth possible?
Previous Message David Johnston 2014-12-29 16:52:53 Re: Rollback on include error in psql