Re: Improving performance of merging data between tables

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Pawel Veselov <pawel(dot)veselov(at)gmail(dot)com>
Cc: Andy Colson <andy(at)squeakycode(dot)net>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Improving performance of merging data between tables
Date: 2015-01-08 03:49:19
Message-ID: CAK-MWwTt13dkP2sEDxem63AhRMrTJ8kFs1C1iVFpaV=GFSvgBQ@mail.gmail.com
Lists: pgsql-general

On Wed, Jan 7, 2015 at 8:49 PM, Pawel Veselov <pawel(dot)veselov(at)gmail(dot)com>
wrote:

>
> PPPS: and the last suggestion, after you finished with the "write all the
>> data into its own tables", then application should perform analyze of these
>> own tables (or you could have weird/inefficient plans during last stage).
>
>
> Any references to back this up? I don't particularly mind doing it, but I
> wonder if analysis can be more expensive the processing. These tables get a
> few hundreds of records inserted/updated, then are entirely processed (with
> expected full scans), and then deleted...
>

If these "own tables" are used only in full-table selects and never in
joins, there should be no issues.
However, once you start joining these tables with anything else, you can
get very inefficient/weird plans, because without ANALYZE the database
doesn't know how many rows these tables contain.
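To make this concrete, here is a minimal sketch (the staging-table and column names are hypothetical; substitute your own per-node tables):

```sql
-- After the bulk insert/update into the per-node staging table,
-- and BEFORE joining it to anything else:
ANALYZE my_node_staging;

-- Compare the planner's row estimates with and without the ANALYZE:
EXPLAIN
SELECT *
FROM   r_agrio r
JOIN   my_node_staging s ON s.key = r.key;
```

On a few hundred rows the ANALYZE itself takes milliseconds, so it is cheap insurance against a nested-loop-over-seq-scan plan built from default statistics.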

>
>
>> PS: your setup look pretty complicated and hard to analyze without seeing
>> all involved table structures, transaction/query flow, and (especially)
>> involved procedures source code.
>>
>
> Sure :) At this point, I've put together the "bulk merge" code as well. I
> can't quite see much of a difference, actually, but it's hard to trust the
> execution times, as on the same amount of data they vary from, say, 0.5s to
> 2s, and the sample data is not stepping on any other locks. In general, I'm
> afraid all those left joins and multiple scans, even over small amount of
> data, is nullifying any positive effect.
>


Now some ideas to check.
High CPU usage usually isn't related to locking; it is more often caused by
seq scans, wrong plans, or simply inefficient PL/pgSQL code. Locked
(waiting) processes usually don't use much CPU.

1) On the test database, perform select pg_stat_reset(); then perform a
full round of merges, then check
select * from pg_stat_user_tables where seq_scan>0 order by seq_tup_read;
If you find many seq_scans and seq_tup_reads on a particular table, try to
find where they are coming from (they could be the reason for the high CPU
usage).

2) Enable track_functions in postgresql.conf and perform the same sequence
(select pg_stat_reset() plus a full round of merges), then check
select * FROM pg_stat_user_functions order by self_time desc;
to see which functions use the most time.
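The whole sequence for step 2 might look like this (track_functions accepts none, pl, or all; pl covers PL/pgSQL functions, which is what matters here):

```sql
-- In postgresql.conf: track_functions = pl
-- or, for the current session only:
SET track_functions = 'pl';

SELECT pg_stat_reset();

-- ... run the full round of merges here ...

-- self_time excludes time spent in functions called by this function,
-- so it points directly at the expensive body:
SELECT funcname, calls, self_time, total_time
FROM   pg_stat_user_functions
ORDER  BY self_time DESC;
```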

3) Old/lost prepared transactions can have a deadly effect on database
performance as a whole. So check select * from pg_prepared_xacts; and
verify that you don't have hours- (or weeks-) old prepared transactions
lying around.
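A query along these lines flags the stale ones (the one-hour cutoff is an arbitrary example; adjust to your workload):

```sql
-- List prepared transactions older than one hour:
SELECT gid, prepared, owner, database
FROM   pg_prepared_xacts
WHERE  prepared < now() - interval '1 hour'
ORDER  BY prepared;

-- Each leftover then has to be resolved manually, e.g.:
-- ROLLBACK PREPARED 'the_gid_from_above';
```

Old prepared transactions are harmful because they pin the xmin horizon, so VACUUM cannot reclaim any dead rows newer than them.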

PS: btw, I still don't fully understand the relation between:
"
- merges data into its own node tables (using merge_xxx PL/pgSQL functions)
"
and the provided code for public."merge_all02-9A-46-8B-C1-DD" and
PUBLIC.merge_agrio.
As far as I can see, public."merge_all02-9A-46-8B-C1-DD" calls
PUBLIC.merge_agrio, and PUBLIC.merge_agrio updates a global table R_AGRIO
(but not the "own node table").

I think the best implementation of such a task is asynchronous processing
of these changes via a background process. The application only inserts
events into a queue table (a lockless operation), and a background process
reads the data from the queue table, merges it into the main table (again
lockless, because it is single-threaded, so there are no concurrent
writes), and then deletes the merged data from the queue table.
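A minimal sketch of this queue-table pattern (all table and column names here are hypothetical, not taken from your schema; requires PostgreSQL 9.1+ for the writable CTE):

```sql
-- Queue table the application appends to; plain INSERTs never block
-- each other, so the hot path stays lockless:
CREATE TABLE event_queue (
    id    bigserial PRIMARY KEY,
    key   text      NOT NULL,
    delta bigint    NOT NULL
);

-- Application side:
-- INSERT INTO event_queue (key, delta) VALUES ('02-9A-46-8B-C1-DD', 1);

-- Single background worker: drain the queue and fold the aggregated
-- deltas into the main table in one statement.
WITH drained AS (
    DELETE FROM event_queue
    RETURNING key, delta
)
UPDATE r_agrio r
SET    counter = r.counter + d.total
FROM  (SELECT key, SUM(delta) AS total
       FROM drained
       GROUP BY key) d
WHERE  r.key = d.key;
```

Note this sketch only handles keys that already exist in the main table; real code would also insert rows for new keys. Since exactly one worker runs the drain, there is no write contention on the main table at all.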

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
