Re: (Solved) Decreasing performance in table partitioning

From: "Huang, Suya" <Suya(dot)Huang(at)au(dot)experian(dot)com>
To: Herouth Maoz <herouth(at)unicell(dot)co(dot)il>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: (Solved) Decreasing performance in table partitioning
Date: 2014-09-17 00:24:15
Message-ID: D83E55F5F4D99B4A9B4C4E259E6227CD01EBFDF5@AUX1EXC02.apac.experian.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

That’s a good idea, thanks for sharing Herouth!

From: Herouth Maoz [mailto:herouth(at)unicell(dot)co(dot)il]
Sent: Tuesday, September 16, 2014 10:22 PM
To: Huang, Suya
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] (Solved) Decreasing performance in table partitioning

Thank you. My solution is based on your suggestion, but instead of creating a new partition table and inserting into it, I create partitions as I originally planned, under the existing table, and insert to them. But without deleting.

I use INSERT INTO... SELECT ONLY, without deleting, and so the data becomes duplicated (each row is both in the main table and in the partition). At the end of the loop, when all partitions are populated and indexed, I use TRUNCATE ONLY on the main table, and drop its indexes. This way, the views are not affected.

So thank you for your suggestion and your help. Now for a comparable-size table, run time was less than 2 hours which is quite acceptable.

On 11/09/2014, at 07:26, Huang, Suya wrote:

The views will go with the table. if you rename table, view definition will be automatically changed accordingly. In your situation, you may need to recreate views or other objects have dependency on that old table.

But functions will remain the same, so as long as your new table has been renamed to the same name as the old table, it should be ok.

Note, it’s tested in my 9.3 environment, not sure how it behaves in older versions…

From: Herouth Maoz [mailto:herouth(at)unicell(dot)co(dot)il]
Sent: Wednesday, September 10, 2014 6:26 PM
To: Huang, Suya
Cc: pgsql-general(at)postgresql(dot)org<mailto:pgsql-general(at)postgresql(dot)org>
Subject: Re: [GENERAL] Decreasing performance in table partitioning

Thank you. Sorry I have been away for a few days and couldn't thank you before.

Wouldn't this have an impact if there are things like views or functions based on the old table?

On 08/09/2014, at 04:57, Huang, Suya wrote:

Instead of deleting from the original non-partition table which is not efficient, you can try below approach.

Put below logic in a function as you like:

Create a new partition table.
Insert data from original non-partition table to the correct partition of new partition table.
Build index and analyze as needed.
Rename old non-partition table to something else.
Rename new partition table to the correct name as you wanted.

Drop old non-partition table if you’re satisfied with current table structure.

Thanks,
Suya
From: pgsql-general-owner(at)postgresql(dot)org<mailto:pgsql-general-owner(at)postgresql(dot)org> [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Herouth Maoz
Sent: Monday, September 08, 2014 12:00 AM
To: pgsql-general(at)postgresql(dot)org<mailto:pgsql-general(at)postgresql(dot)org>
Subject: [GENERAL] Decreasing performance in table partitioning

Hello all.

I have created a function that partitions a large table into monthly partitions. Since the name of the table, target schema for partitions, name of the date field etc. are all passed as strings, the function is heavily based on EXECUTE statements.

My problem is the main loop, in which data for one month is moved from the old table to the partition table.

(1)
EXECUTE FORMAT (
'WITH del AS (
DELETE FROM %1$I.%2$I
WHERE %3$I >= %4$L AND %3$I < %5$L
RETURNING *
)
INSERT INTO %6$I.%7$I
SELECT * FROM del',
p_main_schema,
p_table_name,
p_date_field_name,
v_curr_month_str,
v_curr_month_to_str,
p_partition_schema,
v_partition_name
);

In the first few iterations, this runs in very good times. But as iterations progress, performance drops, despite the size of the date for each month being more or less the same. Eventually I end up with iterations that run for hours, when I started with only a few minutes. The odd thing is that the last iteration, which is actually for a month not yet inserted into that table (0 records to move) it took 6 hours for the above statement to run!

I tried to improve this, by first testing whether there are any records for the current month in the table, adding:

(2)

EXECUTE FORMAT (
'SELECT true
FROM %1$I.%2$I
WHERE %3$I >= %4$L AND %3$I < %5$L
LIMIT 1',
p_main_schema,
p_table_name,
p_date_field_name,
v_curr_month_str,
v_curr_month_to_str
) INTO v_exists;

Before the above statement, and putting it in an IF statement on v_exists. Also, after each move, I added:

EXECUTE FORMAT ( 'ANALYZE %I.%I', p_main_schema, p_table_name );

But to no avail. In fact, in each iteration, the execution of statement 2 above takes more and more time.

Here is the number of rows in each month for the table I was trying to partition:

count | the_month
----------+---------------------
10907117 | 2013-08-01 00:00:00
12715234 | 2013-09-01 00:00:00
14902928 | 2013-10-01 00:00:00
10933566 | 2013-11-01 00:00:00
11394906 | 2013-12-01 00:00:00
9181051 | 2014-01-01 00:00:00
8487028 | 2014-02-01 00:00:00
9892981 | 2014-03-01 00:00:00
8830191 | 2014-04-01 00:00:00
8368638 | 2014-05-01 00:00:00
8014685 | 2014-06-01 00:00:00
6780589 | 2014-07-01 00:00:00

And the times for each iteration:

Month Statement 2 Statement 1
2013-08 3 sec 3 min
2013-09 2 min 17 min
2013-10 4 min 21 min
2013-11 8 min 20 min
2013-12 9 min 32 min
2014-01 16 min 21 min
2014-02 19 min 20 min
2014-03 14 min 23 min

For April I had to cancel it in the middle. My problem is that I can't let this run into the evening, when we have backup followed by large data collection. These times are just for the given statements, and additional time is spent creating indexes on the partitions and so on. So this thing ran from 11:24 until I had to cancel it at around 6PM.

Can anybody explain the performance deterioration and/or offer a suggestion for a different design?

TIA,
Herouth

--
חרות מעוז
יוניסל פתרונות סלולריים מתקדמים
☎ 03-5181717 שלוחה 742

--
חרות מעוז
יוניסל פתרונות סלולריים מתקדמים
☎ 03-5181717 שלוחה 742

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marc Van Olmen 2014-09-17 01:16:42 Postgresql out of memory during big transaction
Previous Message Michael Paquier 2014-09-16 21:53:34 Re: orphan records in pg_class