Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

From: Stanislaw Pankevich <s(dot)pankevich(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Date: 2012-07-13 07:50:53
Message-ID: CAFXpGYa5oAfZW21u1eQqn1FR8TQOVFUW-HjK1nAECdgD8Vvhng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

If someone is interested with the current strategy, I am using for
this, see this Ruby-based repo
https://github.com/stanislaw/truncate-vs-count for both MySQL and
PostgreSQL.

MySQL: the fastest strategy for cleaning databases is truncation with
following modifications:
1) We check is table is not empty and then truncate.
2) If table is empty, we check if AUTO_INCREMENT was changed. If it
was, we do a truncate.

For MySQL just truncation is much faster than just deletion. The only
case where DELETE wins TRUNCATE is doing it on empty table.
For MySQL truncation with empty checks is much faster than just
multiple truncation.
For MySQL deletion with empty checks is much faster than just DELETE
on each tables.

PostgreSQL: The fastest strategy for cleaning databases is deletion
with the same modifications.

For PostgreSQL just deletion is much faster than just TRUNCATION(even multiple).
For PostgreSQL multiple TRUNCATE doing empty checks before is slightly
faster than just multiple TRUNCATE
For PostgreSQL deletion with empty checks is slightly faster than just
PostgreSQL deletion.

This is from where it began:
https://github.com/bmabey/database_cleaner/issues/126
This is the result code and long discussion:
https://github.com/bmabey/database_cleaner/issues/126

We began collecting users feedback proving my idea with first checking
empty tables is right.

Thanks to all participants, especially those who've suggested trying
DELETE as well as optimizing TRUNCATE.

Stanislaw

On Fri, Jul 6, 2012 at 7:06 PM, Steve Crawford
<scrawford(at)pinpointresearch(dot)com> wrote:
> On 07/03/2012 08:22 AM, Stanislaw Pankevich wrote:
>>
>>
>> ==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the
>> fastest way to clean each non-empty table and reset unique identifier column
>> of empty ones ====
>>
>> I wonder, what is the fastest way to accomplish this kind of task in
>> PostgreSQL. I am interested in the fastest solutions ever possible.
>>
> It would help if we really understood your use-case. If you want to fully
> reset your database to a known starting state for test runs, why not just
> have a base database initialized exactly as you wish, say "test_base", then
> just drop your test database and create the new database from your template:
> drop database test;
> create database test template test_base;
>
> This should be very fast but it won't allow you to exclude individual
> tables.
>
> Are you interested in absolute fastest as a mind-game or is there a specific
> use requirement, i.e. how fast is fast enough? This is the basic starting
> point for tuning, hardware selection, etc.
>
> Truncate should be extremely fast but on tables that are as tiny as yours
> the difference may not be visible to an end-user. I just tried a "delete
> from" to empty a 10,000 record table and it took 14 milliseconds so you
> could do your maximum of 100 tables each containing 10-times your max number
> of records in less than two seconds.
>
> Regardless of the method you choose, you need to be sure that nobody is
> accessing the database when you reset it. The drop/create database method
> will, of course, require and enforce that. Truncate requires an exclusive
> lock so it may appear to be very slow if it is waiting to get that lock. And
> even if you don't have locking issues, your reluctance to wrap your reset
> code in transactions means that a client could be updating some table or
> tables whenever the reset script isn't actively working on that same table
> leading to unexplained weird test results.
>
> Cheers,
> Steve
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Eoghan Murray 2012-07-13 14:11:23 Poor performance problem with Materialize, 8.4 -> 9.1 (enable_material)
Previous Message David Kerr 2012-07-13 06:51:22 slow prepare, lots of semop calls.