Re: Recommendation to run vacuum FULL in parallel

From: Perumal Raj <perucinci(at)gmail(dot)com>
To: Kevin Brannen <KBrannen(at)efji(dot)com>
Cc: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Recommendation to run vacuum FULL in parallel
Date: 2019-04-11 04:49:42
Message-ID: CALvqh4rATReNKQLmrU0xHe5tpr_EFO5h2-n-w-_QFwYmdGVbbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Kevin for the inputs,

In my Case there are 500+ Tables and biggest chunk 30GB ( Table only) + its
indexes.
So i have created 6 batches and executed in parallel . All my scripts
completed in 2 Hours and my DB size came down from 500GB to 300GB.

Yes i do see CPU spike, But i did whole activity with full apps down time.

Going forward i am going to run vacuum daily basis to maintain the DB size.

Also Table/DB Age came down drastically.

Thanks
Raj

On Thu, Apr 4, 2019 at 12:53 PM Kevin Brannen <KBrannen(at)efji(dot)com> wrote:

> *From:* Perumal Raj <perucinci(at)gmail(dot)com>
>
> So conclude the requirement here , The only way to parallelism is multiple
> script. And no need to do REINDEX exclusively.
>
> Question : Do we need to consider Table dependencies while preparing
> script in order to avoid table locks during vacuum full ?
>
>
>
> We have a small bash script (see below) that get the list of tables and
> their sizes, sorted smallest to largest, and do “vacuum full” one at a time
> because (as someone else pointed out) this is very I/O intensive. That
> order also helps to ensure we finish because some of our installs are at
> the edge of running out of space (an issue we’re dealing with). I probably
> wouldn’t have a problem doing 2 at a time, but we do this in the middle of
> the night when activity is lowest and it only takes 1-2 hours, so we’re
> good with it. It sounds like you have a lot more data though.
>
>
>
> You might also consider putting the data into different tablespaces which
> are spread over multiple disks to help I/O. If you can, use SSD drives,
> they help with speed quite a bit. 😊
>
>
>
> Don’t worry about table dependencies. This is a physical operation, not a
> data operation.
>
>
>
> HTH,
>
> Kevin
>
>
>
> $PGPATH/psql -t -c "
>
> WITH s AS (SELECT nspname || '.' || relname AS TABLE_NAME,
> pg_total_relation_size(c.oid) AS total_bytes
>
> FROM pg_class c
>
> LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
>
> WHERE relkind = 'r' AND nspname NOT IN ( 'pg_catalog',
> 'information_schema' )
>
> ORDER BY 2 )
>
> SELECT table_name FROM s
>
> " |
>
> while read t ; do echo "" ; echo $t; $PGPATH/vacuumdb -w -z -f -t $t ;
> done
>
> ###
> This e-mail transmission, and any documents, files or previous e-mail
> messages attached to it, may contain confidential information. If you are
> not the intended recipient, or a person responsible for delivering it to
> the intended recipient, you are hereby notified that any disclosure,
> distribution, review, copy or use of any of the information contained in or
> attached to this message is STRICTLY PROHIBITED. If you have received this
> transmission in error, please immediately notify us by reply e-mail, and
> destroy the original transmission and its attachments without reading them
> or saving them to disk. Thank you.
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2019-04-11 07:01:50 Re: Recommendation to run vacuum FULL in parallel
Previous Message Prakash Ramakrishnan 2019-04-11 04:31:02 Re: os upgrade 7.3 to 7.5 (postgres version 10.5)