Re: [HACKERS] Block level parallel vacuum

From: Mahendra Singh <mahi6run(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-12-09 19:00:27
Message-ID: CAKYtNArYEO3CphnZyamkamRMTZW6YQ4_ZgrGJX-rRjt14-ceow@mail.gmail.com
Lists: pgsql-hackers

On Fri, 6 Dec 2019 at 10:50, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Dec 5, 2019 at 7:44 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > I think it might be a good idea to change what we expect index AMs to
> > do rather than trying to make anything that they happen to be doing
> > right now work, no matter how crazy. In particular, suppose we say
> > that you CAN'T add data on to the end of IndexBulkDeleteResult any
> > more, and that instead the extra data is passed through a separate
> > parameter. And then you add an estimate method that gives the size of
> > the space provided by that parameter (and if the estimate method isn't
> > defined then the extra parameter is passed as NULL) and document that
> > the data stored there might get flat-copied.
> >
>
> I think this is a good idea and serves the purpose we are trying to
> achieve currently. However, any index AM that is using the current
> way of passing stats with additional information would need to change
> even if it doesn't want to use the parallel vacuum functionality (say
> because its indexes are too small or for whatever other reason). I
> think this is a reasonable trade-off and the changes on their end
> won't be that big. So, we should do this.
>
> > Now, you've taken the
> > onus off of parallel vacuum to cope with any crazy thing a
> > hypothetical AM might be doing, and instead you've defined the
> > behavior of that hypothetical AM as wrong. If somebody really needs
> > that, it's now their job to modify the index AM machinery further
> > instead of your job to somehow cope.
> >
>
> makes sense.
>
> > > Here, we have a need to reduce the number of workers. Index vacuum
> > > has two different phases (index vacuum and index cleanup) which use
> > > the same parallel context/DSM, but both could have different
> > > requirements for workers. The second phase (cleanup) would normally
> > > need fewer workers, because if the work was done in the first phase,
> > > the second doesn't need it; but we have exceptions like gin indexes,
> > > where we need workers for the second phase as well because it makes
> > > a pass over the index again even if we cleaned the index in the
> > > first phase. Now, consider the case where we have 3 btree indexes
> > > and 2 gin indexes: we would need 5 workers for the index vacuum
> > > phase and 2 workers for the index cleanup phase. There are other
> > > cases too.
> > >
> > > We also considered to have a separate DSM for each phase, but that
> > > appeared to have overhead without much benefit.
> >
> > How about adding an additional argument to ReinitializeParallelDSM()
> > that allows the number of workers to be reduced? That seems like it
> > would be less confusing than what you have now, and would involve
> > modifying code in a lot fewer places.
> >
>
> Yeah, we can do that. We can maintain some information in
> LVParallelState which indicates whether we need to reinitialize the
> DSM before launching workers. Sawada-San, do you see any problem with
> this idea?
>
>
> > > > Is there any legitimate use case for parallel vacuum in combination
> > > > with vacuum cost delay?
> > > >
> > >
> > > Yeah, we also initially thought that it is not legitimate to use a
> > > parallel vacuum with a cost delay. But to get a wider view, we
> > > started a separate thread [2] and there we reached the conclusion
> > > that we need a solution for throttling [3].
> >
> > OK, thanks for the pointer. This doesn't address the other part of my
> > complaint, though, which is that the whole discussion between you and
> > Dilip and Sawada-san presumes that the delays ought to be
> > scattered across the workers roughly in proportion to their share of
> > the I/O, and it seems NOT AT ALL clear that this is actually a
> > desirable property. You're all assuming that, but none of you has
> > justified it, and I think the opposite might be true in some cases.
> >
>
> IIUC, your complaint is that in some cases, even if the I/O rate is
> enough for one worker, we will still launch more workers and throttle
> them. The point is we can't know in advance how much I/O is required
> for each index. We can try to do that based on index size, but I
> don't think that will be right because it is possible that for the
> bigger index, we don't need to dirty the pages and most of the pages
> are in shared buffers, etc. The current algorithm won't use more I/O
> than required and it will be good for cases where one or some of the
> indexes are doing more I/O as compared to others and it will also work
> equally well even when the indexes have a similar amount of work. I
> think we could do better if we can predict how much I/O each index
> requires before actually scanning the index.
>
> I agree with the other points (add a FAST option for parallel vacuum
> and document that parallel vacuum is still potentially throttled ...)
> you made in a separate email.
>
>
> > You're adding extra complexity for something that isn't a clear
> > improvement.
> >
> > > Your understanding is correct. How about if we modify it to something
> > > like: "Note that parallel workers are alive only during index vacuum
> > > or index cleanup but the leader process neither exits from the
> > > parallel mode nor destroys the parallel context until the entire
> > > parallel operation is finished." OR something like "The leader backend
> > > holds the parallel context till the index vacuum and cleanup is
> > > finished. Both index vacuum and cleanup separately perform the work
> > > with parallel workers."
> >
> > How about if you just delete it? You don't need a comment explaining
> > that this caller of CreateParallelContext() does something which
> > *every* caller of CreateParallelContext() must do. If you didn't do
> > that, you'd fail assertions and everything would break, so *of course*
> > you are doing it.
> >
>
> Fair enough, we can just remove this part of the comment.
>

Hi All,
Below is a brief summary of my testing of the v35 patch set.

1.
All the test cases pass on top of the v35 patch set (make check-world and
all contrib regression tests).

2.
With PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION enabled, "make
check-world" passes as well.

3.
With the v35 patch, the vacuum.sql regression test takes too long because
of the large number of inserted rows, so we can cut that time by reducing
the number of tuples:
+INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,100000) i;

Here, instead of 100000 we can use 1000, because we only want to exercise
the code and functionality.
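For illustration, a minimal sketch of the reduced statement (keeping the
rest of the pvactst setup in vacuum.sql unchanged):

-- 1000 rows still exercises the parallel vacuum code paths and keeps the
-- regression test fast
INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM
generate_series(1,1000) i;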

4.
I tested the functionality of parallel vacuum with different server
configuration settings, and the behavior is as expected. The settings I
varied were: shared_buffers, max_parallel_workers,
max_parallel_maintenance_workers, vacuum_cost_limit, vacuum_cost_delay,
maintenance_work_mem and max_worker_processes.
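For illustration, the kind of combination exercised looked roughly like
this (a sketch; the PARALLEL option is the syntax added by this patch set,
and the table name is just an example):

SET max_parallel_maintenance_workers = 4;
SET maintenance_work_mem = '256MB';
SET vacuum_cost_limit = 2000;
SET vacuum_cost_delay = 1;
VACUUM (PARALLEL 4, VERBOSE) test;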

5.
The index and table stats after a parallel vacuum match those after a
normal vacuum.

postgres=# select * from pg_statio_all_tables where relname = 'test';
 relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
 16384 | public     | test    |            399 |          5000 |             3 |            0 |               0 |              0 |              0 |             0
(1 row)
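A rough sketch of the comparison procedure (hypothetical twin tables
prepared identically; I'm assuming that setting
max_parallel_maintenance_workers to 0 keeps one of the vacuums serial):

SET max_parallel_maintenance_workers = 0;
VACUUM test_serial;                      -- serial baseline
SET max_parallel_maintenance_workers = 4;
VACUUM test_parallel;                    -- parallel vacuum
-- the page/tuple stats of the two tables and their indexes should match
SELECT relname, relpages, reltuples FROM pg_class
WHERE relname IN ('test_serial', 'test_parallel');
SELECT * FROM pg_statio_all_tables
WHERE relname IN ('test_serial', 'test_parallel');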

6.
Vacuum progress reporting works as expected.
postgres=# select * from pg_stat_progress_vacuum;
  pid  | datid | datname  | relid |        phase        | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples
-------+-------+----------+-------+---------------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
 44161 | 13577 | postgres | 16384 | cleaning up indexes |           41650 |             41650 |              41650 |                  1 |        11184810 |         1000000
(1 row)
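For anyone repeating this, the view can be polled from a second session
while the vacuum is running, e.g. (a sketch):

SELECT pid, phase, heap_blks_scanned, heap_blks_vacuumed, index_vacuum_count
FROM pg_stat_progress_vacuum;
\watch 1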

7.
If any worker (or the leader process) hits an error, all the workers exit
immediately and the whole operation is aborted.

8.
I tested parallel vacuum with all the index types and with varying index
sizes; everything works and I didn't see any unexpected behavior.
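For reference, the index mix was along these lines (a sketch; table and
column names are hypothetical):

CREATE TABLE idx_test (a int, b int[], c point);
CREATE INDEX ON idx_test USING btree (a);
CREATE INDEX ON idx_test USING hash (a);
CREATE INDEX ON idx_test USING gin (b);
CREATE INDEX ON idx_test USING gist (c);
CREATE INDEX ON idx_test USING spgist (c);
CREATE INDEX ON idx_test USING brin (a);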

9.
While testing, I found that even if we delete all the tuples from a table,
the size of its btree indexes does not shrink.

Delete all tuples from the table:
before vacuum, total pages in btree index: 8000
after vacuum (normal/parallel), total pages in btree index: 8000
The size of the table itself does shrink after deleting all the tuples.
Can we add a check in vacuum to truncate all the pages of a btree index if
there are no tuples left in the table?
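A sketch of the kind of check behind this observation (hypothetical names;
sizes are shown in 8kB pages):

CREATE TABLE t (a int);
CREATE INDEX t_a_idx ON t (a);
INSERT INTO t SELECT generate_series(1, 1000000);
SELECT pg_relation_size('t_a_idx') / 8192 AS index_pages_before;
DELETE FROM t;
VACUUM t;
-- the heap's empty tail pages are truncated, but btree pages are only
-- recycled for reuse, not returned to the operating system
SELECT pg_relation_size('t') / 8192 AS heap_pages_after,
       pg_relation_size('t_a_idx') / 8192 AS index_pages_after;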

Please let me know if you have any inputs for more testing.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
