Re: Vacuum: allow usage of more than 1GB of work mem

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2016-09-14 09:45:32
Message-ID: CABOikdNVfVkY8hy4A94Kzkeu8-MQ4UFB7WXc8n+TgA876=dVWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Wed, Sep 14, 2016 at 8:47 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
wrote:

>
>>
> Sawada-san offered to reimplement the patch based on what I proposed
> upthread. In the new scheme of things, we will allocate a fixed size bitmap
> of length 2D bits per page where D is average page density of live + dead
> tuples. (The rational behind multiplying D by a factor of 2 is to consider
> worst case scenario where every tuple also has a LP_DIRECT line
> pointer). The value of D in most real world, large tables should not go
> much beyond, say 100, assuming 80 bytes wide tuple and 8K blocksize. That
> translates to about 25 bytes/page. So all TIDs with offset less than 2D can
> be represented by a single bit. We augment this with an overflow to track
> tuples which fall outside this limit. I believe this area will be small,
> say 10% of the total allocation.
>
>
So I cooked up the attached patch to track number of live/dead tuples found
at offset 1 to MaxOffsetNumber. The idea was to see how many tuples
actually go beyond the threshold of 2D offsets per page. Note that I am
proposing to track 2D offsets via bitmap and rest via regular TID array.

So I ran a pgbench test for 2hrs with scale factor 500. autovacuum scale
factor was set to 0.1 or 10%.

Some interesting bits:

postgres=# select relname, n_tup_ins, n_tup_upd, n_tup_hot_upd, n_live_tup,
n_dead_tup, pg_relation_size(relid)/8192 as relsize,
(n_live_tup+n_dead_tup)/(pg_relation_size(relid)/8192) as density from
pg_stat_user_tables ;
relname | n_tup_ins | n_tup_upd | n_tup_hot_upd | n_live_tup |
n_dead_tup | relsize | density
------------------+-----------+-----------+---------------+------------+------------+---------+---------
pgbench_tellers | 5000 | 95860289 | 87701578 | 5000 |
0 | 3493 | 1
pgbench_branches | 500 | 95860289 | 94158081 | 967 |
0 | 1544 | 0
pgbench_accounts | 50000000 | 95860289 | 93062567 | 51911657 |
3617465 | 865635 | 64
pgbench_history | 95860289 | 0 | 0 | 95258548 |
0 | 610598 | 156
(4 rows)

Smaller tellers and branches tables bloat so much that the density as
computed by live + dead tuples falls close to 1 tuple/page. So for such
tables, the idea of 2D bits/page will fail miserably. But I think these
tables are worst representatives and I would be extremely surprised if we
ever find very large table bloated so much. But even then, this probably
tells us that we can't solely rely on the density measure.

Another interesting bit about these small tables is that the largest used
offset for these tables never went beyond 291 which is the value of
MaxHeapTuplesPerPage. I don't know if there is something that prevents
inserting more than MaxHeapTuplesPerPage offsets per heap page and I don't
know at this point if this gives us upper limit for bits per page (may be
it does).

For pgbench_accounts table, the maximum offset used was 121, though bulk of
the used offsets were at the start of the page (see attached graph). Now
the test did not create enough dead tuples to trigger autovacuum on
pgbench_accounts table. So I ran a manul vacuum at the end. (There are
about 5% dead tuples in the table by the time test finished)

postgres=# VACUUM VERBOSE pgbench_accounts ;
INFO: vacuuming "public.pgbench_accounts"
INFO: scanned index "pgbench_accounts_pkey" to remove 2797722 row versions
DETAIL: CPU 0.00s/9.39u sec elapsed 9.39 sec.
INFO: "pgbench_accounts": removed 2797722 row versions in 865399 pages
DETAIL: CPU 0.10s/7.01u sec elapsed 7.11 sec.
INFO: index "pgbench_accounts_pkey" now contains 50000000 row versions in
137099 pages
DETAIL: 2797722 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pgbench_accounts": found 852487 removable, 50000000 nonremovable
row versions in 865635 out of 865635 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 802256 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.73s/27.20u sec elapsed 27.98 sec.tuple count at each offset
(offnum:all_tuples:dead_tuples)

For 2797722 dead line pointers, the current representation would have used
2797722 x 6 = 16786332 bytes of memory. The most optimal bitmap would have
used 121 bits/page x 865399 pages = 13089159 bytes where as if we had
provisioned 2D bits/page and assuming D = 64 based on the above
calculation, we would have used 13846384 bytes of memory. This is about 18%
less than the current representation. Of course, we would have allocated
some space for overflow region, which will make the difference
smaller/negligible. But the bitmaps would be extremely cheap to lookup
during index scans.

Now may be I got lucky, may be I did nor run tests long enough (though I
believe that may have worked in favour of bitmap), may be mostly HOT
updated tables are not good candidate for testing and may be there are
situations where the proposed bitmap representation will fail badly. But
these tests show that the idea is at least worth considering and we can
improve things for at least some workload. The question is can be avoid
regression in not-so-good cases.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
vacuum_print_offset_stats.patch application/octet-stream 3.2 KB
image/png 11.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2016-09-14 10:13:03 Printing bitmap objects in the debugger
Previous Message Thomas Berger 2016-09-14 09:22:22 Re: [BUGS] BUG #14244: wrong suffix for pg_size_pretty()