Re: Disabling Heap-Only Tuples

From: Thom Brown <thom(at)linux(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disabling Heap-Only Tuples
Date: 2023-07-07 11:21:03
Message-ID: CAA-aLv5yos961mAjfFRrmy6ZKr=eWMRN4FeSBeya9XcettrHcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 6 Jul 2023 at 21:18, Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
>
> On Wed, 5 Jul 2023 at 19:55, Thom Brown <thom(at)linux(dot)com> wrote:
> >
> > On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent
> > <boekewurm+postgres(at)gmail(dot)com> wrote:
> > > So what were you thinking of? A session GUC? A table option?
> >
> > Both.
>
> Here's a small patch implementing a new table option max_local_update
> (name very much bikesheddable). Value is -1 (default, disabled) or the
> size of the table in MiB that you still want to allow to update on the
> same page. I didn't yet go for a GUC as I think that has too little
> control on the impact on the system.
>
> I decided that max_local_update would be in MB because there is no
> reloption value that can contain MaxBlockNumber and -1/disabled; and 1
> MiB seems like enough granularity for essentially all use cases.
>
> The added regression tests show how this feature works, that the new
> feature works, and validate that lock levels are acceptable
> (ShareUpdateExclusiveLock, same as for updating fillfactor).

Wow, thanks for working on this.

I've given it a test, and it does what I would expect it to do.

I'm aware of the concerns about the potential for the relocation to
land in an undesirable location, so perhaps that needs addressing.
But this is already considerably better than the current need to
update a row until it gets pushed off its current page. Ideally there
would be tooling built around this where the user wouldn't need to
figure out how much of the table to UPDATE, or deal with VACUUMing
concerns.

But here's my quick test:

CREATE OR REPLACE FUNCTION compact_table(table_name IN TEXT)
RETURNS VOID AS $$
DECLARE
current_row RECORD;
old_ctid TID;
new_ctid TID;
keys TEXT;
update_query TEXT;
row_counter INTEGER := 0;
BEGIN
SELECT string_agg(a.attname || ' = ' || a.attname, ', ')
INTO keys
FROM
pg_index i
JOIN
pg_attribute a ON a.attnum = ANY(i.indkey)
WHERE
i.indrelid = table_name::regclass
AND a.attrelid = table_name::regclass
AND i.indisprimary;

IF keys IS NULL THEN
RAISE EXCEPTION 'Table % does not have a primary key.', table_name;
END IF;

FOR current_row IN
EXECUTE FORMAT('SELECT ctid, * FROM %I ORDER BY ctid DESC', table_name)
LOOP
old_ctid := current_row.ctid;

update_query := FORMAT('UPDATE %I SET %s WHERE ctid = $1
RETURNING ctid', table_name, keys);
EXECUTE update_query USING old_ctid INTO new_ctid;

row_counter := row_counter + 1;

IF row_counter % 1000 = 0 THEN
RAISE NOTICE '% rows relocated.', row_counter;
END IF;

IF new_ctid <= old_ctid THEN
CONTINUE;
ELSE
RAISE NOTICE 'All non-contiguous rows relocated.';
EXIT;
END IF;
END LOOP;
END; $$
LANGUAGE plpgsql;

postgres=# CREATE TABLE bigtable (id int, content text);
CREATE TABLE
postgres=# INSERT INTO bigtable SELECT x, 'This is just a way to fill
up space.' FROM generate_series(1,10000000) a(x);
INSERT 0 10000000
postgres=# DELETE FROM bigtable WHERE id % 7 = 0;
DELETE 1428571
postgres=# VACUUM bigtable;
VACUUM
postgres=# ALTER TABLE bigtable SET (max_local_update = 0);
ALTER TABLE
postgres=# ALTER TABLE bigtable ADD PRIMARY KEY (id);
ALTER TABLE
postgres=# \dt+ bigtable
List of relations
Schema | Name | Type | Owner | Persistence | Access method |
Size | Description
--------+----------+-------+-------+-------------+---------------+--------+-------------
public | bigtable | table | thom | permanent | heap | 730 MB |
(1 row)

postgres=# SELECT * FROM pgstattuple('bigtable');
table_len | tuple_count | tuple_len | tuple_percent |
dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space |
free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
765607936 | 8571429 | 557142885 | 72.77 |
0 | 0 | 0 | 105901628 | 13.83
(1 row)

postgres=# SELECT compact_table('bigtable');
NOTICE: 1000 rows relocated.
NOTICE: 2000 rows relocated.
NOTICE: 3000 rows relocated.
NOTICE: 4000 rows relocated.
...
NOTICE: 1221000 rows relocated.
NOTICE: 1222000 rows relocated.
NOTICE: 1223000 rows relocated.
NOTICE: 1224000 rows relocated.
NOTICE: All non-contiguous rows relocated.
compact_table
---------------

(1 row)

postgres=# VACUUM bigtable;
VACUUM
postgres=# \dt+ bigtable;
List of relations
Schema | Name | Type | Owner | Persistence | Access method |
Size | Description
--------+----------+-------+-------+-------------+---------------+--------+-------------
public | bigtable | table | thom | permanent | heap | 626 MB |
(1 row)

postgres=# SELECT * FROM pgstattuple('bigtable');
table_len | tuple_count | tuple_len | tuple_percent |
dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space |
free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
656236544 | 8571429 | 557142885 | 84.9 |
0 | 0 | 0 | 2564888 | 0.39
(1 row)

Works for me.

Thom

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Japin Li 2023-07-07 11:23:47 Re: Add hint message for check_log_destination()
Previous Message Laurenz Albe 2023-07-07 11:20:41 Re: [PATCH] Add support function for containment operators