Making table reloading easier

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Making table reloading easier
Date: 2016-11-03 00:49:10
Message-ID: CAMsr+YHX7kXZdaOiSihrJ5FXGu-syXmPw6bbOXABt9pnbUPP8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all

A very common operation that users perform is reloading tables.
Sometimes as part of an ETL process. Sometimes as part of a dump and
reload. Sometimes loading data from external DBs, etc.

Right now users have to jump through a bunch of hoops to do this efficiently:

BEGIN;

TRUNCATE TABLE my_table;

SELECT pg_get_indexdef(indexrelid::regclass)
FROM pg_index WHERE indrelid = 'table_name'::regclass;

-- Drop 'em all
DROP INDEX ... ;

COPY my_table FROM 'file';

-- Re-create indexes
CREATE INDEX ...;

COMMIT;

This is pretty clunky.

We already have support for disabling indexes, it's just not exposed
to the user. So the simplest option would seem to be to expose it with
something like:

ALTER TABLE my_table
DISABLE INDEX ALL;

which would take an ACCESS EXCLUSIVE lock then set:

indistready = 'f'
indislive = 'f'
indisvalid = 'f'

on each index, or the named index if the user specifies one particular index.

After loading the table, a REINDEX on the table would rebuild and
re-enable the indexes.

That changes the process to:

BEGIN;
TRUNCATE TABLE my_table;
ALTER TABLE my_table DISABLE INDEX ALL;
COPY ...;
REINDEX TABLE my_table;
COMMIT;

It'd be even better to also add a REINDEX flag to COPY, where it
disables indexes and re-creates them after it finishes. But that could
be done separately.

Thoughts?

I'm not sure I can tackle this in the current dev cycle, but it looks
simple enough that I can't help wondering what obvious thing I'm
missing about why it hasn't been done yet.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-11-03 02:15:13 Re: about missing xml related functionnalities
Previous Message Tomas Vondra 2016-11-02 22:56:37 Re: plan_rows confusion with parallel queries