Replication origins are intended to make it easier to implement logical replication solutions on top of logical decoding. They provide a solution to two common problems:
How to safely keep track of replication progress
How to change replication behavior based on the origin of a row; for example, to prevent loops in bi-directional replication setups
Replication origins have just two properties, a name and an
OID. The name, which is what should be used to refer to the
origin across systems, is free-form
text. It should be chosen so that
conflicts between replication origins created by different
replication solutions are unlikely, e.g. by prefixing it with the
replication solution's name. The OID is used only to avoid having to
store the full name in situations where space efficiency is
important. It should never be shared across systems.
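As a brief illustration, an origin might be created and inspected as follows. This is a sketch under assumptions: the functions pg_replication_origin_create(), pg_replication_origin_oid() and pg_replication_origin_drop() and the pg_replication_origin catalog are not mentioned in the text above, the origin name myrepl_node_a is hypothetical (a made-up solution prefix "myrepl" plus a node name), and the calls assume a role allowed to execute these functions (by default, a superuser).

```sql
-- Create an origin. The name carries the (hypothetical) replication
-- solution's name as a prefix to make conflicts unlikely.
SELECT pg_replication_origin_create('myrepl_node_a');

-- Look up the locally assigned OID for that name. The OID is only
-- meaningful on this system and should not be sent to other nodes.
SELECT pg_replication_origin_oid('myrepl_node_a');

-- List all origins known to this cluster.
SELECT roident, roname FROM pg_replication_origin;

-- Remove the origin again once it is no longer needed.
SELECT pg_replication_origin_drop('myrepl_node_a');
```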
One nontrivial part of building a replication solution is to keep track of replay progress in a safe manner. When the applying process, or the whole cluster, dies, it needs to be possible to find out up to where data has successfully been replicated. Naive solutions to this, such as updating a row in a table for every replayed transaction, have problems like run-time overhead and database bloat.
Using the replication origin infrastructure, a session can be
marked as replaying from a remote node (using the
pg_replication_origin_session_setup() function).
Additionally, the LSN and
commit time stamp of every source transaction can be configured
on a per-transaction basis using
pg_replication_origin_xact_setup(). If that is done,
replication progress will persist in a crash-safe manner. Replay
progress for all replication origins can be seen in the
pg_replication_origin_status view. An
individual origin's progress, e.g. when resuming replication, can
be acquired using
pg_replication_origin_progress() for any origin or
pg_replication_origin_session_progress() for the
origin configured in the current session.
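The workflow described above can be sketched in SQL roughly as follows. The origin name myrepl_node_a, the table apply_demo_tbl, and the LSN and time stamp values are hypothetical placeholders; a real apply process would take the LSN and commit time stamp from the source transaction it is replaying. The sketch also assumes a role allowed to execute the replication origin functions and a server configured to track replication origins.

```sql
-- Hypothetical target table and origin, created only for this example.
CREATE TABLE apply_demo_tbl (id serial PRIMARY KEY, data text);
SELECT pg_replication_origin_create('myrepl_node_a');

-- Mark this session as replaying changes from the remote node.
SELECT pg_replication_origin_session_setup('myrepl_node_a');

BEGIN;
-- Record the source transaction's commit LSN and commit time stamp
-- (placeholder values) for this one local transaction.
SELECT pg_replication_origin_xact_setup('0/AABBCCDD',
                                        '2024-01-01 12:00:00+00');
-- Apply the replicated change.
INSERT INTO apply_demo_tbl (data) VALUES ('replayed row');
COMMIT;

-- Progress of all origins.
SELECT local_id, external_id, remote_lsn, local_lsn
  FROM pg_replication_origin_status;

-- Progress of the origin configured in this session; the argument
-- controls whether the local transaction is flushed to disk first.
SELECT pg_replication_origin_session_progress(false);

-- Progress of any origin, looked up by name.
SELECT pg_replication_origin_progress('myrepl_node_a', true);

-- Detach the session from the origin and clean up.
SELECT pg_replication_origin_session_reset();
SELECT pg_replication_origin_drop('myrepl_node_a');
DROP TABLE apply_demo_tbl;
```

When resuming after a crash or restart, an apply process could read the remote LSN from one of the progress functions and ask the source to continue streaming from that position.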
In replication topologies more complex than replication from
exactly one system to one other system, another problem can be
that it is hard to avoid replicating replayed rows again. That
can lead both to cycles in the replication and to inefficiencies.
Replication origins provide an optional mechanism to recognize
and prevent that. When a session is configured using the functions
referenced in the previous paragraph, every change and transaction
it generates is tagged with the session's replication origin when
passed to output plugin callbacks (see Section 49.6). This allows
treating them differently in the output plugin, e.g. ignoring all
but locally-originating rows. Additionally, the
filter_by_origin_cb callback can be used to
filter the logical decoding change stream based on the source.
While less flexible, filtering via that callback is considerably
more efficient than doing it in the output plugin.
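One way to observe origin-based filtering without writing an output plugin is the test_decoding contrib module, whose only-local option discards changes that carry a replication origin. The following is a sketch, not a production setup: it assumes test_decoding is installed, wal_level is set to logical, a free replication slot is available, and the role may use the replication functions. The slot, table and origin names and the LSN and time stamp values are hypothetical.

```sql
-- Hypothetical objects created only for this example.
CREATE TABLE origin_demo_tbl (id serial PRIMARY KEY, data text);
SELECT pg_replication_origin_create('myrepl_node_a');
SELECT 'init'
  FROM pg_create_logical_replication_slot('origin_demo_slot',
                                          'test_decoding');

-- A locally-originating change: no origin is configured yet.
INSERT INTO origin_demo_tbl (data) VALUES ('local row');

-- A replayed change: the session is tagged with the origin first.
SELECT pg_replication_origin_session_setup('myrepl_node_a');
BEGIN;
SELECT pg_replication_origin_xact_setup('0/AABBCCDD',
                                        '2024-01-01 12:00:00+00');
INSERT INTO origin_demo_tbl (data) VALUES ('replayed row');
COMMIT;
SELECT pg_replication_origin_session_reset();

-- Decode the stream, asking test_decoding to keep only changes
-- without a replication origin. The replayed row is filtered out.
SELECT data
  FROM pg_logical_slot_get_changes('origin_demo_slot', NULL, NULL,
                                   'skip-empty-xacts', '1',
                                   'only-local', '1');

-- Clean up.
SELECT pg_drop_replication_slot('origin_demo_slot');
SELECT pg_replication_origin_drop('myrepl_node_a');
DROP TABLE origin_demo_tbl;
```

In a bi-directional setup, each node would consume the other's stream with this kind of filter enabled, so that rows it applied from its peer are not sent back.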