Selective Replication help

From: Hailey Eckstrand <haileyeckstrand(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Selective Replication help
Date: 2015-05-28 22:13:16
Message-ID: CABCgRVEzwwzyeo0kXCTSJs0_geaQ+EUvTDM_deyOZfj-o4EmrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

I have one local master database and 4 remote slave databases. Every hour I
send, around 1000 rows from the master to each of the slaves (4 columns).
For context, my master database is where I do web scraping. After the data
is scraped, I send it out to the slave databases.

Of the 1000 rows, there are some duplicates in the primary key of the slave
databases so part of the push involves deleting the primary key overlaps
and then inserting all of the data.

On the master database, the data I send to the slaves is compiled from a
view which is a subset of 5 tables. The reason I am explaining this is
because it is not a simple master table, slave table replication. The data
I send out is from a view of multiple tables in the master and inserted
into one table on the slave.

Currently, the way I push the data from local to remote is,
I've created an SQL function which operates on a slave database table (via
a foreign data wrapper) that checks if the row exists and if it does, it
updates the values and if it doesn't exist, it inserts the new row.

I find that this is very slow and I'm guessing inefficient but I'm not very
good at testing that. The amount of rows that I'm sending out is about to
increase to 10,000 and I'd like to look into other solutions.

Option 1) In my research, I've found the function
'dblink_build_sql_insert' that
states it 'can be useful in doing selective replication of a local table to
a remote database' which sounds like what I'm doing but I've not found any
examples online.

Option 2) I've also been looking into a master-slave replication software
called bucardo but it works table to table and so if I did that.. I think I
would have to create local versions of the slave table (which would be
storing duplicated data), update them by triggers (when new data was
inserted into the 5 tables via web scraping scripts) and then set up
replication on those tables to the slave databases. However, with
replication comes a steep learning curve. I'm not sure if what I'm trying
to do is complicated enough to warrant replication.

Can anyone suggest other options which would be fast and perhaps more
efficient or let me know if 1 or 2 are a good idea?
I know enough python and php to get by and can figure out triggers. I am
not amazing at plpgsql but could figure it out.

Thank you,
Hailey

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Shashwat Arghode 2015-05-29 06:13:06 Re: [SQL] Drop or disable or bypass "_return" rule on select on a view.
Previous Message Faisal Karim 2015-05-28 19:34:07 Re: [NOVICE] Drop or disable or bypass "_return" rule on select on a view.