From: digimer <lists(at)alteeve(dot)ca>
To: Tim Cross <theophilusx(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Weird procedure question
Date: 2018-09-25 22:42:39
Message-ID: d9220a5e-cb04-ef4d-d4a7-cce5f22c9124@alteeve.ca
Lists: pgsql-general
On 2018-09-25 6:41 p.m., digimer wrote:
> On 2018-09-25 6:22 p.m., Tim Cross wrote:
>> digimer <lists(at)alteeve(dot)ca> writes:
>>
>>> Hi all,
>>>
>>> I've got an interesting use case that I am stuck on. It's a bit of a
>>> complicated environment, but I'll try to keep it simple.
>>>
>>> In short: I have a history schema with tables that match the
>>> public schema, plus one 'history_id' column holding a simple
>>> sequential bigserial value. Then I have a procedure and trigger
>>> that, on UPDATE or INSERT, copy the data to history. Example use
>>> case: I can UPDATE a sensor value in the public table and it also
>>> INSERTs the data into history. So public shows just the most
>>> recent values, but I can see changes over time in the history
>>> schema.
>>>
>>> I have built my system to support writing to one or more DBs. I keep
>>> a list of connected DBs and send INSERT/UPDATE calls to a method that
>>> then runs the UPDATE/INSERT against all connected databases, as a form
>>> of redundancy. This all works fine.
>>>
>>> The problem I've hit is that the 'history_id' differs between the
>>> various databases. So I want to switch this to 'history_uuid' and use
>>> UUIDs instead of bigserial.
>>>
>>> Now the question;
>>>
>>> Can I tell a procedure to use a specific UUID?
>>>
>>> The idea is to generate a UUID for 'history_uuid' so that I have
>>> consistency across databases. Of course, if an UPDATE changes
>>> multiple rows, then I'll need to predefine multiple UUIDs. This is
>>> where things start to get really complicated, I think... Maybe I
>>> could pass an array of UUIDs? I don't care about finding out which
>>> UUID was used for which record, just that the same UUID is used
>>> for the same record when the procedure is (re)run on other DBs.
>>>
>>> The databases are not clustered, on purpose. I've been trying to
>>> handle all the HA stuff in my application for various reasons.
>>>
>>> If it helps, here is an example pair of tables, the procedure and the
>>> trigger I currently use;
>>>
>>> ====
>>> CREATE TABLE host_variable (
>>>     host_variable_uuid         uuid    not null    primary key,
>>>     host_variable_host_uuid    uuid    not null,
>>>     host_variable_name         text    not null,
>>>     host_variable_value        text    not null,
>>>     modified_date              timestamp with time zone    not null
>>> );
>>> ALTER TABLE host_variable OWNER TO admin;
>>>
>>> CREATE TABLE history.host_variable (
>>>     history_id                 bigserial,
>>>     host_variable_uuid         uuid,
>>>     host_variable_host_uuid    uuid,
>>>     host_variable_name         text,
>>>     host_variable_value        text,
>>>     modified_date              timestamp with time zone    not null
>>> );
>>> ALTER TABLE history.host_variable OWNER TO admin;
>>>
>>> CREATE FUNCTION history_host_variable() RETURNS trigger
>>> AS $$
>>> DECLARE
>>>     history_host_variable RECORD;
>>> BEGIN
>>>     SELECT INTO history_host_variable * FROM host_variable
>>>     WHERE host_variable_uuid = new.host_variable_uuid;
>>>     INSERT INTO history.host_variable
>>>         (host_variable_uuid,
>>>          host_variable_host_uuid,
>>>          host_variable_name,
>>>          host_variable_value,
>>>          modified_date)
>>>     VALUES
>>>         (history_host_variable.host_variable_uuid,
>>>          history_host_variable.host_variable_host_uuid,
>>>          history_host_variable.host_variable_name,
>>>          history_host_variable.host_variable_value,
>>>          history_host_variable.modified_date);
>>>     RETURN NULL;
>>> END;
>>> $$
>>> LANGUAGE plpgsql;
>>> ALTER FUNCTION history_host_variable() OWNER TO admin;
>>>
>>> CREATE TRIGGER trigger_host_variable
>>> AFTER INSERT OR UPDATE ON host_variable
>>> FOR EACH ROW EXECUTE PROCEDURE history_host_variable();
>>> ====
>>>
>>> I know this might sound odd, but I didn't want to complicate things
>>> with how my system works. However, if it would help solve the problem,
>>> I'm happy to dig into more detail.
>>>
>>> Thanks!
>> I think James has probably given you the input you need - basically,
>> don't allow the system to automatically set the modified time - make
>> that a parameter to your function, or set that value before the copy
>> to the history tables. The content would then be the same, so a v3
>> UUID should work.
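[Editor's note: the content-derived UUID idea above can be sketched outside the database. A minimal Python illustration follows; the namespace constant and the key format are assumptions for the sketch, not part of the original setup. Version-3/5 UUIDs are hashes of (namespace, name), so identical inputs always yield the identical UUID, on any host, against any database.]

```python
import uuid

# Hypothetical fixed namespace; any UUID works, as long as every
# database and client uses the same one.
HISTORY_NS = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")

def history_uuid(record_uuid: str, modified_date: str) -> uuid.UUID:
    """Derive a deterministic history UUID from the row's identity
    and its modification timestamp (SHA-1 based, version 5)."""
    return uuid.uuid5(HISTORY_NS, f"{record_uuid}:{modified_date}")

# The same row content produces the same UUID on every database:
a = history_uuid("some-record-uuid", "2018-09-25 18:41:39-04")
b = history_uuid("some-record-uuid", "2018-09-25 18:41:39-04")
assert a == b
```

Inside PostgreSQL itself, the uuid-ossp extension's uuid_generate_v5() offers the same property, so the trigger could compute the history UUID from the row's content rather than receiving it as a parameter.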
>>
>> However, I do think you have another big problem lurking in the
>> shadows. What happens if any of your connected databases are
>> unavailable or unreachable for a period of time? I suspect you're
>> going to run into update anomalies and, depending on your
>> setup/environment, possibly even partitioning problems (depending
>> on the number of clients, topology, etc.). These are well-known
>> problems in distributed and replication systems.
>>
>> You appear to be implementing a 'poor man's' replication system.
>> There are lots of complex issues to deal with, and I wonder why you
>> want to take them on when PG already has well-tested and robust
>> solutions for this that would simplify your architecture and avoid
>> the need to re-implement functionality which already exists?
>>
>> regards,
>>
>> Tim
>>
> Hi Tim,
>
> Last I checked, pgsql couldn't handle this;
>
> Two DBs up, getting data.
> DB1 goes down, DB2 continues to collect data.
> DB2 goes down
> DB1 comes back up, starts collecting data.
> DB2 comes back up, now I need to move data in both directions (DB1 has
> data 2 doesn't and vice-versa).
>
> I've created a way to resolve this in my application, and it's
> worked for some time (in my application only, obviously; it's not a
> general-purpose system, nor is it intended to be).
>
> For the record, I realized I was looking for a complex solution to a
> simple problem. I do create the 'modified_date' value in my app, and I
> just needed to refresh it between UPDATEs/INSERTs on the same column
> so that no two records in the history table have the same
> 'modified_date'. With that, my resync works again.
>
> Cheers,
>
> digimer
I should mention, I used two DBs in the example, but it could be 3 or
more, and I need to resync in as many directions at once as needed. So
it's a "poor person's" N-way replication cluster, as you guessed. I'd
love to use existing tools if they actually exist.
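
[Editor's note: the N-way resync described above can be sketched as a set-difference computation. This is a hypothetical illustration, not the poster's actual code; it assumes each history row carries a key that is identical on every database, such as a deterministic history UUID or a (record_uuid, modified_date) pair.]

```python
# For each database, determine which history rows it is missing
# relative to the union of all databases' history tables.
def missing_per_db(histories: dict[str, set]) -> dict[str, set]:
    """Map each database name to the set of history keys it lacks."""
    everything = set().union(*histories.values())
    return {db: everything - rows for db, rows in histories.items()}

# Hypothetical state after two outages: each DB collected data while
# the others were down, so each holds rows the others lack.
dbs = {
    "db1": {"k1", "k2", "k4"},   # collected while db2 was down
    "db2": {"k1", "k3"},         # collected while db1 was down
    "db3": {"k1"},               # rejoined late
}
to_push = missing_per_db(dbs)
assert to_push["db1"] == {"k3"}
assert to_push["db2"] == {"k2", "k4"}
assert to_push["db3"] == {"k2", "k3", "k4"}
```

Because the computation is symmetric over all databases, the same pass resynchronizes in every direction at once, regardless of how many DBs were down or in what order they recovered.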
digimer