Re: Replication Ideas

From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL General ML <pgsql-general(at)postgresql(dot)org>
Subject: Re: Replication Ideas
Date: 2003-08-29 03:20:10
Message-ID: 1062127209.30745.84.camel@haggis
Lists: pgsql-general pgsql-hackers pgsql-performance

On Thu, 2003-08-28 at 17:52, Dennis Gearon wrote:
> Are these clusters physically together using dedicated LAN lines .... or
> are they synchronizing over the Interwait?

There have been multiple methods over the years. In order:

1. Cluster Interconnect (CI) : There's a big box, called the CI,
that in the early days was really a stripped PDP-11 running
an RTOS. Each VAX (and, later, Alpha) is connected to the CI
via special adapters and cables. Disks are connected to
"HSC" Storage Controllers, which also plug into the CI.
Basically, it's a big, intelligent switch. Disk sectors pass
along the wires from VAX and Alpha to disks and back. DLM
messages pass along the wires from node to node. With multiple
CI adapters and HSCs (they were dual-ported) you could
set up total dual-redundancy. Up to 96 nodes can be clustered.
It still works, but Memory Channel is preferred now.

2. LAVC - Local Area VAX Cluster : In this scheme, disks were
directly attached to nodes, and data (disk and DLM) is trans-
ferred back and forth across the 10Mbps Ethernet. It could
travel over TCP/IP or DECnet. For obvious reasons, LAVC was
a lot cheaper and slower than CI.

3. SCSI clusters : SCSI disks are wired to a dual-ported "HSZ"
Storage Controller. Then, SCSI cards on each of 2 nodes
can be wired into a port. If the SCSI disks are also
wired to a 2nd HSZ, and a 2nd SCSI card in each node is plugged
into that HSZ, dual-redundancy is achieved. With modern
versions of VMS, the SCSI drivers can choose which SCSI
card to send data through, to increase performance.
DLM messages are passed via TCP/IP. Only 2 nodes can be
clustered. A related method uses Fibre Channel disks on
"HSG" Storage Controllers.

4. Memory Channel : A higher speed interconnect. Don't know
much about it. 128 nodes can be clustered.

Note that since DLM awareness is built deep into VMS and all the
RTLs, every program is cluster-aware, no matter what type of
cluster method is used.

> Ron Johnson wrote:
>
> >On Thu, 2003-08-28 at 16:00, Jan Wieck wrote:
> >
> >
> >>Ron Johnson wrote:
> >>
> >>
> >>
> >>>Notes:
> >>>a) this is, of course, not *sufficient* for multi-master
> >>>b) yes, you need a fast, low latency network for the DLM chatter.
> >>>
> >>>
> >>"Fast" is an understatement. The DLM you're talking about would (in our
> >>case) need to use Spread's AGREED_MESS or SAFE_MESS service type,
> >>meaning guarantee of total order. A transaction that needs any type of
> >>lock sends that request into the DLM group and then waits. The incoming
> >>stream of lock messages determines success or failure. With the overhead
> >>of these service types I don't think one single communication group for
> >>all database backends in the whole cluster guaranteeing total order will
> >>be that efficient.
> >>
> >>
> >
> >I guess it's the differing protocols involved. DEC made clustering
> >(including Rdb/VMS) work over an 80Mbps protocol, back in The Day,
> >and HPaq says that it works fine now over fast ethernet.
> >
> >
> >
> >>>This is a tried and true method of synchronization. DEC Rdb/VMS
> >>>has been using it for 19 years as the underpinnings of its cluster
> >>>technology, and Oracle licensed it from them (well, really Compaq)
> >>>for its 9i RAC.
> >>>
> >>>
> >>Are you sure they're using it that way?
> >>
> >>
> >
> >Not as sure as I am that the sun will rise in the east tomorrow,
> >but, yes, I am highly confident that O modified DLM for use in
> >9i RAC. Note that O purchased Rdb/VMS from DEC back in 1994, along
> >with the Engineers, so they have long knowledge of how it works
> >in VMS. One of the reasons they bought Rdb was to merge the tech-
> >nology into RDBMS.
> >
> >
> >
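
To make Jan's point about total order concrete, here is a minimal
sketch of how every node could reach the same lock decision just by
applying the same ordered stream of messages. It is an illustration
only: it does not use the Spread API or the VMS DLM, it models
exclusive locks only, and the names (LockTable, replay, the message
tuples) are hypothetical.

```python
from collections import deque


class LockTable:
    """Replicated lock state; each node applies the same ordered stream."""

    def __init__(self):
        self.holder = {}   # resource -> transaction id holding the lock
        self.waiters = {}  # resource -> deque of waiting transaction ids

    def apply(self, msg):
        """Apply one (kind, txn, resource) message and return the outcome."""
        kind, txn, res = msg
        if kind == "lock":
            if res not in self.holder:
                self.holder[res] = txn
                return ("granted", txn, res)
            # Someone else holds it: queue in arrival (total) order.
            self.waiters.setdefault(res, deque()).append(txn)
            return ("queued", txn, res)
        if kind == "unlock":
            if self.holder.get(res) != txn:
                return ("ignored", txn, res)
            queue = self.waiters.get(res)
            if queue:
                # Hand the lock to the longest-waiting transaction.
                nxt = queue.popleft()
                self.holder[res] = nxt
                return ("granted", nxt, res)
            del self.holder[res]
            return ("released", txn, res)
        raise ValueError("unknown message kind: %r" % (kind,))


def replay(stream):
    """Apply a totally ordered message stream to a fresh lock table."""
    table = LockTable()
    return [table.apply(m) for m in stream]


# Hypothetical stream, as the group would deliver it to every node.
stream = [
    ("lock", "t1", "row42"),
    ("lock", "t2", "row42"),
    ("unlock", "t1", "row42"),
]
node_a = replay(stream)
node_b = replay(stream)
```

Because both nodes see the identical sequence, node_a and node_b end
up with identical outcomes without exchanging any further messages.
The cost Jan is worried about is in delivering that agreed order to
every backend, which this sketch simply assumes.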

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron(dot)l(dot)johnson(at)cox(dot)net
Jefferson, LA USA

"Oh, great altar of passive entertainment, bestow upon me thy
discordant images at such speed as to render linear thought impossible"
Calvin, regarding TV
