I am teaching a Real Application Clusters course this week in Vienna for Vienna Airport IT. The staff at Vienna Airport IT are very experienced in Oracle Database administration; one of the students has even worked with version 5! Often, I am the "version eldest" in my classes with 7.3.4, but not this time.
I usually start that course with a description of the basic RAC architecture, and developing and explaining the following picture takes about one and a half hours:
Maybe you can recognize something in that picture. As a legend: The blue box in the center is the Shared Storage, on which the database files reside. Beneath it (black) are the voting disk and the OCR. Unlike the database, they can't be placed on an ASM diskgroup. We have a 2 node cluster here: The black boxes are the nodes A and B. They are connected to the shared storage (black lines), and each also has local storage (small blue boxes), where the OS, the clusterware and the DB software are installed. The main clusterware processes on each node are cssd (using the voting disk) and crsd (using the OCR). An instance is running on each node (I cover the ASM instances at a later stage in the course). We have the usual background processes like DBWR and LGWR (red) and the SGA known from single instance databases (red).
Additionally, there are background processes attached to the instance that are only seen in a RAC. The most important ones are LMON and LMS (green), which make up (under their "marketing names" GES and GCS) the Global Resource Directory. There are at least 2 network cards (NICs) on each node: one for the Private Interconnect (eth2, red) and one for the Public LAN (black above).
Often, as in our setup, there is a third network card to connect the node to a SAN. The IP address or the IP alias of the Public NIC (eth0) is not used by clients to connect to the RAC. Instead, virtual IPs (VIPs) are used (green). The advantage is that those VIPs are resources controlled by the clusterware, so in case of a node failure, clients don't have to wait for timeouts from the network layer. Instead, the clusterware makes an immediate connect time failover possible, and even a transparent application failover (TAF) for existing sessions, if configured.
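To make the client side of this a bit more concrete, here is a minimal sketch of what a tnsnames.ora entry using the VIPs could look like. The alias RACSRV, the host names nodea-vip and nodeb-vip, the port, the service name racsrv, and the retry values are all assumptions for illustration; they are not taken from the course setup.

```
# Hypothetical client-side entry; all names and values are placeholders.
RACSRV =
  (DESCRIPTION =
    (ADDRESS_LIST =
      # Connect time failover: try the next VIP if the first is unreachable
      (FAILOVER = ON)
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = nodea-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = nodeb-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = racsrv)
      # TAF: existing sessions reconnect and open queries can resume
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 20)
        (DELAY = 3)
      )
    )
  )
```

The address list contains only the VIPs, never the public host names. The FAILOVER_MODE section is the optional part: without it, clients still get connect time failover for new connections, but sessions that existed on the failed node are not reestablished automatically.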
That is of course not all I say about it in the class, but maybe you get an impression.