These I consider the most important points about Exadata Patching:
Where is the most recent information?
MOS Note 888828.1 is your first read whenever you think about Exadata Patching
What is to patch with which utility?
Expect quarterly bundle patches for the storage servers and the compute nodes. The other components (Infiniband switches, Cisco Ethernet Switch, PDUs) are patched less frequently and are therefore not in the picture.
The storage servers have their software image (which includes Firmware, OS and Exadata Software) exchanged completely with the new one using patchmgr. The compute nodes get OS (and Firmware) updates with dbnodeupdate.sh, a tool that accesses an Exadata yum repository. Bundle patches for the Grid Infrastructure and for the Database Software are applied with opatch.
Rolling or non-rolling?
This is the sensitive part! Technically, you can always apply the patches for the storage servers and the patches for compute node OS and Grid Infrastructure rolling, taking down only one server at a time. The RAC databases running on the Database Machine will be available during the patching. Should you do that?
Let’s focus on the storage servers first: Rolling patches are recommended only if you have ASM diskgroups with high redundancy or if you have a standby site to fail over to if something goes wrong. In other words: If you have a quarter rack without a standby site, don’t use rolling patches! That is because the DBFS_DG diskgroup that contains the voting disks cannot have high redundancy in a quarter rack with just three storage servers.
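To make the rule of thumb above concrete, here is a minimal Python sketch. The function name, the dictionary of diskgroups and the redundancy strings are my own illustration, not part of any Oracle tool; you would have to fill in the values yourself from what you know about your system.

```python
def rolling_storage_patch_recommended(diskgroup_redundancy, has_standby_site):
    """Rule of thumb from the text: rolling storage server patches are
    recommended only if ALL ASM diskgroups use high redundancy, or if a
    standby site is available to fail over to.

    diskgroup_redundancy: dict mapping diskgroup name -> "HIGH" or "NORMAL"
    (hypothetical input format, chosen for this illustration only).
    """
    all_high = all(
        redundancy.upper() == "HIGH"
        for redundancy in diskgroup_redundancy.values()
    )
    return all_high or has_standby_site


# Quarter rack: DBFS_DG cannot have high redundancy with three storage servers
quarter_rack = {"DATA": "HIGH", "RECO": "HIGH", "DBFS_DG": "NORMAL"}

print(rolling_storage_patch_recommended(quarter_rack, has_standby_site=False))
print(rolling_storage_patch_recommended(quarter_rack, has_standby_site=True))
```

The first call reflects the quarter rack without a standby site, the second one the same rack with a standby site to fall back on.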
Okay, so you have a half rack or bigger. Expect one storage server patch to take about two hours. That adds up to 14 hours of patching time (for seven storage servers) with the rolling method. Make sure that management is aware of that before they decide on the strategy.
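The arithmetic is simple, but it helps to have it at hand when you discuss the strategy for other rack sizes. A small Python sketch, assuming the servers are patched strictly one after another and using the rough two hours per storage server mentioned above as the default (the function name is just for illustration):

```python
def rolling_patch_hours(storage_servers, hours_per_server=2.0):
    """Estimate the total rolling patch time for the storage servers.

    With the rolling method only one storage server is patched at a time,
    so the total is simply servers * time per server. The default of two
    hours per server is the rough figure from the text, not a guarantee.
    """
    if storage_servers < 0 or hours_per_server < 0:
        raise ValueError("counts and durations must not be negative")
    return storage_servers * hours_per_server


# Seven storage servers at about two hours each
print(rolling_patch_hours(7))  # 14.0
```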
Now to the compute nodes: If the patch is RAC rolling applicable, you can do that regardless of the ASM diskgroup redundancy. If a compute node gets damaged during the rolling upgrade, no data loss will happen. On a quarter rack without a standby site, however, you put availability at risk because there are only two compute nodes and one could fail while the other is down for patching.
Why you will want to have a Data Guard Standby Site
Apart from the obvious reason for Data Guard (Disaster Recovery), there are several benefits associated with the patching strategy:
You can afford to do rolling patches with ASM diskgroups using normal redundancy and with RAC clusters that have only two nodes.
You can apply the patches on the standby site first and test them there, using the snapshot standby database functionality (and using Database Replay if you licensed Real Application Testing)
A patch set can be applied on the standby first and the downtime for end users can be reduced to the time it takes to do a switchover
A release upgrade can be done with a (Transient) Logical Standby, reducing again the downtime to the time it takes to do a switchover
I suppose this will be my last posting in 2014, so Happy Holidays and a Happy New Year to all of you 🙂