Posts Tagged exadata

Exadata Patching Introduction

These I consider the most important points about Exadata Patching:

Where is the most recent information?

MOS Note 888828.1 is your first read whenever you think about Exadata Patching

What is to patch with which utility?

Exadata Patching

Expect quarterly bundle patches for the storage servers and the compute nodes. The other components (Infiniband switches, Cisco Ethernet Switch, PDUs) are less frequently patched and not on the picture therefore.

The storage servers have their software image (which includes Firmware, OS and Exadata Software)  exchanged completely with the new one using patchmgr. The compute nodes get OS (and Firmware) updates with dbnodeupdate.sh, a tool that accesses an Exadata yum repository. Bundle patches for the Grid Infrastructure and for the Database Software are being applied with opatch.

Rolling or non-rolling?

This the sensitive part! Technically, you can always apply the patches for the storage servers and the patches for compute node OS and Grid Infrastructure rolling, taking down only one server at a time. The RAC databases running on the Database Machine will be available during the patching. Should you do that?

Let’s focus on the storage servers first: Rolling patches are recommended only if you have ASM diskgroups with high redundancy or if you have a standby site to failover to in case. In other words: If you have a quarter rack without a standby site, don’t use rolling patches! That is because the DBFS_DG diskgroup that contains the voting disks cannot have high redundancy in a quarter rack with just three storage servers.

Okay, so you have a half rack or bigger. Expect one storage server patch to take about two hours. That summarizes to 14 hours (for seven storage servers) patching time with the rolling method. Make sure that management is aware about that before they decide about the strategy.

Now to the compute nodes: If the patch is RAC rolling applicable, you can do that regardless of the ASM diskgroup redundancy. If a compute node gets damaged during the rolling upgrade, no data loss will happen. On a quarter rack without a standby site, you put availability at risk because only two compute nodes are there and one could fail while the other is just down.

Why you will want to have a Data Guard Standby Site

Apart from the obvious reason for Data Guard – Disaster Recovery – there are several benefits associated to the patching strategy:

You can afford to do rolling patches with ASM diskgroups using normal redundancy and with RAC clusters that have only two nodes.

You can apply the patches on the standby site first and test it there – using the snapshot standby database functionality (and using Database Replay if you licensed Real Application Testing)

A patch set can be applied on the standby first and the downtime for end users can be reduced to the time it takes to do a switchover

A release upgrade can be done with a (Transient) Logical Standby, reducing again the downtime to the time it takes to do a switchover

I suppose this will be my last posting in 2014, so Happy Holidays and a Happy New Year to all of you :-)

3 Comments

Why Write-Through is still the default Flash Cache Mode on #Exadata X-4

The Flash Cache Mode still defaults to Write-Through on Exadata X-4 because most customers are better suited that way – not because Write-Back is buggy or unreliable. Chances are that Write-Back is not required, so we just save Flash capacity that way. So when you see this

CellCLI> list cell attributes flashcachemode
         WriteThrough

it is likely to your best :-)
Let me explain: Write-Through means that writing I/O coming from the database layer will first go to the spinning drives where it is mirrored according to the redundancy of the diskgroup where the file is placed that is written to. Afterwards, the cells may populate the Flash Cache if they think it will benefit subsequent reads, but there is no mirroring required. In case of hardware failure, the mirroring is already sufficiently done on the spinning drives, as the pictures shows:

Flash Cache Mode Write-Through

Flash Cache Mode WRITE-THROUGH

That changes with the Flash Cache Mode being Write-Back: Now writes go primarily to the Flashcards and popular objects may even never get aged out onto the spinning drives. At least that age out may happen significantly later, so the writes on flash must be mirrored now. The redundancy of the diskgroup where the object in question was placed on determines again the number of mirrored writes. The two pictures assume normal redundancy. In other words: Write-Back reduces the usable capacity of the Flashcache at least by half.

Flash Cache Mode Write-Back

Flash Cache Mode WRITE-BACK

Only databases with performance issues on behalf of writing I/O will benefit from Write-Back, the most likely symptom of which would be high numbers of the Free Buffer Waits wait-event. And Flash Logging is done with both Write-Through and Write-Back. So there is a good reason behind turning on the Write-Back Flash Cache Mode only on demand. I have explained this just very similar during my present Oracle University Exadata class in Frankfurt, by the way :-)

10 Comments

Modify segment attributes on Logical Standby during Migration

Whenever you want to change certain storage attributes of your production tables, a Logical Standby can help. A common case is to do that during a migration. Not only will Data Guard help you to reduce downtime for the migration to the time it takes to do a switchover! With a Logical Standby, the segments do not need to look exactly (physically) the same as on the Primary. That’s where the name comes from :-)

[oracle@uhesse1 ~]$ dgmgrl sys/oracle@prima 
DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production
Copyright (c) 2000, 2009, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
 Connected.
 DGMGRL> show configuration
Configuration - myconf
Protection Mode: MaxPerformance
 Databases:
 prima - Primary database
 logst - Logical standby database
Fast-Start Failover: DISABLED
Configuration Status:
 SUCCESS

My demo setup is on 11.2.0.3, but the shown technique should work similar with older versions. I will now create a demo user with a table that gets the default initial extent of 64k in the default tablespace of the database that uses autoallocate as the extent allocation type:

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam@prima
Connected.
SQL> create table t as select * from dual;

Table created.

SQL> select extents,initial_extent from user_segments where segment_name='T';

   EXTENTS INITIAL_EXTENT
---------- --------------
	 1	    65536

SQL> select rowid,t.* from t;

ROWID		   D
------------------ -
AAADSaAAEAAAACTAAA X

I listed the rowid above to show that it will be different on the Logical Standby – but that is no problem because of the supplemental logging we do on the Primary to be able to identify the modified rows on the Logical Standby without that rowid from the Primary. Now let’s assume we want to have the initial extent changed to 8M – a recommendation on Exadata for large segments, by the way. See how easy that can be done:

DGMGRL> edit database logst set state=apply-off;
Succeeded.
DGMGRL> show database logst;

Database - logst

  Role:            LOGICAL STANDBY
  Intended State:  APPLY-OFF
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Instance(s):
    logst

Database Status:
SUCCESS

I need to stop the SQL Apply – not the Redo Transport – before the change on the Logical Standby can be done. Technically speaking, your Recovery Point Objective remains the same as before, but your Recovery Time Objective increases according to the amount of redo that accumulates now in the meantime. Now to the modification of the storage attributes:

SQL> select to_char(event_time,'dd-hh24:mi:ss') as time,event,status from dba_logstdby_events order by 1;

TIME	    EVENT					       STATUS
----------- -------------------------------------------------- -----------------------------------------------------------
29-16:03:07						       Shutdown acknowledged
29-16:03:07						       ORA-16128: User initiated stop apply successfully completed

SQL> alter table adam.t move storage (initial 8m);

Table altered.

If you see any other error message in dba_logstdby_events, fix these errors before you go on with the task. The table uses now 8M initial extents – it is reorganized also during this step. In my simplified example, there is no index on the table. Otherwise indexes need to be rebuilt now. I will restart the SQL Apply and check whether it works again:

DGMGRL> edit database logst set state=apply-on;
Succeeded.
DGMGRL> show database logst;

Database - logst

  Role:            LOGICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Instance(s):
    logst

Database Status:
SUCCESS

SQL> update t set dummy='Y' where dummy='X';

1 row updated.

SQL> commit;

Commit complete.

Now back on the Logical Standby:

SQL> connect adam/adam@logst
Connected.
SQL> select to_char(event_time,'dd-hh24:mi:ss') as time,event,status from dba_logstdby_events order by 1;

TIME	    EVENT					       STATUS
----------- -------------------------------------------------- -------------------------------------------
29-16:16:11						       ORA-16111: log mining and apply setting up
29-16:16:11						       Apply LWM 463670, HWM 463670, SCN 464525
29-16:16:11						       APPLY_UNSET: MAX_SGA
29-16:16:11						       APPLY_UNSET: MAX_SERVERS

SQL> select extents,initial_extent from user_segments where segment_name='T';

   EXTENTS INITIAL_EXTENT
---------- --------------
	 1	  8388608

SQL> select rowid,t.* from t;

ROWID		   D
------------------ -
AAADZrAAEAAAAESAAA Y

As you can see, the storage attribute was changed, and the rowid is different here. After a switchover to the Logical Standby, my production table uses now 8m initial extents.
Conclusion: With a Logical Standby in place, it is relatively easy to change storage attributes. It requires a stop of the SQL Apply process, though. You may want to use this approach during a migration to combine the major task (the migration) with some housekeeping. If you ever wondered why Logical Standby is listed under the Logical Migration Methods although it starts with (and has the same limitations as) a Physical Standby – you have just seen the reason. As always: Don’t believe it, test it :-)

,

2 Comments

Follow

Get every new post delivered to your Inbox.

Join 3,483 other followers

%d bloggers like this: