
Exadata Part VII: Meaning of the various Disk Layers

Whenever I have taught an Exadata Database Machine course, there has been some confusion amongst the attendees about the many different kinds of Disks encountered on the Storage Servers (Cells): We have no fewer than 4 different layers there, and the intention of this posting is to clarify the meaning of the different layers and to explain what an Administrator may need to do on each of them.

Each Cell comes with 12 SAS Harddisks (600 GB each with High Performance, or 2 TB each with High Capacity). The picture below shows a Cell with the 12 Harddisks on the front:

Each Cell also has 4 Flashcards built in that are divided into 4 Flashdisks each, adding up to 16 Flashdisks per Cell that deliver 384 GB of Flash Cache by default. At this stage, the first layer of abstraction comes in:

1) Physical Disks

Physical Disks can be of the type Harddisk or of the type Flashdisk. You cannot create or drop them. The only administrative task on that layer is typically to turn on the service LED at the front of the Cell before you replace a damaged Harddisk, to be sure you pull out the right one, with a command like

CellCLI> alter physicaldisk <disk_name> serviceled on
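
To identify the damaged disk before turning the LED on, you can check the status of the Physical Disks first. A sketch, assuming the status filter works like this in your CellCLI version:

CellCLI> list physicaldisk where status != normal detail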

2) Luns

Luns are the second layer of abstraction. They have been introduced because the first two Harddisks in every Cell are different from the other 10 insofar as they contain the Operating System (Oracle Enterprise Linux). About 30 GB have been carved out of the first 2 Harddisks for that purpose. We have 2 of them for redundancy – the Cell can still operate if one of the first 2 Harddisks fails. If we investigate the first 2 LUNs, we see the mirrored OS Partitions. Joel Goodman has done that in a very instructive posting. I need to correct my original statement in this post that said "The first 2 Luns are 30 GB smaller than the other 10 therefore." The LUNs are equally sized on each Harddisk, but the usable space (for Celldisks and, in turn, Griddisks) is about 30 GB less on the first two.

As an Administrator, you do not need to do anything on the Lun Layer except to look at it with commands like

CellCLI> list lun
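
To see which LUNs carry the system areas and to verify that all LUNs are equally sized, a command like the following could be used (a sketch; isSystemLun and lunSize are the attribute names I would expect here):

CellCLI> list lun attributes name, isSystemLun, lunSize where disktype=harddisk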

3) Celldisks

Celldisks are the third layer of abstraction. It was introduced to enable interleaving in the first place. There has been some misconception about that in the Exadata Community, which is why I will spend some more lines on that topic. Typically, Oracle ACS creates all the Celldisks for you without interleaving with a command like

CellCLI> create celldisk all harddisk

When you investigate your Celldisks, you would see something like this:

CellCLI> list celldisk attributes name,interleaving where disktype=harddisk
         CD_disk01_cell1         none
         CD_disk02_cell1         none
         CD_disk03_cell1         none
         CD_disk04_cell1         none
         CD_disk05_cell1         none
         CD_disk06_cell1         none
         CD_disk07_cell1         none
         CD_disk08_cell1         none
         CD_disk09_cell1         none
         CD_disk10_cell1         none
         CD_disk11_cell1         none

My Celldisk #12 is missing from the list because I have dropped it in order to demonstrate the alternative creation with interleaving:

CellCLI> create celldisk all harddisk interleaving='normal_redundancy'
CellDisk CD_disk12_cell1 successfully created

In a real world configuration, every Celldisk (on every Cell) would have the same interleaving setting (none, normal_redundancy or high_redundancy). The interleaving attribute of the Celldisk determines the placement of the Griddisks that are later created on that Celldisk.

So as an Administrator, you could create and drop Celldisks – although you will rarely, if ever, do that. Most customers are best served by the default configuration without interleaving: the Griddisks created first are the fastest, so DATA becomes faster than RECO.
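
Before carving out Griddisks, you can check how much space each Celldisk offers. A sketch, assuming the size and freeSpace attributes exist in your CellCLI version:

CellCLI> list celldisk attributes name, size, freeSpace where disktype=harddisk

On the first two Celldisks, you would see roughly 30 GB less space than on the other 10, matching the OS partitions mentioned in the Lun section.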

4) Griddisks

Griddisks are the fourth layer of abstraction, and they will be the Candidate Disks to build your ASM Diskgroups from. By default (interleaving=none on the Celldisk layer), the first Griddisk that is created upon a Celldisk is placed on the outer sectors of the underlying Harddisk and therefore delivers the best performance. If we follow the recommendations, we will create 3 Diskgroups upon our Griddisks: DATA, RECO and SYSTEMDG.

DATA is supposed to be used as the Database Area (DB_CREATE_FILE_DEST='+DATA' on the Database Layer), RECO will be the Recovery Area (DB_RECOVERY_FILE_DEST='+RECO') and SYSTEMDG will be used to hold the Voting Files and the OCR. It makes sense that DATA has better performance than RECO, and SYSTEMDG can be placed on the slowest (inner) part of the Harddisks.
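
On the Database Layer, the corresponding settings could look like this sketch (the Recovery Area size of 1T is just an assumed example value):

SQL> alter system set db_recovery_file_dest_size=1T;
SQL> alter system set db_recovery_file_dest='+RECO';
SQL> alter system set db_create_file_dest='+DATA';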

With interleaving specified at the Celldisk layer, this is different: The Griddisks are then created from outer and inner parts of the Harddisk alike, leading to equal performance of the Griddisks and also of the Diskgroups later created upon them. This option was introduced for customers who want to provide different Diskgroups for different Databases without preferring one Database over the other.

We will take Griddisks of about 30 GB each out of the 10 non-system Harddisks of each Cell to build the Diskgroup SYSTEMDG upon. That leaves us with the same amount of space on each of the 12 Harddisks for the DATA and RECO Diskgroups. You may wonder why the SYSTEMDG Diskgroup gets relatively large with that approach – much larger than the space required by the Voting Files and the OCR. That space gets used if you establish a DBFS filesystem with a dedicated DBFS Database that uses the SYSTEMDG Diskgroup as its Database Area. In this DBFS filesystem, you may store flat files to process them with External Tables (or SQL*Loader) from your productive Databases.
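
As an illustration of that last point, a flat file stored in the DBFS filesystem could be accessed with an External Table roughly like this (the directory path, table name and columns are made up for the example):

SQL> create directory dbfs_stage as '/dbfs/staging';
SQL> create table ext_sales (
       sale_id  number,
       amount   number
     )
     organization external (
       type oracle_loader
       default directory dbfs_stage
       access parameters (
         records delimited by newline
         fields terminated by ','
       )
       location ('sales.csv')
     );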

So as an Administrator, you can (and most likely will) create and drop Griddisks; typically 3 Griddisks are carved out of each Celldisk, but only 2 out of each of the first 2, which already contain the OS. Assuming we have High Performance Disks:

CellCLI> create griddisk all harddisk prefix=temp_dg, size=570G

This command will create 12 Griddisks, each 570G in size, from the outer (fastest) sectors of the underlying Harddisks. It fills up the first 2 Celldisks entirely, because they have just 570G of free space – the rest is already consumed by the OS partition.

CellCLI> create griddisk all harddisk prefix=systemdg

This command creates 10 Griddisks for the SYSTEMDG Diskgroup, consuming all the available space (about 30G) remaining on the 10 non-system Harddisks. Without interleaving, they will be on the slowest part of the disks.

CellCLI> drop griddisk all prefix=temp_dg

Now we have dropped those Griddisks, leaving the faster parts empty for the next 2 Diskgroups:

CellCLI> create griddisk all harddisk prefix=data, size=270G

It is best practice to use the name of the future Diskgroup as the prefix for the Griddisks. We have now created 12 Griddisks for the future DATA Diskgroup on the outer sectors. The remaining space (300G) will be consumed by the RECO Griddisks:

CellCLI> create griddisk all harddisk prefix=reco
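
The resulting layout can be verified with a command like the following (a sketch; the offset attribute shows where on the underlying Harddisk each Griddisk is placed):

CellCLI> list griddisk attributes name, size, offset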

We are now ready to continue on the Database Layer and create ASM Diskgroups there. I have already given an example for that (incidentally with Flashdisks, but it looks the same with Harddisks) in this posting. From that Layer, Griddisks just look like ASM (Candidate) Disks.
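
For reference, a Diskgroup creation on the Database Layer could be sketched like this (the redundancy level and the attribute values are assumptions; adjust them to your setup):

SQL> create diskgroup data normal redundancy
       disk 'o/*/data*'
       attribute 'compatible.rdbms'        = '11.2.0.0.0',
                 'compatible.asm'          = '11.2.0.0.0',
                 'cell.smart_scan_capable' = 'TRUE',
                 'au_size'                 = '4M';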

Conclusion: All the various Disk Layers in Exadata are there for a good reason. As an Administrator, you will probably only deal with Griddisks, though. There are multiple Griddisks carved out of each Celldisk->Lun->Physical Disk. On the Database Layer, Griddisks look and feel like ASM Disks that you use for your ASM Diskgroups.
