
Brief Introduction into Partitioning in #Oracle

Partitioning is a great way to deal with large tables. This post will give you a quick start with examples that you can easily reproduce yourself. The focus is on Range Partitioning, which is still the most popular kind.

First things first: You should only consider implementing partitioning for really large objects (in the GB range or more), because Partitioning is a separately charged option and the benefits do not show significantly with small objects.

The two major reasons why you may want to use partitioning are Performance and Manageability. Let’s look at this picture:

[Picture of a range-partitioned table]

The table above is partitioned by quarter. You will see that the table name and the columns are known to the application layer (INSERT and SELECT statements come from there), while the partitioned nature of the table needs to be known by the DBA only. I’m going to implement this on my demo system:

SQL> grant dba to adam identified by adam;
Grant succeeded.
SQL> connect adam/adam
Connected.
SQL> create table sales (id number, name varchar2(20),
amount_sold number, shop varchar2(20), time_id date)
partition by range (time_id)
(
partition q1 values less than (to_date('01.04.2012','dd.mm.yyyy')),
partition q2 values less than (to_date('01.07.2012','dd.mm.yyyy')),
partition q3 values less than (to_date('01.10.2012','dd.mm.yyyy')),
partition q4 values less than (to_date('01.01.2013','dd.mm.yyyy'))
);
Table created.
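
As a side note: partitions can also be addressed directly in SQL with partition-extended table names. This is just a hedged sketch, not part of the original demo (the PARTITION FOR clause assumes 11g or later):

-- query one partition directly via its name
select count(*) from sales partition (q1);

-- or address the partition containing a given partition key value (11g+)
select count(*) from sales partition for (date'2012-02-16');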

From the viewpoint of the application, this is transparent, but the value of the TIME_ID column determines into which partition the inserted rows will go. And if subsequent SELECT statements have the partition key in the WHERE clause, the optimizer knows which partitions need not be scanned. This is called Partition Pruning:

[Picture of INSERT and SELECT on a partitioned table]

I’ll show the application perspective first:

 
SQL> insert into sales values ( 1, 'John Doe', 5000, 'London', date'2012-02-16' );
1 row created.
SQL> commit;
Commit complete.
SQL> select sum(amount_sold)
 from sales
 where time_id between
 date'2012-01-01' and date'2012-03-31';
SUM(AMOUNT_SOLD)
 ----------------
 5000
SQL> set lines 300
SQL> select plan_table_output from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
SQL_ID	crtwzf8j963h7, child number 0
-------------------------------------
select sum(amount_sold) from sales where time_id between
date'2012-01-01' and date'2012-03-31'

Plan hash value: 642363238

-------------------------------------------------------------------------------------------------
| Id  | Operation		| Name	| Rows	| Bytes | Cost (%CPU)| Time	| Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	|	|	|	|    14 (100)|		|	|	|
|   1 |  SORT AGGREGATE 	|	|     1 |    22 |	     |		|	|	|
|   2 |   PARTITION RANGE SINGLE|	|     1 |    22 |    14   (0)| 00:00:01 |     1 |     1 |
|*  3 |    TABLE ACCESS FULL	| SALES |     1 |    22 |    14   (0)| 00:00:01 |     1 |     1 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("TIME_ID">=TO_DATE(' 2012-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
	      "TIME_ID"<=TO_DATE(' 2012-03-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Notice Pstart=1 and Pstop=1 above, which indicates Partition Pruning: only one quarter was scanned through, speeding up my Full Table Scan accordingly. If the table were partitioned by day instead, a SELECT on a single day of a large, evenly filled table would run roughly 365 times faster. That is not at all unusual; many customers have hundreds or even thousands of partitions for exactly that reason.
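
To sketch what daily partitioning could look like: Interval Partitioning (available from 11g, see also the related postings below) even creates new partitions automatically. A hedged example with a hypothetical table:

-- one partition per day, created automatically upon insert (11g interval partitioning)
create table sales_daily (
  id          number,
  name        varchar2(20),
  amount_sold number,
  shop        varchar2(20),
  time_id     date
)
partition by range (time_id)
interval (numtodsinterval(1,'DAY'))
(
  partition p_start values less than (to_date('01.01.2012','dd.mm.yyyy'))
);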

Now to the manageability benefit: DBAs can get rid of old data very fast with DROP PARTITION commands, whereas a DELETE of millions of rows would be an awful lot slower. Also, some kind of Information Lifecycle Management can be implemented, such as compressing old partitions or moving them into other tablespaces that have their datafiles on cheaper storage:

SQL> alter table sales move partition q1 compress;
Table altered.
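
Getting rid of the oldest quarter and making room for a new one could then look like this; again, just a sketch against the demo table above:

-- drop old data almost instantly instead of deleting millions of rows
alter table sales drop partition q1;

-- add a new partition at the upper end for the next quarter
alter table sales add partition q5
values less than (to_date('01.04.2013','dd.mm.yyyy'));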

When you put indexes on a partitioned table, you have the choice between GLOBAL and LOCAL like on the next picture:

[Picture of a partitioned table with GLOBAL and LOCAL indexes]

The LOCAL index partitions follow the table partitions: they have the same partition key and type, get created automatically when new table partitions are added, and get dropped automatically when table partitions are dropped. Beware: LOCAL indexes are usually not appropriate for OLTP access on the table, because a single server process may then have to scan through many index partitions. This is the cause of most of the scary performance horror stories you may have heard about partitioning!

A GLOBAL index spans all partitions. It usually delivers good SELECT performance, but it is more sensitive to partition maintenance than LOCAL indexes; in other words, the GLOBAL index needs to be rebuilt more often. Let’s implement both:

SQL> create index sales_id on sales (id);
Index created.
SQL> create index sales_name on sales (name) local;
Index created.
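
To illustrate the maintenance sensitivity mentioned above, here is a hedged sketch of what typically happens after dropping a table partition: the GLOBAL index becomes UNUSABLE unless it is maintained during the operation:

-- variant 1: let the global index become unusable, then rebuild it
alter table sales drop partition q2;
alter index sales_id rebuild;

-- variant 2: keep the global index usable during the maintenance operation
alter table sales drop partition q3 update global indexes;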

We have Dictionary Views for everything, of course :-)

SQL> select table_name, tablespace_name from user_tables;

TABLE_NAME		       TABLESPACE_NAME
------------------------------ ------------------------------
SALES

SQL> select table_name, partitioning_type, partition_count from user_part_tables;

TABLE_NAME		       PARTITION PARTITION_COUNT
------------------------------ --------- ---------------
SALES			       RANGE		       4

SQL> select table_name, partition_name, tablespace_name, pct_free, compression from user_tab_partitions;

TABLE_NAME		       PARTITION_NAME		      TABLESPACE_NAME		       PCT_FREE COMPRESS
------------------------------ ------------------------------ ------------------------------ ---------- --------
SALES			       Q1			      USERS				      0 ENABLED
SALES			       Q4			      USERS				     10 DISABLED
SALES			       Q3			      USERS				     10 DISABLED
SALES			       Q2			      USERS				     10 DISABLED

SQL> select index_name, tablespace_name, status from user_indexes;

INDEX_NAME		       TABLESPACE_NAME		      STATUS
------------------------------ ------------------------------ --------
SALES_ID		       USERS			      VALID
SALES_NAME						      N/A

SQL> select index_name, partitioning_type, partition_count from user_part_indexes;

INDEX_NAME		       PARTITION PARTITION_COUNT
------------------------------ --------- ---------------
SALES_NAME		       RANGE		       4

SQL> select index_name, partition_name, tablespace_name,status from user_ind_partitions;

INDEX_NAME		       PARTITION_NAME		      TABLESPACE_NAME		     STATUS
------------------------------ ------------------------------ ------------------------------ --------
SALES_NAME		       Q1			      USERS			     USABLE
SALES_NAME		       Q4			      USERS			     USABLE
SALES_NAME		       Q3			      USERS			     USABLE
SALES_NAME		       Q2			      USERS			     USABLE

This should be enough to get you started. There is much more to say about partitioning, of course; see the VLDB and Partitioning Guide. The pictures in this posting are from an LVC demonstration that I did recently to convince potential customers of this new training format, and I thought to myself: there must be something additional that I can do with this stuff :-)

I hope you find it useful. Feel free to comment, and if you’d like to share some of your experiences with partitioning, that would be very much appreciated. Thank you!

Conclusion: Partitioning can be a very powerful tool in the DBA’s arsenal to transparently speed up applications and to ease maintenance. It is no silver bullet, though, so as always: Don’t believe it, test it :)

Related postings about Partitioning:

Partition Pruning & Interval Partitioning… shows the Partition Pruning performance benefit with a larger table and how new range partitions are created automatically
Reducing Buffer Busy Waits with Automatic Segment Space Management & Hash Partitioning… shows why Hash Partitioning is often used for large OLTP tables to reduce contention
Partitioning a table online with DBMS_REDEFINITION… shows how to change the structure of a table while it is permanently accessed by end users
CELL_PARTITION_LARGE_EXTENTS now obsolete… shows that you get 8 MB initial extents for partitioned tables in recent versions
Partition-Pruning: Do & Don’t… shows how the SQL code determines whether Partition Pruning can be used or not


Brief introduction into External Tables

As an Oracle DBA, you will sooner or later encounter the task of dealing with interface files (flat files) from some other application that you need to process in your Oracle Database. External Tables, introduced back in 9i, are usually the tool of choice for that and can be considered the successor of the venerable SQL*Loader for this purpose.

An External Table is metadata stored in the Data Dictionary together with a pointer to the actual flat file, which resides on the Database Server. We can access this External Table with the same SELECT statements as we would use against ordinary tables. I have prepared a demonstration with the flat file ulcase2.dat in the directory /home/oracle/interface on my server:

-bash-3.00$ pwd
/home/oracle/interface
-bash-3.00$ ls -al
total 16
drwxr-xr-x   2 oracle oinstall 4096 Oct 28 14:34 .
drwxrwxrwx  30 oracle oinstall 4096 Oct 28 14:11 ..
-rw-r--r--   1 oracle oinstall  372 Oct 28 14:11 ulcase2.dat
-bash-3.00$ cat ulcase2.dat
7782 CLARK      MANAGER   7839  2572.50          10
7839 KING       PRESIDENT       5500.00          10
7934 MILLER     CLERK     7782   920.00          10
7566 JONES      MANAGER   7839  3123.75          20
7499 ALLEN      SALESMAN  7698  1600.00   300.00 30
7654 MARTIN     SALESMAN  7698  1312.50  1400.00 30
7658 CHAN       ANALYST   7566  3450.00          20

This is just an ordinary ASCII file that we want to access from our Oracle Database:

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam
Connected.

That’s my demo user. Granting the DBA role makes the demo easier; there is no need to grant the CREATE DIRECTORY privilege or read/write privileges on the directory separately. The next step is to create a directory as a database object, pointing to the OS directory:

SQL> create directory interface_dir as '/home/oracle/interface';

Directory created.

The same kind of directory object is needed when we use Data Pump to export database objects into binary dump files.
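
Just as a hedged illustration (hypothetical dump and log file names), such an export could reuse our directory like this:

expdp adam/adam directory=interface_dir dumpfile=adam.dmp logfile=adam_exp.log

Now here comes the External Table: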

SQL> create table interface_table
(
 empno number(4),
 ename varchar2(10),
 job varchar2(9),
 mgr number(4),
 sal number(7,2),
 comm number(7,2),
 deptno number(2)
)
organization external
(
 type oracle_loader
 default directory interface_dir
 access parameters
 (
 records delimited by newline characterset we8iso8859p1
 badfile 'INTERFACE_DIR':'interface.bad'
 logfile 'interface.log_xt'
 fields ldrtrim
 (
 empno (1:4) integer external(4),
 ename (6:15) char(10),
 job (17:25) char(9),
 mgr (27:30) integer external(4),
 sal (32:39) integer external(8),
 comm (41:48) integer external(8),
 deptno (50:51) integer external(2)
 )
 )
 location
 (
 'ulcase2.dat'
 )
)
reject limit unlimited;
Table created.

This is no ordinary table made up of extents and blocks, but merely a description of the flat file:

SQL> select * from user_segments;

no rows selected

SQL> select table_name,tablespace_name from user_tables;

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
INTERFACE_TABLE

Apart from that, we can access it with SELECT like an ordinary table:

SQL> select * from interface_table;

 EMPNO ENAME      JOB              MGR        SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ---------- ---------- ----------
 7782 CLARK      MANAGER         7839     2572.5                    10
 7839 KING       PRESIDENT                  5500                    10
 7934 MILLER     CLERK           7782        920                    10
 7566 JONES      MANAGER         7839    3123.75                    20
 7499 ALLEN      SALESMAN        7698       1600        300         30
 7654 MARTIN     SALESMAN        7698     1312.5       1400         30
 7658 CHAN       ANALYST         7566       3450                    20

7 rows selected.

We can apply functions in the query, add WHERE conditions to filter the result set, select only some of the columns, and so on. That is why External Tables are much more flexible to use than the old SQL*Loader was.
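
For instance, a hedged example against the demo table above:

select ename, 12 * sal as yearly_sal
from interface_table
where deptno = 10
order by yearly_sal desc;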

Nevertheless, SQL*Loader is still useful today, if only to teach us how to create External Tables :-) If you look into the Online Documentation of the 9i version, you can still find the SQL*Loader case studies there; they vanished in 10g and above. There you see the most common kinds of interface files together with a SQL*Loader control file to load them.

If you then call SQL*Loader with the parameter external_table=generate_only, it gives you the CREATE statement for a matching External Table, something that is not so easy to derive from the External Table Online Documentation, unfortunately.
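
Such a call could look like this (hypothetical control and log file names; only the external_table parameter is the point here):

sqlldr adam/adam control=ulcase2.ctl log=ulcase2.log external_table=generate_only

The CREATE statements then appear in the log file instead of any data actually being loaded.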

Another nice use of External Tables is that we can access the Alert Log from inside the database, without having to log on to the Database Server:

SQL> show parameter background_dump_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- --------------------------------
background_dump_dest                 string      /u01/app/oracle/admin/orcl/bdump

SQL> create or replace directory bdump as '/u01/app/oracle/admin/orcl/bdump';

Directory created.

SQL> create table alert_log
 (line1 varchar2(4000))
 organization external
 (type oracle_loader
 default directory bdump
 access parameters
 (records delimited by newline fields missing field values are null)
 location ('alert_orcl.log')
 )
/  

Table created.
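
Now the Alert Log can be queried with ordinary SQL from any client. For example, a minimal sketch:

select line1
from alert_log
where line1 like '%ORA-%';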

Conclusion: If you need to process interface files in your Oracle Database, take a closer look at External Tables for that purpose; the SQL*Loader is a little old-fashioned compared to them.


Brief introduction into Materialized Views

This week, I am teaching an Oracle Database 11g Data Warehouse Administration course in Munich. One of the focus areas of that course is Materialized Views, and I have developed some examples for it that I would like to share with the Oracle community. Other themes of that four-day course are Partitioning, ETL and Parallelization.

Materialized Views can speed up queries (even dramatically) while being transparent to those queries, in a similar way as indexes or partitioned tables are. That means we do not have to modify our queries in order to benefit from these structures. Unlike an ordinary view, which is only a stored SELECT statement that runs whenever we use the view, a materialized view stores the result set of its SELECT statement in a container table. Let me demonstrate the benefit of that. I have a (for my little database) relatively large table and run an aggregation query on it:

SQL> desc sales
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PROD_ID                                   NOT NULL NUMBER
 CUST_ID                                   NOT NULL NUMBER
 TIME_ID                                   NOT NULL DATE
 CHANNEL_ID                                NOT NULL NUMBER
 PROMO_ID                                  NOT NULL NUMBER
 QUANTITY_SOLD                             NOT NULL NUMBER(10,2)
 AMOUNT_SOLD                               NOT NULL NUMBER(10,2)

SQL> select count(*) from sales;

 COUNT(*)
----------
 7350744

SQL> select bytes/1024/1024 as mb from user_segments where segment_name='SALES';

 MB
----------
 286

SQL> show sga

Total System Global Area  373293056 bytes
Fixed Size                  1219496 bytes
Variable Size             134218840 bytes
Database Buffers          234881024 bytes
Redo Buffers                2973696 bytes

SQL> set timing on
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        210770739
 4        109654416
 3        463002085
 9       2219410.08

Elapsed: 00:00:04.51

As you can see, the aggregation takes around 5 seconds. Now I will speed that up using a Materialized View:

SQL> create materialized view mv1 enable query rewrite
 2  as select channel_id,sum(amount_sold) from sales group by channel_id;

Materialized view created.

Elapsed: 00:00:05.69
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        210770739
 4        109654416
 3        463002085
 9       2219410.08

Elapsed: 00:00:00.01

The very same statement now takes way less time! Why is that so?

SQL> set autotrace on explain
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        210770739
 4        109654416
 3        463002085
 9       2219410.08

Execution Plan
----------------------------------------------------------
Plan hash value: 2958490228
--------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     4 |   104 | 3   (0)| 00:00:01
|   1 |  MAT_VIEW REWRITE ACCESS FULL| MV1  |     4 |   104 | 3   (0)| 00:00:01
--------------------------------------------------------------------------------

Because the optimizer transparently rewrote the query to use the Materialized View!
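
As a side note: to my knowledge, query rewrite additionally requires the parameter QUERY_REWRITE_ENABLED to be TRUE (the default in current versions), besides the ENABLE QUERY REWRITE clause of the materialized view. A minimal sketch:

-- query rewrite can be toggled at session level if needed
alter session set query_rewrite_enabled = true;

But what if we now change the content of the base table sales?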

SQL> update sales set amount_sold=amount_sold+1 where rownum<2;

1 row updated.

SQL> commit;

Commit complete.

The materialized view is now stale and will no longer be used for query rewrite (as the query runtime below confirms):

SQL> select mview_name,staleness from user_mviews;

MVIEW_NAME                     STALENESS
------------------------------ -------------------
MV1                            NEEDS_COMPILE

SQL> set timing on
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        210770739
 4        109654416
 3        463002086
 9       2219410.08

Elapsed: 00:00:04.52

In order to get statements rewritten against the materialized view again, we must refresh it by some method.

One method is on demand with a procedure call like in this example:

SQL> exec dbms_mview.refresh(list=>'MV1',method=>'C')

PL/SQL procedure successfully completed.

Elapsed: 00:00:04.62

SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        210770739
 4        109654416
 3        463002086
 9       2219410.08

Elapsed: 00:00:00.01

Now MV1 gets used again, as we can see from the runtime. That was a complete refresh, though, which can take a long time; we would like to get this done faster.
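
By the way, here is a quick reference for the METHOD parameter of DBMS_MVIEW.REFRESH as I know it (a sketch; the fast refresh requires the prerequisites discussed below):

exec dbms_mview.refresh(list=>'MV1',method=>'C')  -- complete refresh
exec dbms_mview.refresh(list=>'MV1',method=>'F')  -- fast (incremental) refresh
exec dbms_mview.refresh(list=>'MV1',method=>'?')  -- force: fast if possible, else complete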

One convenient way to arrive at a fast refreshable materialized view is the package DBMS_ADVISOR:

SQL> vari t varchar2(50)
SQL> begin
 2   dbms_advisor.tune_mview(task_name=>:t,
 3                           mv_create_stmt=>'create materialized view mv1'
 4                               || ' refresh fast as'
 5                               || ' select channel_id,sum(amount_sold)'
 6                               || ' from sales group by channel_id');
 7  end;
 8/  
PL/SQL procedure successfully completed.

SQL> set long 5000
SQL> select statement from user_tune_mview where task_name=:t order by action_id;

STATEMENT
--------------------------------------------------------------------------------
CREATE MATERIALIZED VIEW LOG ON "ADAM"."SALES" WITH ROWID, SEQUENCE ("CHANNEL_ID
","AMOUNT_SOLD")  INCLUDING NEW VALUES

ALTER MATERIALIZED VIEW LOG FORCE ON "ADAM"."SALES" ADD ROWID, SEQUENCE ("CHANNE
L_ID","AMOUNT_SOLD")  INCLUDING NEW VALUES

CREATE MATERIALIZED VIEW ADAM.MV1   REFRESH FAST WITH ROWID DISABLE QUERY REWRIT
E AS SELECT ADAM.SALES.CHANNEL_ID C1, SUM("ADAM"."SALES"."AMOUNT_SOLD") M1, COUN
T("ADAM"."SALES"."AMOUNT_SOLD") M2, COUNT(*) M3 FROM ADAM.SALES GROUP BY ADAM.SA
LES.CHANNEL_ID

STATEMENT
--------------------------------------------------------------------------------
DROP MATERIALIZED VIEW ADAM.MV1

SQL> exec dbms_advisor.delete_task(:t)
PL/SQL procedure successfully completed.

Usually, we need Materialized View Logs on the base tables and special expressions in the materialized view query to be able to do a fast refresh.

I simply take the generated statements from above now:

SQL> CREATE MATERIALIZED VIEW LOG ON "ADAM"."SALES"
WITH ROWID, SEQUENCE ("CHANNEL_ID","AMOUNT_SOLD")
INCLUDING NEW VALUES;

Materialized view log created.

SQL> CREATE MATERIALIZED VIEW ADAM.MV1
 REFRESH FAST WITH ROWID
 enable QUERY REWRITE AS
 SELECT ADAM.SALES.CHANNEL_ID C1,
 SUM("ADAM"."SALES"."AMOUNT_SOLD") M1,
 COUNT("ADAM"."SALES"."AMOUNT_SOLD") M2,
 COUNT(*) M3
 FROM ADAM.SALES
 GROUP BY ADAM.SALES.CHANNEL_ID;

Materialized view created.

After modifying the base table sales once more, I will try a fast refresh of MV1:

SQL> set timing off
SQL> update sales set amount_sold=amount_sold*1 where rownum<2;

1 row updated.

SQL> commit;

Commit complete.

SQL> select mview_name,staleness from user_mviews;

MVIEW_NAME                     STALENESS
------------------------------ -------------------
MV1                            NEEDS_COMPILE

SQL> set timing on
SQL> exec dbms_mview.refresh('MV1','F')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.62

Materialized View, Query Rewrite, MV Log, Refresh FAST

It worked! But do I always have to refresh manually? No, we can also schedule a regular refresh like this:

SQL>  alter materialized view mv1 refresh start with sysdate next sysdate + interval '1' minute;

Materialized view altered.

SQL> alter session set nls_date_format='hh24:mi:ss';

Session altered.

SQL> select MVIEW_NAME,LAST_REFRESH_TYPE,LAST_REFRESH_DATE from user_mviews;

MVIEW_NAME                     LAST_REF LAST_REF
------------------------------ -------- --------
MV1                            FAST     10:40:05

SQL> select MVIEW_NAME,LAST_REFRESH_TYPE,LAST_REFRESH_DATE from user_mviews;

MVIEW_NAME                     LAST_REF LAST_REF
------------------------------ -------- --------
MV1                            FAST     10:41:04

Of course, a one-minute refresh interval is extraordinarily short. Another possibility to refresh the MV very conveniently is ON COMMIT:

CREATE MATERIALIZED VIEW ADAM.MV1
 REFRESH FAST on commit 
 WITH ROWID
 enable QUERY REWRITE AS
 SELECT ADAM.SALES.CHANNEL_ID C1,
 SUM("ADAM"."SALES"."AMOUNT_SOLD") M1,
 COUNT("ADAM"."SALES"."AMOUNT_SOLD") M2,
 COUNT(*) M3
 FROM ADAM.SALES
 GROUP BY ADAM.SALES.CHANNEL_ID;

Materialized view created.

SQL> select MVIEW_NAME,LAST_REFRESH_TYPE,LAST_REFRESH_DATE from user_mviews;

MVIEW_NAME                     LAST_REF LAST_REF
------------------------------ -------- --------
MV1                            COMPLETE 11:16:28

SQL> update sales set amount_sold=amount_sold*1 where rownum<2;

1 row updated.

SQL> commit;

Commit complete.

SQL> select MVIEW_NAME,LAST_REFRESH_TYPE,LAST_REFRESH_DATE from user_mviews;

MVIEW_NAME                     LAST_REF LAST_REF
------------------------------ -------- --------
MV1                            FAST     11:19:56

But this may slow down the DML on the base table, so it requires testing whether the slowdown is noticeable or negligible. So much for now about Materialized Views. There is of course much more to say, but I think this is enough for a brief introduction to the theme.
