Beiträge getaggt mit Data Warehouse
Partitioning a table online with DBMS_REDEFINITION
If you need to change the structure of a table that is in constant productive use, it may be impossible to get a maintenance window for it. That applies to all kinds of structural changes, and in particular to converting an ordinary heap table into a partitioned table, which I take as an example here because I am frequently asked in my courses how to achieve it. To demonstrate it, I create a user with a non-partitioned table, together with a privilege and a dependent index on it:
SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam
Connected.
SQL> create table original as
     select rownum as id,
            mod(rownum,5) as channel_id,
            5000 as amount_sold,
            mod(rownum,1000) as cust_id,
            sysdate as time_id
     from dual connect by level<=1e6;

Table created.

SQL> create index original_id_idx on original(id) nologging;

Index created.

SQL> grant select on original to hr;

Grant succeeded.
The challenge is now to change this table into a partitioned one while end users keep running DML and queries against it. For this purpose, Oracle introduced the package DBMS_REDEFINITION back in 9i (if I recall correctly). The first step is to ask whether it can be used in this case:
SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
SQL> begin
       dbms_redefinition.can_redef_table
       (uname=>'ADAM',
        tname=>'ORIGINAL',
        options_flag=>DBMS_REDEFINITION.CONS_USE_ROWID);
     end;
     /
PL/SQL procedure successfully completed.
Because there is no primary key on the original table, I have to use CONS_USE_ROWID; otherwise I could use CONS_USE_PK. There are no objections against the online redefinition of this table – otherwise an error message would appear. The next step is to create an interim table with the structure desired for the original table. In my case, I create it interval partitioned (an 11g New Feature). I could also change storage attributes and add or remove columns during that process.
SQL> create table interim
     (id number,
      channel_id number(1),
      amount_sold number(4),
      cust_id number(4),
      time_id date)
     partition by range (cust_id)
     interval (10)
     (partition p1 values less than (10));

Table created.
My original table has 1000 distinct cust_ids, so this will lead to 100 partitions – each partition will contain 10 distinct cust_ids. One benefit is the possibility of partition pruning for statements that specify the cust_id in the WHERE condition: such statements will run about 100 times faster than a full table scan. The next step essentially inserts all rows from the original table into the interim table (thereby automatically generating 99 partitions), while DML against the original table is recorded in the meantime:
SQL> set timing on
SQL> BEGIN
       DBMS_REDEFINITION.START_REDEF_TABLE
       (uname=>'ADAM',
        orig_table=>'ORIGINAL',
        int_table=>'INTERIM',
        options_flag=>DBMS_REDEFINITION.CONS_USE_ROWID);
     END;
     /

PL/SQL procedure successfully completed.

Elapsed: 00:00:22.76
If this step takes a long time to run, it might be beneficial to call the SYNC_INTERIM_TABLE procedure occasionally from another session. That keeps the lock time short for the last step, the call of FINISH_REDEF_TABLE. The next step is to add the dependent objects and privileges to the interim table:
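Such an intermediate synchronization could look like this – a minimal sketch, run from a second session against the same schema and tables as above:

```sql
-- Sketch: apply the DML recorded so far to the interim table,
-- so that FINISH_REDEF_TABLE has less to catch up on later.
BEGIN
  DBMS_REDEFINITION.SYNC_INTERIM_TABLE
  (uname      => 'ADAM',
   orig_table => 'ORIGINAL',
   int_table  => 'INTERIM');
END;
/
```

This can be repeated as often as needed while the redefinition is pending.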
SQL> set timing off
SQL> vari num_errors number
BEGIN
DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS
(uname=>'ADAM',
orig_table=>'ORIGINAL',
int_table=>'INTERIM',
num_errors=>:num_errors);
END;
/
PL/SQL procedure successfully completed.
SQL> print num_errors
NUM_ERRORS
----------
0
There was no problem with this step. Up to this point, the original table is still an ordinary heap table – only the interim table is partitioned:
SQL> select table_name from user_part_tables;

TABLE_NAME
------------------------------
INTERIM
In the last step, the two tables swap their names, and the DML recorded in the meantime is applied to bring the data up to date:
SQL> begin
       dbms_redefinition.finish_redef_table
       (uname=>'ADAM',
        orig_table=>'ORIGINAL',
        int_table=>'INTERIM');
     end;
     /

PL/SQL procedure successfully completed.
We can now verify that the original table is partitioned and that the dependent objects are still in place:
SQL> select table_name,partitioning_type from user_part_tables;
TABLE_NAME PARTITION
------------------------------ ---------
ORIGINAL RANGE
SQL> select count(*) from user_tab_partitions;
COUNT(*)
----------
100
SQL> select grantee,privilege from user_tab_privs_made where table_name='ORIGINAL';
GRANTEE PRIVILEGE
------------------------------ ----------------------------------------
HR SELECT
SQL> select index_name,table_name from user_indexes;
INDEX_NAME TABLE_NAME
------------------------------ ------------------------------
ORIGINAL_ID_IDX ORIGINAL
TMP$$_ORIGINAL_ID_IDX0 INTERIM
The interim table can now be dropped. We changed the table into a partitioned one without any end user noticing it!
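For reference, the whole workflow condenses into a few calls – a sketch under the assumptions of this demo (schema ADAM, tables ORIGINAL and INTERIM, no primary key, hence CONS_USE_ROWID):

```sql
-- Condensed sketch of the online redefinition workflow shown above.
VARIABLE num_errors NUMBER
BEGIN
  -- 1. Check that the table qualifies for online redefinition
  DBMS_REDEFINITION.CAN_REDEF_TABLE ('ADAM', 'ORIGINAL',
    options_flag => DBMS_REDEFINITION.CONS_USE_ROWID);
  -- 2. Copy the rows into the interim table; start recording DML
  DBMS_REDEFINITION.START_REDEF_TABLE ('ADAM', 'ORIGINAL', 'INTERIM',
    options_flag => DBMS_REDEFINITION.CONS_USE_ROWID);
  -- 3. Copy indexes, triggers, constraints, privileges
  DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS ('ADAM', 'ORIGINAL', 'INTERIM',
    num_errors => :num_errors);
  -- 4. Swap the tables and apply the recorded DML
  DBMS_REDEFINITION.FINISH_REDEF_TABLE ('ADAM', 'ORIGINAL', 'INTERIM');
END;
/
DROP TABLE interim;  -- cleanup once the swap has happened
```

Should anything go wrong in between, ABORT_REDEF_TABLE cancels the pending redefinition.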
This picture illustrates the steps you have seen above – hope you find it useful 🙂
Conclusion: If a table structure needs to be modified while the table is permanently accessed by end users, this can be done with some effort using DBMS_REDEFINITION. One common – but not the only possible – use case is converting a non-partitioned table into a partitioned one. You have seen a simplified demonstration of it. As always: Don’t believe it, test it! 🙂
Result Cache: Another brilliant 11g New Feature
I have just finished an Oracle Database 11g New Features course in Vienna. There are many amazing New Features in the 11g version; one of them is the ability to cache the result sets of statements that access large tables but return relatively few rows. Think of it as automagically created materialized views inside the SGA. Example:
SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
SQL> select bytes/1024/1024 as mb from user_segments where segment_name='SALES';

        MB
----------
       560
SQL> alter table sales result_cache (mode force);
Table altered.
The ALTER TABLE statement above is a new feature of 11g Release 2. In Release 1, you can control the feature only with the following parameters:
SQL> show parameter result_cache_m
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
result_cache_max_result integer 5
result_cache_max_size big integer 2080K
result_cache_mode string MANUAL
Another possibility, already introduced in 11g R1, is the RESULT_CACHE hint. Now let’s look at the effect of the ALTER TABLE statement:
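With the hint, the caching is requested per statement instead of per table – a sketch using the same query as below:

```sql
-- In 11g Release 1, request result caching explicitly per statement:
SELECT /*+ RESULT_CACHE */ channel_id, SUM(amount_sold)
FROM   sales
GROUP  BY channel_id;
```

The first execution populates the Result Cache; subsequent identical executions can be served from it.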
SQL> set timing on
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;
CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
2 421541477
4 219308832
3 926004170
9 4438820.16
Elapsed: 00:00:03.72
This was the first time the sales table was accessed after the ALTER TABLE above. The runtime indicates a full table scan (there are no indexes on the table anyway). Of course, blocks of the table are now cached in the database buffer cache – as in previous versions. But now the result set is cached as well!
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;
CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
2 421541477
4 219308832
3 926004170
9 4438820.16
Elapsed: 00:00:00.01
That is obvious from the runtime already, so I omit AUTOTRACE here. If the table gets changed, the result set becomes “stale”, much as a materialized view would:
SQL> update sales set amount_sold=1 where rownum<2;
1 row updated.
Elapsed: 00:00:00.02
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;
CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
2 421541477
4 219308832
3 926002938
9 4438820.16
Elapsed: 00:00:03.08
The second access after the DML will again use the (newly cached) result set – even if the statement is slightly different, in the same way that materialized views can be used for query rewrite even if the SELECT differs from the query that built the materialized view. That gives me the opportunity to introduce the new SUPERFAST hint 🙂
SQL> select /*+ superfast */ channel_id,sum(amount_sold) from sales group by channel_id;
CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
2 421541477
4 219308832
3 926002938
9 4438820.16
Elapsed: 00:00:00.00
The “hint” is just an odd comment to the engine and is simply ignored, of course. The speed comes from using the cached result set, stored in the SGA by the previous SELECT with the 3-second runtime.
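If you want to see what is actually sitting in the Result Cache, the dictionary offers a view and a reporting package – a sketch, assuming the session has the necessary privileges:

```sql
-- Inspect the cached result sets of the current user
SELECT type, status, name
FROM   v$result_cache_objects
WHERE  creator_uid = SYS_CONTEXT('userenv', 'current_userid');

-- Or print a summary report of the Result Cache memory usage
SET SERVEROUTPUT ON
EXEC DBMS_RESULT_CACHE.MEMORY_REPORT
```

The STATUS column shows, among other things, whether a result is still Published or has gone Invalid after DML.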
Addendum: See these two fine postings by Alex Fatkulin that talk about the 11gR2 improvements regarding the latch contention issue that surfaced with the Result Cache feature in 11gR1:
11GR2 Result Cache Scalability
Result Cache Latch in 11GR2: Shared Mode Gets
Thanks for the comments below that mentioned possible drawbacks of Result Caching! Much appreciated 🙂
Automatic DOP in 11gR2
11g Release 2 introduced a very useful new feature related to parallel query: the automatically determined Degree of Parallelism (DOP). In earlier versions of the Oracle Database, we had to determine the DOP more or less manually, either with a parallel hint or by setting a parallel degree with ALTER TABLE:
select /*+ parallel (sales,2) */ * from sales;
or
alter table sales parallel 2;
There was an automatic computation of the DOP available, derived from the simple formula CPU_COUNT * PARALLEL_THREADS_PER_CPU. That is what is done internally if we say
select /*+ parallel (sales) */ * from sales;
or
alter table sales parallel;
The drawback of these approaches was always that we could hardly be sure whether the DOP was appropriate for the table, the statement, and the hardware we were running on. It was mostly a matter of trial & error. Especially problematic was the ALTER TABLE approach, because it leads to parallelization of each and every subsequent SELECT on those tables, even if that is totally inappropriate. A popular pitfall is creating tables with a parallel clause on OLTP systems: those tables inherit the parallel degree from their creation, which leads to parallel query for every statement accessing them afterwards – most likely not desirable for OLTP. In that sense, parallelization was quite dumb (from a system-internal perspective) in versions before 11gR2. Now to the new feature:
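Before looking at the demo, it helps to know the parameters that govern the automatic computation – a sketch of how to check them (the values shown depend on your system):

```sql
-- Parameters influencing Automatic DOP in 11gR2:
SHOW PARAMETER parallel_degree_policy      -- MANUAL (default) | LIMITED | AUTO
SHOW PARAMETER parallel_degree_limit       -- caps the computed DOP
SHOW PARAMETER parallel_min_time_threshold -- estimated serial runtime below
                                           -- which a statement stays serial
```

With PARALLEL_DEGREE_POLICY=AUTO, the optimizer estimates the serial runtime of each statement and parallelizes only those exceeding the threshold, up to the degree limit.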
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
SQL> select bytes/1024/1024 as mb from user_segments where segment_name='SALES';
MB
----------
563
SQL> select degree from user_tables where table_name='SALES';
DEGREE
----------------------------------------
1
For my tiny machine, this table is huge. I allow automatic determination of the DOP with the following new dynamic parameter:
SQL> alter session set parallel_degree_policy=auto;

Session altered.

SQL> set autotrace on explain
SQL> select sum(amount_sold) from sales;

SUM(AMOUNT_SOLD)
----------------
      1571293299

Execution Plan
----------------------------------------------------------
Plan hash value: 3130505568

-----------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     5 | 10798   (1)| 00:02:10 |
|   1 |  SORT AGGREGATE        |          |     1 |     5 |            |          |
|   2 |   PX COORDINATOR       |          |       |       |            |          |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |
|   4 |     SORT AGGREGATE     |          |     1 |     5 |            |          |
|   5 |      PX BLOCK ITERATOR |          |    14M|    70M| 10798   (1)| 00:02:10 |
|   6 |       TABLE ACCESS FULL| SALES    |    14M|    70M| 10798   (1)| 00:02:10 |
-----------------------------------------------------------------------------------

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit
I have got a moderate DOP, most likely appropriate for my relatively weak hardware, but still speeding up the query on the relatively large table. I could always override the automatic DOP determination by specifying a parallel hint as in earlier versions. [See in my newer posting why this sentence was crossed out] Also, the parameter defaults to MANUAL, so unless we change it, automatic parallelization will not take place. In order to demonstrate the quite intelligent computation of the DOP compared to earlier versions, I will access another, much smaller table in the same session:
SQL> select count(*) from customers;

  COUNT(*)
----------
     30501

Execution Plan
----------------------------------------------------------
Plan hash value: 296924608

------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |   218   (0)| 00:00:03 |
|   1 |  SORT AGGREGATE    |           |     1 |            |          |
|   2 |   TABLE ACCESS FULL| CUSTOMERS | 30501 |   218   (0)| 00:00:03 |
------------------------------------------------------------------------

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
As you can see, in this case the system does not consider parallelization appropriate for this SELECT. So it is much smarter than the old FORCE PARALLEL QUERY:
SQL> alter session set parallel_degree_policy=manual;
Session altered.
SQL> alter session force parallel query;
Session altered.
SQL> select count(*) from customers;
COUNT(*)
----------
30501
Execution Plan
----------------------------------------------------------
Plan hash value: 1221513835
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 121 (0)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | PX COORDINATOR | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | | |
| 4 | SORT AGGREGATE | | 1 | | |
| 5 | PX BLOCK ITERATOR | | 30501 | 121 (0)| 00:00:02 |
| 6 | TABLE ACCESS FULL| CUSTOMERS | 30501 | 121 (0)| 00:00:02 |
----------------------------------------------------------------------------
Also, we now have a remedy against inappropriate parallel degrees on tables:
SQL> alter session enable parallel query -- the default, no force;
Session altered.
SQL> alter table customers parallel -- would cause parallel query before 11gR2;
Table altered.
SQL> set autotrace on explain
SQL> select count(*) from customers;
COUNT(*)
----------
30501
Execution Plan
----------------------------------------------------------
Plan hash value: 296924608
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 218 (0)| 00:00:03 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| CUSTOMERS | 30501 | 218 (0)| 00:00:03 |
------------------------------------------------------------------------
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
