This user hasn't shared any biographical information
Posted in TOI on April 5, 2019
An Exasol database will automatically create, maintain and drop indexes, following the core idea to deliver great performance without requiring much administrative efforts. Like our tables, our indexes are always compressed and you don’t need to configure anything for that.
Joins between two or more tables are processed like this in Exasol: One table is full scanned (this is called the root table) and the other tables are joined using an index on their join columns.
If these indexes on the join columns are not already existing, they are automatically created during the join operation. Taking two tables t1 and t2 as an example, and a statement like
select count(*) from t1 join t2 on t1.joincol = t2.joincol;
The Exasol optimizer will compute an execution plan based on automatically gathered statistics that inform it amongst others about the table sizes. Often, the smaller table will become the root table and the other table will be joined using an index on the join column.
Let’s see an example:
create or replace table t1 as select rownum as joincol, 'Just some text' as textcol, mod(rownum,5) as numcol1, mod(rownum,1000) as numcol2, 5000 as numcol3, to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id from (select 1 from dual connect by level<=1e6); SUCCESS CREATE 36.43 1000000 OK create or replace table t2 as select * from t1 limit 5e5; SUCCESS CREATE 0.735 500000 OK select count(*) from t1 join t2 on t1.joincol=t2.joincol; COUNT(*) 500000 select index_table,remarks from exa_dba_indices where index_table like 'T_'; INDEX_TABLE REMARKS T1 GLOBAL INDEX (JOINCOL)
That index got created during query execution (on the first join between t1 and t2) and subsequently supports further joins with t1 on joincol.
If DML is done on t1, this index is maintained by the system:
INSERT into t1 will add new index entries accordingly.
DELETE from t1 will mark rows as deleted until more than 25% of rows have been deleted, Then t1 is reorganized automatically and the index is automatically rebuilt.
UPDATE statements that affect less than 15% of rows will update index key entries accordingly. If more than 15% of rows are updated, the index is automatically rebuilt.
If an index is not used to support queries for more than 5 weeks, it will be automatically dropped. That way, a useless index will not slow down DML and consume space for no good reason.
Operationally, nothing needs to be done about indexes in Exasol and that’s what most of our customers do: They just let the system take care of indexes. In earlier versions, EXA_DBA_INDICES didn’t exist even to avoid providing superfluous information.
What is a global index and why was it created that way on t1, you may ask. Like tables, indexes are also distributed across the Exasol cluster nodes. If the index part on a node points only to table rows residing on the same node, that’s a local index. A global index means that the index part on a node points to (at least some) table rows residing on another node. Means a global join leads to global indexes while local joins lead to local indexes. Profiling a join between t1 and t2 confirms that:
select part_name,part_info,object_name,remarks from exa_user_profile_last_day where stmt_id =21 and session_id=current_session; PART_NAME PART_INFO OBJECT_NAME REMARKS COMPILE / EXECUTE (null) (null) (null) SCAN (null) T2 (null) JOIN GLOBAL T1 T2(JOINCOL) => GLOBAL INDEX (JOINCOL) GROUP BY GLOBAL on TEMPORARY table tmp_subselect0 (null)
So that was a global join using a global index. If the two tables were distributed on joincol, this leads to local joins with local indexes. Putting the distribution key on joincol for t1 will automatically convert the existing index into a local index:
alter table t1 distribute by joincol; SUCCESS ALTER 0.90 alter table t2 distribute by joincol; SUCCESS ALTER 0.487 select index_table,remarks from exa_dba_indices where index_table like 'T_'; INDEX_TABLE REMARKS T1 LOCAL INDEX (JOINCOL)
When using prepared statements to do UPDATE or DELETE, this may automatically create an index on the filter column:
update t2 set textcol='Modified' where numcol1=:parameter1; -- Enter 1 SUCCESS UPDATE 0.44 100004 OK select index_table,remarks from exa_dba_indices where index_table like 'T_'; INDEX_TABLE REMARKS T1 LOCAL INDEX (JOINCOL) T2 LOCAL INDEX (NUMCOL1) delete from t1 where numcol2=:parameter2; -- Enter 42 SUCCESS DELETE 0.511 1000 OK select index_table,remarks from exa_dba_indices where index_table like 'T_'; INDEX_TABLE REMARKS T1 LOCAL INDEX (NUMCOL2) T1 LOCAL INDEX (JOINCOL) T2 LOCAL INDEX (NUMCOL1)
This results in local indexes because there is no need to refer to rows on other nodes while each node updates or deletes on his locally stored part of the tables.
I’m using DbVisualizer as a SQL Client for my demos here and it prompts for inputs of :parameter1 and :parameter2 when the statements are executed.
Another reason why indexes are automatically created is when primary or unique constraints are added to a table:
alter table t2 add constraint t2pk primary key (joincol); SUCCESS ALTER 0.39 select index_table,remarks from exa_dba_indices where index_table like 'T_'; INDEX_TABLE REMARKS T1 LOCAL INDEX (NUMCOL2) T1 LOCAL INDEX (JOINCOL) T2 LOCAL INDEX (JOINCOL) T2 LOCAL INDEX (NUMCOL1)
This created a local index because t2 is distributed on joincol.
Please do not take this as a recommendation to add primary or unique constraints to tables in Exasol, I just mentioned it because it also leads to the creation of indexes. MERGE statements also lead to index creation because they are processed as joins, by the way.
Although customers are not required to do anything about indexes, there’s a couple of good practices related with them that make your Exasol database perform even better:
Small numeric formats are optimal for join columns and lead to small efficient indexes. It’s often beneficial to replace multi-column joins respectively joins on large formatted columns by joins on numeric surrogate keys. IDENTITY COLUMNS help to generate these surrogate keys.
Avoid using expressions on the join columns because the resulting indexes are not persisted and have to be built again during every join:
select count(*) from t1 join t2 on t1.joincol+1=t2.joincol+1; -- don't do that
Avoid having mixed data types on join columns because that can also lead to expression indexes:
create or replace table t3 as select to_char(rownum) as joincol, -- don't do that 'Just some text' as textcol, mod(rownum,5) as numcol1, mod(rownum,1000) as numcol2, 5000 as numcol3, to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id from (select 1 from dual connect by level<=1e6); SUCCESS CREATE 34.925 1000000 OK create or replace table t4 as select * from t1 limit 100; SUCCESS CREATE 0.374 100 OK
Above created a large table t3 using a string type for joincol and a small table t4 using a numeric type for joincol. Upon joining the two, likely t4 becomes the root table and t3 is expression indexed:
select count(*) from t3 join t4 on t3.joincol=t4.joincol; COUNT(*) 100 select index_table,remarks from exa_dba_indices where index_table like 'T_'; INDEX_TABLE REMARKS T1 LOCAL INDEX (NUMCOL2) T1 LOCAL INDEX (JOINCOL) T2 LOCAL INDEX (JOINCOL) T2 LOCAL INDEX (NUMCOL1)
No index listed for t3 as you see. The profile of the join between t3 and t4 shows:
select part_name,part_info,object_name,remarks from exa_user_profile_last_day where stmt_id =95 and session_id=current_session; PART_NAME PART_INFO OBJECT_NAME REMARKS COMPILE / EXECUTE (null) (null) (null) INDEX CREATE EXPRESSION INDEX T3 ExpressionIndex SCAN on REPLICATED table T4 (null) JOIN (null) T3 T4(JOINCOL) => ExpressionIndex GROUP BY GLOBAL on TEMPORARY table tmp_subselect0 (null)
There was an index created for the join, but expression indexes are not persisted. This was a local join (no GLOBAL indicator in the PART_INFO column for the JOIN step) because t4 was replicated across all nodes due to its small size.
Apart from following the mentioned good practices, there’s simply not much for customers to take care of related to indexes in Exasol – it just works 🙂
Exasol creates all indexes it needs automatically.
Tables and Indexes are automatically compressed.
Tables and Indexes are automatically reorganized if required.
Useless indexes are automatically dropped.
The administrative effort related to indexes is close to zero in Exasol.
Investing some time in proper table design initially might still be a good idea.
Posted in TOI on March 8, 2019
After having worked with both databases for quite some time, this is what I consider to be the key differences between Oracle and Exasol. Of course the two have much in common: Both are relational databases with a transaction management system that supports the ACID model and both follow the ANSI SQL standard – both with some enhancements. Coming from Oracle as I do, much in Exasol looks quite familiar. But let’s focus on the differences:
Oracle is leading technology for Online Transaction Processing (OLTP). If you have a high data volume with many users doing concurrent changes, this is where Oracle shines particularly.
Exasol is leading technology for analytical workloads. If you want to do real-time ad hoc reporting on high data volume, this is where Exasol shines particularly.
Data Format & In-Memory processing
Oracle uses a row-oriented data format, which is well suited for OLTP but not so much for analytical workloads. That’s why Hybrid Columnar Compression (only available on Engineered Systems respectively on Oracle proprietary storage) and the In-Memory Column Store (extra charged option) have been added in recent years.
Exasol uses natively a compressed columnar data format and processes this format in memory. That is very good for analytical queries but bad for OLTP because one session that does DML on a table locks that table against DML from other sessions. Read Consistent SELECT is possible for these other sessions, though.
Oracle was designed for OLTP at times when memory was scarce and expensive. Exasol was designed to process analytical workloads in memory.
Oracle started as a non-clustered (single instance) system. Real Application Clusters (RAC) have been added much later. The majority of Oracle installations is still non-clustered. RAC (extra charged option) is rather an exception than the rule. Most RAC installations are 2-node clusters with availability as the prime reason, scalability being rather a side aspect.
Exasol was designed from the start to run on clustered commodity Intel servers. Prime reasons were MPP performance and scalability with availability being rather a side aspect.
This doesn’t matter for most Oracle installations, only for RAC. Here, Oracle uses a shared disk architecture while Exasol uses a shared nothing architecture, which is optimal for performance because every Exasol cluster node can operate on a different part of the data in parallel. Drawback is that after adding nodes to an Exasol cluster, the data has to be re-distributed.
With Exadata, Oracle tries to compensate the performance disadvantage of the shared disk architecture by enabling the storage servers to filter data locally for analytical workloads. This approach leads to better performance than Oracle can deliver on other (non-proprietary) platforms.
Availability & Recoverability
Clearly, Oracle is better in this area. A non-clustered Oracle database running in archive log mode will enable you to recover every single committed transaction you did since you took the last backup. With Exasol, you can only restore the last backup and all changes since then are lost. You can safeguard an Oracle database against site failure with a standby database at large distance without performance impact. Exasol doesn’t have that. With RAC, you can protect an Oracle database against node failure. The database stays up (the Global Resource Directory is frozen for a couple of seconds, though) upon node failure with no data loss.
If an Exasol cluster node fails, this leads to a database restart. Means no availability for a couple of seconds and all sessions get disconnected. But also no data loss. Optionally, Exasol can be configured as Synchronous Dual Data Center – similar to Oracle’s Extended RAC.
Complexity & Manageability
I realized that there’s a big difference between Exasol and Oracle in this area when I was teaching an Exasol Admin class recently: Some seasoned Oracle DBAs in the audience kept asking questions like “We can do this and that in Oracle, how does that work with Exasol?” (e.g. creating Materialized Views or Bitmap Indexes or an extra Keep Cache) and my answer was always like “We don’t need that with Exasol to get good performance”.
Let’s face it, an Oracle database is probably one of the most complex commercial software products ever developed. You need years of experience to administer an Oracle database with confidence. See this recent Oracle Database Administration manual to get an impression. It has 1690 pages! And that’s not yet Real Application Clusters, which is additionally 492 pages. Over 2100 pages of documentation to dig through, and after having worked with Oracle for over 20 years, I can proudly say that I actually know most of it.
In comparison, Exasol is very easy to use and to manage, because the system takes care of itself largely. Which is why our Admin class can have a duration of only two days and attendees feel empowered to manage Exasol afterwards.
That was intentionally so from the start: Exasol customers are not supposed to study the database for years (or pay someone who did) in order to get great performance. Oracle realized that being complex and difficult to manage is an obstacle and came out with the Autonomous Database – but that is only available in the proprietary Oracle Cloud.
Using comparable hardware and processing the same (analytical) workload, Exasol outperforms any competitor. That includes Oracle on Exadata. Our Presales consultants regard Exadata as a sitting duck, waiting to get shot on a POC. I was personally shocked to learn that, after drinking the Oracle Kool-Aid myself for years.
In my opinion, these two points are most important: Exasol is faster and at the same time much easier to manage! I mean anything useless could be easy to manage, so that’s not an asset on its own. But together with delivering striking performance, that’s really a big deal.
This is and has always been a painpoint for Oracle customers: The licensing of an Oracle database is so complex and fine granular that you always wonder “Am I allowed to do this without violating my license? Do we really need these features that we paid for? Are we safe if Oracle does a License Audit?” With Exasol, all features are always included and the two most popular license types are totally easy to understand: You pay either for the data volume loaded into the cluster or for the amount of memory assigned to the database. No sleepless nights because of that!
This topic becomes increasingly important as many of our new customers want to deploy Exasol in the cloud. And you may have noticed that Oracle pushes going cloud seriously over the last years.
Exasol runs with all features enabled in the cloud: You can choose between Amazon Web Services, (AWS), Microsoft Azure and ExaCloud
This is presently the most popular way our customers run Exasol in the cloud. See here for more details.
Microsoft’s cloud can also be used to run Exasol, which gives you the option to choose between two major public cloud platforms. See here for more details.
Hosted and managed by Exasol, ExaCloud is a full database-as-a-service offering. See here for more details.
Hybrid Exasol deployments that combine cloud with on-prem can also be used, just depending on customer requirements.
Oracle offers RAC only on the Oracle Cloud platform, not on public clouds. Various other features are also restricted to be available only in Oracle’s own cloud. The licensing model has been tweaked to favor the usage of Oracle’s own cloud over other public clouds generally.
Customers love Exasol, as the recent Dresner report confirms. We get a perfect recommendation score. I can also tell that from personal encounters: Literally every customer I met is pleased with our product and our services!
Oracle is great for OLTP and okay for analytical workloads – especially if you pay extra for things like Partitioning, RAC, In-Memory Column Store and Exadata. Then the performance you get for your analytical workload might suit your present demand.
Exasol is totally bad for OLTP but best in the world for analytical workloads. Do you think your data volume and your analytic demands will grow?
Posted in TOI on February 1, 2019
The technique to recover only certain objects from an ordinary backup is called Virtual Access Restore. Means you create a database from backup that contains only the minimum elements needed to access the objects you request. This database is then removed afterwards.
Let’s see an example. This is my initial setup:
One database in a 2+1 cluster. Yes it’s tiny because it lives on my notebook in VirtualBox. See here how you can get that too.
It uses the data volume v0000 and I took a backup into the archive volume v0002 already.
I have a schema named RETAIL there with the table SALES:
By mistake, that table gets dropped:
And I’m on AUTOCOMMIT, otherwise this could be rolled back in Exasol. Virtual Access Restore to the rescue!
First I need another data volume:
Notice the size of the new volume: It is smaller than the overall size of the backup respectively the size of the “production database”! I did that to prove that space is not much of a concern here.
Then I add a second database to the cluster that uses that volume. The connection port (8564) must be different from the port used by the first database and the DB RAM in total must not exceed the licensed size, which is limited to 4 GB RAM in my case:
I did not start that database because for the restore procedure it has to be down anyway. Clicking on the DB Name and then on the Backups button gets me here:
No backup shown yet because I didn’t take any backups with exa_db2. Clicking on Show foreign database backups:
The Expiration date must be empty for a Virtual Access Restore, so I just remove it and click Apply. Then I select the Restore Type as Virtual Access and click Restore:
This will automatically start the second database:
I connect to exa_db2 with EXAplus, where the Schema Browser gives me the DDL for the table SALES:
I take that to exa_db1 and run it there, which gives me the table back but empty. Next I create a connection from exa_db1 to exa_db2 and import the table
create connection exa_db2 to '192.168.43.11..13:8564' user 'sys' identified by 'exasol'; import into retail.sales from exa at exa_db2 table retail.sales;
This took about 2 Minutes:
The second database and then the second data volume can now be dropped. Problem solved!