Free online courses to learn about #Exasol

Why should you bother? Because Exasol is the fastest analytical database in the world, outperforming any competitor. Expertise in Exasol may therefore soon become very valuable in your company as well.
Free training helps us spread that knowledge in a scalable way, empowering customers across the globe to get the best out of Exasol and supporting our rapid growth.

You can register here. The free online courses are branded as “Exacademy”:

The Database Administration course there is my baby 🙂 If you go through it, you will create your own virtualized Exasol cluster on your local machine. I know from experience that having a personal sandbox can be very instructive: that's what I used myself to better understand Exasol clusters before developing the course.

When I create course material, I always ask myself: What would I want to get from this course if I attended it? And that's what I try to put in: to the point, no nonsense, just stuff that works and helps you understand. Hope you find it useful 🙂

 


Automatic Indexes in #Exasol

An Exasol database will automatically create, maintain and drop indexes, following the core idea of delivering great performance without requiring much administrative effort. Like tables, indexes are always compressed, and you don't need to configure anything for that.

Joins between two or more tables are processed like this in Exasol: one table is scanned in full (this is called the root table) and the other tables are joined using an index on their join columns.

If these indexes on the join columns do not already exist, they are created automatically during the join operation. Take two tables t1 and t2 as an example, and a statement like

select count(*) from t1 join t2 on t1.joincol = t2.joincol;

The Exasol optimizer computes an execution plan based on automatically gathered statistics that inform it, among other things, about the table sizes. Often the smaller table becomes the root table and the other table is joined using an index on the join column.
Let’s see an example:

create or replace table t1 as
select 
rownum as joincol, 
'Just some text' as textcol,
mod(rownum,5) as numcol1, 
mod(rownum,1000) as numcol2,
5000 as numcol3,
to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id 
from (select 1 from dual connect by level<=1e6);

SUCCESS	CREATE	36.43		1000000	OK

create or replace table t2 as select * from t1 limit 5e5;

SUCCESS	CREATE	0.735		500000	OK

select count(*) from t1 join t2 on t1.joincol=t2.joincol;

COUNT(*)
500000

select index_table,remarks from exa_dba_indices where index_table like 'T_';

INDEX_TABLE	REMARKS
T1	        GLOBAL INDEX (JOINCOL)

That index got created during query execution (on the first join between t1 and t2) and subsequently supports further joins with t1 on joincol.

If DML is done on t1, this index is maintained by the system:
INSERT into t1 will add new index entries accordingly.
DELETE from t1 will mark rows as deleted until more than 25% of the rows have been deleted. Then t1 is reorganized automatically and the index is rebuilt.
UPDATE statements that affect less than 15% of the rows will update index key entries accordingly. If more than 15% of the rows are updated, the index is rebuilt automatically.

If an index has not been used to support queries for more than 5 weeks, it is dropped automatically. That way, a useless index will not slow down DML or consume space for no good reason.
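Incidentally, if you ever want to trigger that reorganization by hand – say, right after a large bulk delete instead of waiting for the 25% threshold – Exasol offers an explicit statement for it. A minimal sketch:

```sql
-- Explicitly reorganize one table: rows marked as deleted are removed
-- and its indexes are rebuilt. Rarely needed in practice, since the
-- automatic maintenance described above normally covers this.
REORGANIZE TABLE t1;
```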

Operationally, nothing needs to be done about indexes in Exasol, and that's how most of our customers handle it: they just let the system take care of indexes. In earlier versions, EXA_DBA_INDICES didn't even exist, to avoid providing superfluous information.

What is a global index, and why was it created that way on t1, you may ask. Like tables, indexes are distributed across the Exasol cluster nodes. If the index part on a node points only to table rows residing on the same node, that's a local index. With a global index, the index part on a node points to (at least some) table rows residing on other nodes. In other words, a global join leads to global indexes, while local joins lead to local indexes. Profiling a join between t1 and t2 confirms that:

select part_name,part_info,object_name,remarks from exa_user_profile_last_day where stmt_id =21 and session_id=current_session;

PART_NAME	    PART_INFO	              OBJECT_NAME	   REMARKS
COMPILE / EXECUTE   (null)	              (null)	           (null)
SCAN	            (null)	               T2	           (null)
JOIN	            GLOBAL	               T1	           T2(JOINCOL) => GLOBAL INDEX (JOINCOL)
GROUP BY	    GLOBAL on TEMPORARY table  tmp_subselect0	   (null)

So that was a global join using a global index. If the two tables were distributed on joincol, this would lead to local joins with local indexes. Putting the distribution key on joincol for t1 automatically converts the existing index into a local index:

alter table t1 distribute by joincol;

SUCCESS	ALTER	0.90

alter table t2 distribute by joincol;

SUCCESS	ALTER	0.487

select index_table,remarks from exa_dba_indices where index_table like 'T_';

INDEX_TABLE	REMARKS
T1	        LOCAL INDEX (JOINCOL)

Using prepared statements for UPDATE or DELETE may also automatically create an index on the filter column:

update t2 set textcol='Modified' where numcol1=:parameter1; -- Enter 1

SUCCESS	UPDATE	0.44		100004	OK

select index_table,remarks from exa_dba_indices where index_table like 'T_';

INDEX_TABLE	REMARKS
T1	        LOCAL INDEX (JOINCOL)
T2	        LOCAL INDEX (NUMCOL1)

delete from t1 where numcol2=:parameter2; -- Enter 42

SUCCESS	DELETE	0.511		1000	OK

select index_table,remarks from exa_dba_indices where index_table like 'T_';

INDEX_TABLE	REMARKS
T1	        LOCAL INDEX (NUMCOL2)
T1	        LOCAL INDEX (JOINCOL)
T2	        LOCAL INDEX (NUMCOL1)

This results in local indexes because there is no need to refer to rows on other nodes while each node updates or deletes its locally stored part of the tables.
I'm using DbVisualizer as a SQL client for my demos here; it prompts for the values of :parameter1 and :parameter2 when the statements are executed.

Indexes are also created automatically when primary or unique constraints are added to a table:

alter table t2 add constraint t2pk primary key (joincol);

SUCCESS	ALTER	0.39

select index_table,remarks from exa_dba_indices where index_table like 'T_';

INDEX_TABLE	REMARKS
T1	        LOCAL INDEX (NUMCOL2)
T1	        LOCAL INDEX (JOINCOL)
T2	        LOCAL INDEX (JOINCOL)
T2	        LOCAL INDEX (NUMCOL1)

This created a local index because t2 is distributed on joincol.
Please do not take this as a recommendation to add primary or unique constraints to tables in Exasol; I only mention it because it also leads to the creation of indexes. By the way, MERGE statements lead to index creation too, because they are processed as joins.
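To illustrate the MERGE point with the demo tables from above (just a sketch – which index gets created depends on the optimizer):

```sql
-- MERGE is processed as a join, so an index supporting the join
-- condition may be created automatically, just as with SELECT joins.
merge into t1 using t2
on (t1.joincol = t2.joincol)
when matched then update set textcol = t2.textcol;
```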

Although customers are not required to do anything about indexes, there are a couple of good practices related to them that make your Exasol database perform even better:

Small numeric formats are optimal for join columns and lead to small, efficient indexes. It is often beneficial to replace multi-column joins, or joins on columns with large formats, with joins on numeric surrogate keys. Identity columns help to generate these surrogate keys.
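As a sketch of that surrogate-key approach (the table and column names here are made up for illustration):

```sql
-- Hypothetical dimension table: the identity column generates small
-- numeric surrogate keys, which make for small, efficient join indexes.
create table customer_dim (
    customer_id   decimal(18,0) identity,
    customer_name varchar(200)
);
-- Fact tables would then join on customer_id instead of, say,
-- a multi-column natural key or a long VARCHAR column.
```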

Avoid using expressions on the join columns because the resulting indexes are not persisted and have to be built again during every join:

select count(*) from t1 join t2 on t1.joincol+1=t2.joincol+1; -- don't do that

Avoid having mixed data types on join columns because that can also lead to expression indexes:

create or replace table t3 as
select 
to_char(rownum) as joincol, -- don't do that
'Just some text' as textcol,
mod(rownum,5) as numcol1, 
mod(rownum,1000) as numcol2,
5000 as numcol3,
to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id 
from (select 1 from dual connect by level<=1e6);

SUCCESS	CREATE	34.925		1000000	OK

create or replace table t4 as select * from t1 limit 100;

SUCCESS	CREATE	0.374		100	OK

The statements above created a large table t3 using a string type for joincol and a small table t4 using a numeric type for joincol. When the two are joined, t4 likely becomes the root table and t3 gets an expression index:

select count(*) from t3 join t4 on t3.joincol=t4.joincol;

COUNT(*)
100

select index_table,remarks from exa_dba_indices where index_table like 'T_';

INDEX_TABLE	REMARKS
T1	        LOCAL INDEX (NUMCOL2)
T1	        LOCAL INDEX (JOINCOL)
T2	        LOCAL INDEX (JOINCOL)
T2	        LOCAL INDEX (NUMCOL1)

No index listed for t3 as you see. The profile of the join between t3 and t4 shows:

select part_name,part_info,object_name,remarks from exa_user_profile_last_day where stmt_id =95 and session_id=current_session;

PART_NAME	    PART_INFO	               OBJECT_NAME	REMARKS
COMPILE / EXECUTE   (null)	               (null)	        (null)
INDEX CREATE	    EXPRESSION INDEX	        T3	        ExpressionIndex
SCAN	            on REPLICATED table	        T4	        (null)
JOIN	            (null)	                T3	        T4(JOINCOL) => ExpressionIndex
GROUP BY	    GLOBAL on TEMPORARY table	tmp_subselect0	(null)

There was an index created for the join, but expression indexes are not persisted. This was a local join (no GLOBAL indicator in the PART_INFO column for the JOIN step) because t4 was replicated across all nodes due to its small size.
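The remedy is to align the data types on both sides of the join so that a persistent index can be used. For the example above, one option is converting t3.joincol to the numeric type of t4.joincol (a sketch; the conversion works here only because the strings hold numeric values):

```sql
-- Align the join column's type with t4.joincol; subsequent joins can
-- then use a persistent index instead of a transient expression index.
alter table t3 modify (joincol decimal(18,0));
```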

Apart from following the mentioned good practices, there’s simply not much for customers to take care of related to indexes in Exasol – it just works 🙂

Executive Summary:

Exasol creates all indexes it needs automatically.

Tables and indexes are automatically compressed.

Tables and indexes are automatically reorganized if required.

Useless indexes are automatically dropped.

The administrative effort related to indexes is close to zero in Exasol.

Investing some time in proper table design initially might still be a good idea.

 


Comparison between #Oracle and #Exasol

Having worked with both databases for quite some time, here is what I consider to be the key differences between Oracle and Exasol. Of course the two have much in common: both are relational databases with a transaction management system that supports the ACID model, and both follow the ANSI SQL standard – each with some enhancements. Coming from Oracle as I do, much in Exasol looks quite familiar. But let's focus on the differences:

Strengths

Oracle is the leading technology for Online Transaction Processing (OLTP). If you have a high data volume with many users doing concurrent changes, this is where Oracle shines particularly.

Exasol is the leading technology for analytical workloads. If you want to do real-time ad-hoc reporting on high data volumes, this is where Exasol shines particularly.

Architecture

Data Format & In-Memory processing

Oracle uses a row-oriented data format, which is well suited for OLTP but not so much for analytical workloads. That's why Hybrid Columnar Compression (only available on Engineered Systems or on Oracle proprietary storage) and the In-Memory Column Store (an extra-charge option) have been added in recent years.

Exasol natively uses a compressed columnar data format and processes this format in memory. That is very good for analytical queries but bad for OLTP, because a session that does DML on a table locks that table against DML from other sessions. Read-consistent SELECTs are still possible for those other sessions, though.

Oracle was designed for OLTP at times when memory was scarce and expensive. Exasol was designed to process analytical workloads in memory.

Clustering

Oracle started as a non-clustered (single-instance) system. Real Application Clusters (RAC) were added much later. The majority of Oracle installations are still non-clustered; RAC (an extra-charge option) is the exception rather than the rule. Most RAC installations are 2-node clusters with availability as the prime reason, scalability being rather a side aspect.

Exasol was designed from the start to run on clustered commodity Intel servers. Prime reasons were MPP performance and scalability with availability being rather a side aspect.

Data Distribution

This doesn’t matter for most Oracle installations, only for RAC. Here, Oracle uses a shared-disk architecture while Exasol uses a shared-nothing architecture, which is optimal for performance because every Exasol cluster node can operate on a different part of the data in parallel. The drawback is that after adding nodes to an Exasol cluster, the data has to be redistributed.

With Exadata, Oracle tries to compensate for the performance disadvantage of the shared-disk architecture by enabling the storage servers to filter data locally for analytical workloads. This approach leads to better performance than Oracle can deliver on other (non-proprietary) platforms.

Availability & Recoverability

Clearly, Oracle is better in this area. A non-clustered Oracle database running in archive log mode will enable you to recover every single committed transaction you did since you took the last backup. With Exasol, you can only restore the last backup and all changes since then are lost. You can safeguard an Oracle database against site failure with a standby database at large distance without performance impact. Exasol doesn’t have that. With RAC, you can protect an Oracle database against node failure. The database stays up (the Global Resource Directory is frozen for a couple of seconds, though) upon node failure with no data loss.

If an Exasol cluster node fails, this leads to a database restart. That means no availability for a couple of seconds, and all sessions get disconnected – but also no data loss. Optionally, Exasol can be configured as a Synchronous Dual Data Center – similar to Oracle’s Extended RAC.

Complexity & Manageability

I realized that there’s a big difference between Exasol and Oracle in this area when I was teaching an Exasol Admin class recently: Some seasoned Oracle DBAs in the audience kept asking questions like “We can do this and that in Oracle, how does that work with Exasol?” (e.g. creating Materialized Views or Bitmap Indexes or an extra Keep Cache) and my answer was always like “We don’t need that with Exasol to get good performance”.

Let’s face it, an Oracle database is probably one of the most complex commercial software products ever developed. You need years of experience to administer an Oracle database with confidence. See this recent Oracle Database Administration manual to get an impression. It has 1690 pages! And that’s not yet Real Application Clusters, which is additionally 492 pages. Over 2100 pages of documentation to dig through, and after having worked with Oracle for over 20 years, I can proudly say that I actually know most of it.

In comparison, Exasol is very easy to use and to manage, because the system takes care of itself largely. Which is why our Admin class can have a duration of only two days and attendees feel empowered to manage Exasol afterwards.

That was intentionally so from the start: Exasol customers are not supposed to study the database for years (or pay someone who did) in order to get great performance. Oracle realized that being complex and difficult to manage is an obstacle and came out with the Autonomous Database – but that is only available in the proprietary Oracle Cloud.

Performance

Using comparable hardware and processing the same (analytical) workload, Exasol outperforms any competitor. That includes Oracle on Exadata. Our presales consultants regard Exadata as a sitting duck, waiting to get shot in a POC. I was personally shocked to learn that, after drinking the Oracle Kool-Aid myself for years.

In my opinion, these two points are most important: Exasol is faster and at the same time much easier to manage! I mean anything useless could be easy to manage, so that’s not an asset on its own. But together with delivering striking performance, that’s really a big deal.

Licensing

This is and has always been a pain point for Oracle customers: the licensing of an Oracle database is so complex and fine-grained that you always wonder “Am I allowed to do this without violating my license? Do we really need these features that we paid for? Are we safe if Oracle does a license audit?” With Exasol, all features are always included, and the two most popular license types are easy to understand: you pay either for the data volume loaded into the cluster or for the amount of memory assigned to the database. No sleepless nights because of that!

Cloud

This topic becomes increasingly important as many of our new customers want to deploy Exasol in the cloud. And you may have noticed that Oracle pushes going cloud seriously over the last years.

Exasol runs with all features enabled in the cloud. You can choose between Amazon Web Services (AWS), Microsoft Azure and ExaCloud:

AWS

This is presently the most popular way our customers run Exasol in the cloud. See here for more details.

MS Azure

Microsoft’s cloud can also be used to run Exasol, which gives you the option to choose between two major public cloud platforms. See here for more details.

ExaCloud

Hosted and managed by Exasol, ExaCloud is a full database-as-a-service offering. See here for more details.

Hybrid Exasol deployments that combine cloud with on-prem can also be used, just depending on customer requirements.

Oracle offers RAC only on the Oracle Cloud platform, not on other public clouds. Various other features are also restricted to Oracle’s own cloud, and the licensing model has generally been tweaked to favor the use of Oracle’s own cloud over other public clouds.

Customer Experience

Customers love Exasol, as the recent Dresner report confirms. We get a perfect recommendation score. I can also tell that from personal encounters: Literally every customer I met is pleased with our product and our services!

Conclusion

Oracle is great for OLTP and okay for analytical workloads – especially if you pay extra for things like Partitioning, RAC, In-Memory Column Store and Exadata. Then the performance you get for your analytical workload might suit your present demand.

Exasol is totally bad for OLTP but best in the world for analytical workloads. Do you think your data volume and your analytic demands will grow?

