
#DOAG2018 Impressions

As it does every November, the database circus moved to Nuremberg for the annual DOAG conference. As you may know, this conference sets very high standards in terms of turnout and top-notch speakers, and it met them once again: it was a great event!

It started with the welcome keynote, where Stefan Kinnen told us that the conference attracted over 2,000 attendees and more than 400 speakers from all over the world. That should make it the largest database conference in Europe, I suppose!

DOAG welcome keynote (Stefan Kinnen speaking)

I went on to attend Tirthankar Lahiri who presented about the In-Memory column store:

Tirthankar Lahiri preparing to speak

To me, the In-Memory expressions and In-Memory external tables sounded particularly useful here.

The next talk, about Performance Tuning, was given by Martin Widlake:

Martin Widlake walking on stage

I liked his statement that the key skill you need to solve performance problems is not primarily technical but social in nature: you need to thoroughly understand the problem and listen carefully. Some technical expertise helps too, of course 🙂

Afterwards I gave my own talk about Flashback in the same room, to approximately 150 attendees – somewhat fewer than Martin had, though:

The talk was well received: I finished my live demos in time and was still able to answer some questions – all good 🙂

Then came The Basics of Machine Learning by Heli Helskyaho (the pic I took is too bad to put it here, unfortunately) whose talk was so successful that she had to repeat it on the next day. Great job!

I went on to attend Markus Michalewicz presenting The Smartest HA Features in Oracle’s Autonomous Database:

Markus Michalewicz presenting

Here I particularly liked the new concept of “Recovery Buddies” in RAC, which shortens the freeze time of the Global Resource Directory when an instance fails.

Then came the talk How to run a user group, delivered by three (!) user group presidents:

Three User Group presidents presenting

Martin emphasized that you need to have (at least) one guy who is passionate about the topic and one (often another guy) who gets things organized. Apart from that, size doesn’t matter that much: you can start a user group with three people. Kamil and Luiza told us the story of how they started the POUG user group in Poland, while Stefan gave us some insights into DOAG’s quite extensive internal organization.

I continued with RAC Performance Internals by Anil Nair, who attracted a substantial audience in the largest room:

Anil Nair in room Tokio

One takeaway for me was that on Exadata, Oracle RAC uses ExaFusion to achieve block transfers over the interconnect that are three times faster than on non-Exadata platforms – one more reason why Oracle uses Exadata as the platform for its autonomous database offerings.

Then I was really shocked by the outcome of the DOAG support survey: two thirds of the 270 customers interviewed said they are not satisfied with Oracle Support! That was even worse than two years ago, when the survey reported over 50 % discontent. The room Singapur was full – but unfortunately filled with frustrated people who gave the Oracle officials present a hard time. Having been with Oracle for so long myself in the past, I felt bad for them; it was a bitter moment.

I went on to attend The battle between Oracle vs. PostgreSQL:

Jan Karremans (Postgres) vs. Daniel Westermann (Oracle)

Frankly, my impression was that Daniel came up with the better arguments, but Jan scored many sympathy points – not least because many people wanted Postgres to win over Oracle. I think Oracle has to watch out here not to alienate its customer base any further. Otherwise, Postgres might continue to take away market share.

The next talk, from Paolo Kreth, was about Open Source Databases:

Paolo Kreth on stage

His message was that DBAs cannot ignore the usage of open source tools; they have to be approachable and try to assist internal customers with these tools. Otherwise they may simply become less relevant.

Another interesting presentation was The changing role of the DBA by Valentin Leonard Tabacaru:

Valentin Leonard Tabacaru presenting

Especially in light of Oracle’s new autonomous databases, his prediction is that maintenance tasks like patching will disappear, while areas like data modelling, security and application tuning will become more important for DBAs.

Another very good talk was about the TimesTen In-Memory database by Doug Hood:

Doug Hood

He showed some convincing facts to back his claim that TimesTen is the fastest In-Memory database for OLTP workloads while also providing fault tolerance and scalability.

All in all, the DOAG annual conference rocked again. Kudos to the DOAG members who helped organize and run it so smoothly once more!

 


Installing an #Exasol 6.1 Cluster on VirtualBox

After installing the latest VirtualBox version, download an ISO file with the latest Exasol version. The machine hosting VirtualBox should have at least 16 GB RAM and 80 GB free disk space in order to run a 2+1 cluster with three data nodes and one license server. I’m doing it on my Windows 10 notebook.

1.5 GB RAM is sufficient for the License Server and it needs only one disk

Insert the ISO file as a virtual CD

Make sure the License Server boots from CD first

Configure the private Network for the License Server

Configure the public network for the License Server
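
The clicks above can also be scripted. Here is a minimal sketch using VBoxManage; the VM name, disk size, ISO file name and network adapter names are assumptions you need to adapt to your own setup:

```shell
# Sketch only: create the License Server VM from the command line.
# VM name, disk size, ISO path and network names are assumptions.
VBoxManage createvm --name exa-license --ostype Linux_64 --register
VBoxManage modifyvm exa-license --memory 1536 \
  --nic1 intnet --intnet1 "exa-private" \
  --nic2 hostonly --hostonlyadapter2 "VirtualBox Host-Only Ethernet Adapter"
VBoxManage createmedium disk --filename exa-license.vdi --size 20480
VBoxManage storagectl exa-license --name "SATA" --add sata
VBoxManage storageattach exa-license --storagectl "SATA" --port 0 --device 0 \
  --type hdd --medium exa-license.vdi
VBoxManage storageattach exa-license --storagectl "SATA" --port 1 --device 0 \
  --type dvddrive --medium EXASOL-6.1.0.iso
# Boot from the virtual CD first, as described above
VBoxManage modifyvm exa-license --boot1 dvd --boot2 disk --boot3 none --boot4 none
```

Whether you click or script, the result should be the same VM configuration.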

Now power on the virtual machine you just created and configured. It should come up with this screen, where you type install:

Tab to OK and press return

Tab to OK and press return

Tab to OK and press return

Choose a password. I used exasol12

Enter the public IP of the License Server. My VirtualBox Host-Only Ethernet Adapter is configured with 192.168.43.1 – therefore I use 192.168.43.10. It should also work with the VirtualBox Standard setting, in this case use 192.168.56.10. When in doubt, call ipconfig from the command shell.

Tab to OK and press return

The installation started from the last screen took about 5 minutes for me. Now type local and wait for the License Server to boot from disk. You may remove the virtual CD from it afterwards.

Do yourself a favor and pause for 5 minutes after the machine has booted from disk before you try to connect to EXAoperation. I’m using Chrome because it didn’t work well with Firefox:

The default password of the user admin is admin
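
If the login page doesn’t come up yet, you can also poll from the command line instead of reloading the browser. A small sketch, assuming the License Server IP chosen earlier in this walkthrough (the -k flag skips verification of EXAoperation’s self-signed certificate):

```shell
# Wait until the EXAoperation web interface answers on the License Server IP
until curl -k -s -o /dev/null https://192.168.43.10/; do
  sleep 10
done
echo "EXAoperation is reachable"
```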

Congratulations when you see the screen below! That was already the hardest part 🙂

Now add a log service to monitor the following

This helps you follow what happens under the covers during the next steps

Now the first data node is created as a VM, with 3 GB RAM

It gets two disks:

Give it the same two network cards as the License Server and make sure it boots from network only:

Do not yet power on the new data node! Now it’s configured in EXAoperation. Go to Nodes and click Add:

 

You get the MAC addresses from here:

Now click on the new node to configure it further:

And so forth for the other three storage partitions so that it looks at the end like this:

Pay attention to the sizes and devices before you power on the new VM for the first data node. Then watch the log service. It should look like this:

While that is ongoing, create the VM for the second data node in the same manner as the first. In the end it should look like this:

Click on the existing node in EXAoperation:

Change the node number to 12 and the MAC addresses to their values from the VirtualBox VM

Then, after clicking Copy Node, power on the VM. After you see it installing in the log service, do the same for the third data node. Eventually it will look like this:

Now select all nodes and execute the action Set active flag:

The state changes to Running Active for the three data nodes. Now click Startup Storage Service:

Now click Add Unused Disks after selecting the three data nodes:

Now click Add Volume:

Assign n11 and n12 for the new data volume with redundancy 2:

Add a new database:

Now click on the link underneath the added new database:

From the actions menu, select create

Then click startup. Change to the EXASolution main page. It should look like this:

Now you have a 2+1 Cluster running on VirtualBox – have fun with it 🙂


Understanding Distribution in #Exasol

Exasol doesn’t need much administration, but getting distribution right matters

Exasol uses a clustered shared-nothing architecture with many sophisticated internal mechanisms to deliver outstanding performance without requiring much administration. Getting the distribution of rows between cluster nodes right is one of the few critical tasks left, though. To explain this, let’s say we have two tables t1 and t2:

The two tables are joined on the column JoinCol, while WHERE conditions for filtering use the column WhereCol. Other columns are not shown to keep the sketches small and simple. Now say these two tables are stored on a three-node cluster. Again, for simplicity only active nodes appear in the sketches – no reserve nodes or license server. We also ignore the fact that small tables are replicated across all active nodes.
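
For reference, the two example tables could look like this in SQL; the column types are assumptions purely for illustration:

```sql
-- Hypothetical DDL for the example tables; column types are assumed
CREATE TABLE t1 (
  JoinCol  DECIMAL(18,0),
  WhereCol VARCHAR(1)
);

CREATE TABLE t2 (
  JoinCol  DECIMAL(18,0),
  WhereCol VARCHAR(1)
);
```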

Distribution will be random if no distribution key is specified

Without specifying a distribution key, the rows of the tables are distributed randomly across the nodes like this:

Absence of proper distribution keys: global joins

The two tables are then joined:

SELECT <something> FROM t1 JOIN t2 ON t1.JoinCol = t2.JoinCol;

Internally, this is processed as a global join, which means network communication between the nodes is required on behalf of the join. This is the case because some rows do not find local join partners on the same node:

Distribution on join columns: local joins

If the two tables were distributed on their join columns with statements like these

ALTER TABLE t1 DISTRIBUTE BY JoinCol;

ALTER TABLE t2 DISTRIBUTE BY JoinCol;

then the same query can be processed internally as a local join:

Here every row finds a local join partner on the same node, so no network communication between the nodes is required on behalf of the join. Performance with this local join is much better than with the global join, although it’s the same statement as before.
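
Which columns currently form a table’s distribution key can be confirmed in the system catalog. A sketch against EXA_ALL_COLUMNS, where the schema name MY_SCHEMA is an assumption:

```sql
-- List the distribution key columns of t1 and t2 (schema name is assumed)
SELECT column_table, column_name
FROM exa_all_columns
WHERE column_schema = 'MY_SCHEMA'
  AND column_table IN ('T1', 'T2')
  AND column_is_distribution_key = TRUE
ORDER BY column_table, column_name;
```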

Why you shouldn’t distribute on WHERE-columns

While it’s generally a good idea to distribute on JOIN columns, it is by contrast a bad idea to distribute on columns that are used for filtering with WHERE conditions. If both tables had been distributed on their WhereCol columns, it would look like this:

This distribution is actually worse than the initial random distribution! Not only does it cause global joins between the two tables, as explained above; in addition, statements like

<Any DQL or DML> WHERE t2.WhereCol='A';

will utilize only one node (the first node, for this WHERE condition), which effectively disables one of Exasol’s greatest strengths: the Massive Parallel Processing (MPP) functionality. This distribution leads to poor performance because all other nodes in the cluster stand by idle while one node does all the work alone.

Examine existing distribution with iproc()

The function iproc() helps to investigate the existing distribution of rows across cluster nodes. This statement shows the distribution of the table t1:

SELECT iproc(),COUNT(*) FROM t1 GROUP BY 1 ORDER BY 1;

Evaluate the effect of distribution keys with value2proc()

The function value2proc() can be used to display the effect that a (new) distribution key would have:

SELECT home_node,COUNT(*) FROM (SELECT value2proc(JoinCol) AS home_node FROM t1) GROUP BY 1 ORDER BY 1;

Conclusion

Distribution on JOIN columns leads to local joins, which perform better than global joins: do that!

Distribution on WHERE columns leads to global joins and disables the MPP functionality, both causing poor performance: don’t do that!

