It’s amazingly easy to run an Exasol Cluster on Amazon Web Services (AWS).
Subscribe to Exasol in the AWS Marketplace
After registering and logging in to your AWS account, go to the AWS Marketplace and search for Exasol:
Click on the Exasol Single Node and Cluster BYOL link and then on Continue to Subscribe:
After reviewing the terms and conditions, click on Accept Terms. The following message appears:
Create Key Pair
Now log in to the AWS Management Console, select a region close to your location and open the EC2 Dashboard. Click on Key Pairs:
Click on Create Key Pair now and enter a name for the new Key Pair, then click on Create:
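If you would rather script this step, the EC2 client of the AWS SDK for Python (boto3) provides `create_key_pair`. The sketch below is not part of the original walkthrough. The key name, file name and region in the usage note are hypothetical. The EC2 client is passed in as a parameter so that the function can be tried with a stand-in object instead of real AWS credentials.

```python
import os
from pathlib import Path


def create_key_pair(ec2_client, key_name: str, pem_path) -> Path:
    """Create an EC2 key pair and save its private key to pem_path.

    ec2_client is assumed to be a boto3 EC2 client. AWS hands out the private
    key only once, in the 'KeyMaterial' field, so it is written to disk at once.
    """
    response = ec2_client.create_key_pair(KeyName=key_name)
    pem_path = Path(pem_path)
    pem_path.write_text(response["KeyMaterial"])
    # Owner-only permissions: OpenSSH rejects private keys readable by others.
    os.chmod(pem_path, 0o600)
    return pem_path
```

Usage, assuming boto3 is installed and credentials are configured: `create_key_pair(boto3.client("ec2", region_name="eu-west-1"), "exasol-demo", "exasol-demo.pem")`. Key pairs are regional, so use the same region that you select later in the Deployment Wizard.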
Now you are ready to use the Exasol Cloud Deployment Wizard. Stay logged in to the AWS Management Console, because the Deployment Wizard will soon route you back there.
Using the Cloud Deployment Wizard
Open https://cloudtools.exasol.com/ in your browser and click on AWS:
Select a region close to your location and click on Continue:
Click on Advanced Configuration, specify the following settings and then click on Continue:
- License Model: Bring-your-own-license
- System Type: Enterprise Cluster
- Instance Family: Memory Optimized
- Instance Type: r5
- Instance Model: r5.large
- Number of DB Nodes: 1
Without a license file, BYOL works with a limit of 20 GB of memory for the database. This means that Exasol charges nothing for this environment, although Amazon still charges for the AWS resources.
Select Create new VPC and click on Launch Stack:
This takes you to the Quick create stack page of CloudFormation in AWS Management Console:
Enter these details on the page:
- Key Pair (select the key pair created previously)
- SYS User Password
- ADMIN User Password
- Public IPs (true)
Tick the acknowledgement checkbox and click on Create stack.
Now go to the EC2 details page and copy the Public IP of the management node:
Enter that IP address with an https:// prefix in a browser and click on Advanced:
Then you should see a progress bar like this:
After about 30 minutes, that screen changes to the EXAoperation login screen.
Log in as user admin with the password you specified on the CloudFormation Quick create stack page. A database should already be running:
As you can see, you now have a database, a remote archive volume using an Amazon S3 bucket for backup and restore, and a log service to monitor your system.
Unless a license file is uploaded to the license server (also known as the management node), this database is limited to 20 GB of memory. For educational purposes, I don’t need more.
Use Elastic IPs
The public IPs of your data nodes change with every restart, which is not convenient.
Therefore, click on Elastic IPs in the EC2 dashboard, then click on Allocate new address:
Select Amazon pool, then click on Allocate:
Click on the IP on the following screen:
Select the action Associate address on the next screen:
Select the data node from the Select instance drop-down menu and click on Associate:
Close the next screen and go to the EC2 instance page. You should see the elastic IP assigned to the data node there:
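The allocation and association can also be scripted. This is a sketch under assumptions. It uses the boto3 EC2 client calls `allocate_address` and `associate_address`. The instance ID in the usage note is a placeholder for your data node's ID from the EC2 instances page. The client is injected so that the logic can be exercised offline.

```python
def attach_elastic_ip(ec2_client, instance_id: str) -> str:
    """Allocate an Elastic IP from Amazon's pool and associate it with an instance.

    ec2_client is assumed to be a boto3 EC2 client. Returns the public IP.
    """
    # Console equivalent: Allocate new address, with "Amazon pool" selected.
    allocation = ec2_client.allocate_address(Domain="vpc")
    # Console equivalent: the Associate address action for the data node.
    ec2_client.associate_address(
        AllocationId=allocation["AllocationId"],
        InstanceId=instance_id,
    )
    return allocation["PublicIp"]
```

Usage: call `attach_elastic_ip(boto3.client("ec2"), "i-0123456789abcdef0")` once per data node, with the placeholder replaced by the real instance ID.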
Connect to your Exasol database on AWS with a SQL client
This is how that looks with DbVisualizer:
And that’s it: Now you have an Exasol 1+0 cluster (one active data node, no reserve node) running on AWS. That’s not the same as a single node system, because this 1+0 cluster can be enlarged with more data nodes. I will show how to do that in future posts.
A word about costs: Instead of using our corporate AWS account, I registered my own account to see how much this would cost. It was less than 80 Euro for a 2+1 cluster environment that I used for about one month, shutting down the EC2 instances whenever I didn’t need them for testing or for creating courseware. With the very moderate resources configured for the environment used in my postings, it should be well below 10 Euro per day.
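The observed figure averages out to less than 80 / 30 ≈ 2.67 Euro per day. A simple model shows why stopping instances helps. The rates used with it are hypothetical placeholders, not AWS prices. The model assumes that compute is billed only while an instance runs, whereas its EBS storage is billed every day, whether the instance is running or stopped.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    compute_eur_per_hour: float  # hypothetical rate; look up current AWS pricing
    storage_eur_per_day: float   # hypothetical EBS cost, billed even when stopped


def monthly_cost(instances, running_hours_per_day: float, days: int = 30) -> float:
    """Estimate the cost of an environment that runs only part of each day."""
    total = 0.0
    for inst in instances:
        total += inst.compute_eur_per_hour * running_hours_per_day * days
        total += inst.storage_eur_per_day * days
    return round(total, 2)
```

For a 2+1 environment, pass one management node plus three data nodes. Then compare `monthly_cost(env, 24)` with, for example, `monthly_cost(env, 4)` to see how much of the bill depends on running hours.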
Stay tuned for some more to come about Exasol on AWS 🙂
Adding a cluster node increases not only the available storage capacity but also the total compute power of your cluster. Such a scale-out is a common operation for Exasol customers.
My example shows how to change an existing 2+1 cluster into a 3+0 cluster. Before you can enlarge the database with an additional active node, that node must first be a reserve node. See here how to add a reserve node to a 2+0 cluster. Of course, you can add another reserve node afterwards to change from 3+0 to 3+1. See here if you wonder why you might want a reserve node at all.
Initial state – reserve node is present
I start with a 2+1 cluster – 2 active nodes and 1 reserve node:
For later comparison, let’s look at the distribution of rows of one of my tables:
The rows are roughly evenly distributed across the two active nodes.
Before you continue, it would be a good idea to take a backup on a remote archive volume now – just in case.
Shut down the database before modifying the volume
A data volume used by a database cannot be modified while that database is up, so shut the database down first:
After going to the Storage branch in EXAoperation, click on the data volume:
Then click on Edit:
Decrease volume redundancy to 1
Change the redundancy from 2 to 1, then click Apply:
Why is the redundancy reduced from 2 to 1 here? Initially, I had 2 active nodes with a volume using redundancy 2:
A and B are master segments, while A’ and B’ are their mirror segments. If I could add a node to this volume while keeping the existing segments, it would look like this:
Of course, this would be a bad idea, because the new node would hold no segments at all. Instead, the redundancy is reduced to 1 before the new node is added to the volume:
At first, there are only master segments distributed across the nodes, without mirrors. Then the redundancy is increased to 2 again:
This way, every master segment can be mirrored on a neighbor node. That’s why the redundancy needs to be reduced to 1.
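The reasoning can be replayed with a toy model. It is a hypothetical simplification for illustration, not EXAoperation’s actual placement logic. It assumes one master segment per node, and with redundancy 2 each master is mirrored on the next node, as described above.

```python
from string import ascii_uppercase


def layout(nodes, redundancy):
    """Toy model: one master segment per node (A, B, C, ...); with
    redundancy 2, each master is mirrored (A', B', ...) on the next node."""
    if redundancy == 2 and len(nodes) < 2:
        raise ValueError("redundancy 2 needs at least two nodes")
    segments = {node: [] for node in nodes}
    for i, node in enumerate(nodes):
        master = ascii_uppercase[i]
        segments[node].append(master)
        if redundancy == 2:
            neighbor = nodes[(i + 1) % len(nodes)]  # wraps around to the first node
            segments[neighbor].append(master + "'")
    return segments


def add_node_keeping_segments(current, new_node):
    """Adding a node without redistributing: existing segments stay where they are."""
    grown = {node: list(segs) for node, segs in current.items()}
    grown[new_node] = []
    return grown
```

Adding n13 to `layout(["n11", "n12"], 2)` with `add_node_keeping_segments` leaves n13 empty. Rebuilding with `layout(["n11", "n12", "n13"], 1)` and then with redundancy 2 gives every node one master segment plus the mirror of its neighbor’s segment.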
Add new node to volume
After decreasing the volume redundancy to 1, click Edit on the volume detail page again, add n13 as a new master node to the volume and click Apply:
Increase redundancy to 2
Now click Edit again and increase the redundancy to 2:
The volume state now shows as RECOVERING. Don’t worry: it just means that the mirror segments are being created.
Enlarge the database
Now click on the database link on the EXASolution screen:
Select the Action Enlarge and click Submit:
Enter 1 as the number of active nodes to add and click Apply:
The database detail page looks like this now:
Technically, this is a 3+0 cluster now – but the third node doesn’t contain any data yet. If we look at the same table as before, we see that no rows are on the new node:
To change that, a REORGANIZE is needed, either at the database, schema or table level. The easiest to perform is REORGANIZE DATABASE:
It took me about 10 minutes on my tiny database. The command redistributes every table across all cluster nodes and can be time-consuming with high data volumes. While a table is being reorganized, it is locked against DML. You can monitor the ongoing reorganization by selecting from EXA_DBA_PROFILE_RUNNING in another session.
Let’s check the distribution of the previous table again:
As you can see, there are now rows on the added node. EXAoperation also confirms that the new node is no longer empty:
On a larger database, you would see that each node uses less volume space than before and that every node holds roughly the same amount of data. For fail-safety, you could now add another reserve node.
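A toy model shows why the new node stays empty until the REORGANIZE. For illustration only, it assumes that a row’s node is a hash of its key modulo the number of active nodes. Exasol’s real placement is internal to the database. On the real system, a query such as `SELECT IPROC(), COUNT(*) FROM my_table GROUP BY 1` (with `my_table` as a placeholder) shows the actual row count per node.

```python
import hashlib
from collections import Counter


def node_for(key, node_count: int) -> int:
    """Hypothetical placement rule: hash of the key, modulo the node count."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(keys, node_count: int) -> dict:
    """Place every row (identified by its key) on a node."""
    return {key: node_for(key, node_count) for key in keys}


def rows_per_node(placement: dict, node_count: int) -> list:
    """Count rows per node, including nodes that hold no rows."""
    counts = Counter(placement.values())
    return [counts.get(node, 0) for node in range(node_count)]
```

With `before = distribute(range(10_000), 2)`, `rows_per_node(before, 3)` models the state right after Enlarge, where the third node has no rows. `rows_per_node(distribute(range(10_000), 3), 3)` models the state after REORGANIZE, with roughly a third of the rows on each node. In this model, most rows change nodes, which hints at why reorganizing large data volumes takes time.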
Summary of steps
- Add a reserve node (if not yet existing)
- Take a backup on a remote archive volume
- Shut down database
- Decrease volume redundancy to 1
- Add former reserve node as new master node to the volume
- Increase redundancy to 2
- Enlarge database by 1 active node
- Add another reserve node (optional)
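The ordering constraints in this list can be checked with a small state model. It is a hypothetical sketch that encodes only the rules stated in this post, and it does not talk to EXAoperation. The rules are: a volume can only be modified while the database is down, a node is added to the volume only at redundancy 1, and the new active node must come from a reserve node.

```python
class EnlargeModel:
    """Hypothetical model of the preconditions described in this post."""

    def __init__(self, active: int, reserve: int):
        self.active = active
        self.reserve = reserve
        self.volume_nodes = active  # the data volume spans the active nodes
        self.redundancy = 2
        self.db_running = True

    def _require_db_down(self):
        if self.db_running:
            raise RuntimeError("a volume cannot be modified while its database is up")

    def shutdown(self):
        self.db_running = False

    def set_redundancy(self, value: int):
        self._require_db_down()
        self.redundancy = value

    def add_volume_node(self):
        self._require_db_down()
        if self.redundancy != 1:
            raise RuntimeError("decrease the redundancy to 1 before adding a node")
        if self.volume_nodes >= self.active + self.reserve:
            raise RuntimeError("no reserve node left to add to the volume")
        self.volume_nodes += 1

    def enlarge(self, nodes: int = 1):
        if self.active + nodes > self.volume_nodes:
            raise RuntimeError("add the node to the volume before enlarging")
        self.active += nodes
        self.reserve -= nodes

    def label(self) -> str:
        return f"{self.active}+{self.reserve}"
```

Calling `shutdown()`, `set_redundancy(1)`, `add_volume_node()`, `set_redundancy(2)` and `enlarge(1)` on `EnlargeModel(2, 1)` takes the label from 2+1 to 3+0. Skipping a step raises an error instead.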
For development, demonstrations and testing, I need different database environments, primarily Oracle, Postgres and Exasol. Having them available as VMs on my notebook is quite convenient. I consider my current corporate notebook an upper-middle-class one: a Dell Latitude 7480 with 2 cores, an SSD and 16 GB of memory, running Windows 10. Not too shabby, but also not extremely powerful.
After using VirtualBox for years, I recently had an opportunity to become more familiar with Hyper-V, because one of our customers insisted on using only Hyper-V for a team training. Yes, I’m a bit biased towards VirtualBox. Why do I prefer it over Hyper-V? Because it’s much faster for what I do with it. In particular, I observed that Hyper-V consumes much more CPU for the VMs than VirtualBox does, and that slows everything down.
For example, an Exasol cluster node installation takes more than 30 minutes with Hyper-V, compared to 5 minutes with VirtualBox! The setup is the same for both. On my notebook, I create 4 VMs: 1 license server with 1500 MB of memory and 3 data nodes with 2500 MB of memory each. Each VM gets 1 virtual core. That’s no problem for VirtualBox, but Hyper-V struggles and drives the CPU utilization of my notebook to 100% or close to it during the whole installation.
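The resource budget of this setup is simple arithmetic. The sketch below assumes that 16 GB means 16 × 1024 MB. It only shows how much memory remains for Windows and the hypervisor and how many virtual cores share each physical core. It does not explain the performance difference between the two hypervisors.

```python
HOST_MEMORY_MB = 16 * 1024  # 16 GB notebook, counted as 16 x 1024 MB
HOST_CORES = 2
VCPUS_PER_VM = 1

# VM memory sizes in MB, as described in the text
vms = {
    "license server": 1500,
    "data node 1": 2500,
    "data node 2": 2500,
    "data node 3": 2500,
}

guest_memory_mb = sum(vms.values())
left_for_host_mb = HOST_MEMORY_MB - guest_memory_mb
vcpus_per_core = len(vms) * VCPUS_PER_VM / HOST_CORES

print(f"memory assigned to VMs: {guest_memory_mb} MB")
print(f"left for Windows 10 and the hypervisor: {left_for_host_mb} MB")
print(f"virtual cores per physical core: {vcpus_per_core}")
```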
In general, Hyper-V and VirtualBox can do the same or very similar things. I’m sure there are use cases where Hyper-V performs well on a notebook too, and it’s probably better suited for dedicated virtualization servers than for a notebook anyway. So don’t get me wrong: I’m not saying that VirtualBox is better than Hyper-V overall.
But if you want to run database sandboxes on your notebook, I strongly recommend using VirtualBox instead of Hyper-V.