Setting up Cluster on Amazon Web Services (AWS)
Amazon Web Services (AWS) is a comprehensive, evolving cloud-computing platform that offers a suite of cloud services. The AWS services that are important for TIBCO ComputeDB are Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). You can set up a TIBCO ComputeDB cluster on Amazon Web Services using one of the following options:
- The AWS Marketplace option is best suited if you want to quickly launch a basic TIBCO ComputeDB cluster on a single EC2 instance and start exploring it.
- The EC2 Scripts are a better option if you want to launch the TIBCO ComputeDB cluster on multiple EC2 instances. For this option, you must download the product tarball (.tar.gz) locally; it is available inside a zip archive on the TIBCO eDelivery website. Otherwise, the script picks up the Community Edition of the product from GitHub.
- The AWS Management Console option only lets you launch the EC2 instance(s) using the TIBCO ComputeDB AMI. You must then configure and start the TIBCO ComputeDB cluster manually.
TIBCO ComputeDB EC2 Scripts
The TIBCO ComputeDB EC2 scripts enable you to launch and manage TIBCO ComputeDB clusters on Amazon EC2 instances quickly. They also allow you to provide custom configuration for the cluster via TIBCO ComputeDB configuration files, before launching the cluster.
The snappy-ec2 script is the entry point for these EC2 scripts and is derived from the spark-ec2 script available in Apache Spark 1.6.
The EC2 scripts are provided on an experimental basis. Feel free to try them out and provide your feedback via GitHub issues.
This section covers the following:
- Deploying TIBCO ComputeDB Cluster with EC2 Scripts
- Cluster Management
- Known Limitations
Prerequisites

- Ensure that you have an existing AWS account with the required permissions to launch EC2 resources.
- Create an EC2 Key Pair in the region where you want to launch the TIBCO ComputeDB cluster. Refer to the Amazon Web Services EC2 documentation for more information on generating your EC2 Key Pair.
- Using the AWS Secret Access Key and the Access Key ID, set the two environment variables AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID. You can find information about generating these keys on the AWS IAM console page. If you have already set up the AWS Command Line Interface on your local machine, the script automatically detects and uses the credentials from the AWS credentials file.
export AWS_SECRET_ACCESS_KEY=abcD12efGH34ijkL56mnoP78qrsT910uvwXYZ1112
export AWS_ACCESS_KEY_ID=A1B2C3D4E5F6G7H8I9J10
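For reference, the AWS credentials file mentioned above follows the standard AWS CLI format. A sketch, reusing the sample keys shown here (your profile name may differ):

# ~/.aws/credentials
[default]
aws_access_key_id = A1B2C3D4E5F6G7H8I9J10
aws_secret_access_key = abcD12efGH34ijkL56mnoP78qrsT910uvwXYZ1112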
- Ensure that Python v2.7 or later is installed on your local computer.
Deploying TIBCO ComputeDB Cluster with EC2 Scripts
In the command prompt, go to the directory where the snappydata-ec2-<version>.tar.gz is extracted, or to the aws/ec2 directory where the TIBCO ComputeDB cloud tools repository is cloned locally. Then run the script as follows:
./snappy-ec2 -k <your-key-name> -i <your-keyfile-path> <action> <your-cluster-name> [options]
- <your-key-name> refers to the name of your EC2 key pair.
- <your-keyfile-path> refers to the path to the key file (typically a .pem file).
- <action> refers to the action to be performed. Some of the available actions are launch, destroy, stop, start, and reboot-cluster. Use the launch action to create a new cluster; the remaining actions work on existing clusters.
By default, the script starts one instance of a locator, lead, and server each. The script identifies each cluster by its unique cluster name that you provide and internally ties the members (locators, leads, and stores/servers) of the cluster with EC2 security groups, whose names are derived from the cluster name.
When running the script, you can also specify options to configure the cluster such as the number of stores in the cluster and the region where the EC2 instances should be launched.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem --stores=2 --with-zeppelin --region=us-west-1 launch my-cluster
The above example launches a TIBCO ComputeDB cluster named my-cluster with two stores or servers. The locator is associated with a security group named my-cluster-locator, and the servers are associated with my-cluster-store security group.
The cluster is launched in the N. California (us-west-1) region on AWS and has an Apache Zeppelin server running on the instance where the lead is running.
The example assumes that you have the key file (my-ec2-key.pem) in your home directory for EC2 Key Pair named 'my-ec2-key'.
Assuming IAM Role in the AWS EC2 Scripts
An IAM user in AWS can gain additional (or different) permissions, or get permissions to perform actions in a different AWS account, through these EC2 scripts. You can configure the AWS EC2 scripts to use an IAM role by passing the following properties:
- --assume-role-arn: The Amazon Resource Name (ARN) of the IAM role to be assumed. This IAM role's credentials are used to launch the cluster. If you are using the switch-role functionality, this property is mandatory.
- --assume-role-timeout: Timeout in seconds for the temporary credentials of the assumed IAM role; the minimum is 900 seconds and the maximum is 3600 seconds.
- --assume-role-session-name: Name of the session in which this IAM role is assumed by the user.
./snappy-ec2 -k <your-key-name> -i <your-keyfile-path> launch snap_ec2_cluster --with-zeppelin --authorized-address=<authorized-ip-address> --assume-role-arn=<role-arn> --assume-role-timeout=<timeout> --assume-role-session-name=<name-for-session>
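For the role assumption to succeed, the IAM role's trust policy must allow your IAM user to call sts:AssumeRole. A minimal sketch of such a trust policy, using a placeholder account ID and user name:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:user/your-iam-user" },
      "Action": "sts:AssumeRole"
    }
  ]
}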
By default, the cluster is launched in the N. Virginia (us-east-1) region on AWS. To launch the cluster in a specific region, use the --region option.
Cluster Management

This section covers the following:
- Using Custom Build
- Specifying Properties
- Stopping the Cluster
- Resuming the Cluster
- Adding Servers to the Cluster
- Listing Members of the Cluster
- Connecting to the Cluster
- Destroying the Cluster
- Starting Cluster with Apache Zeppelin
- More Options
Using Custom Build
By default, this script uses the TIBCO ComputeDB build available on the GitHub releases page to launch the cluster. To select a particular version of the OSS build available on GitHub, use the --snappydata-version option.
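For example (the version shown is illustrative; use any released OSS version):

./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem launch my-cluster --snappydata-version=1.2.0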
You can also provide your own TIBCO ComputeDB build to the script to launch the cluster, by passing its path or URL to the --snappydata-tarball option. The build can be present either on a local filesystem or as a resource on the web.
For example, to use a TIBCO ComputeDB Enterprise build to launch the cluster, download the product tarball from https://edelivery.tibco.com to your local machine and provide its path as the value of the above option.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem launch my-cluster --snappydata-tarball="/home/ec2-user/snappydata/distributions/TIB_compute_1.2.0_linux.tar.gz"
Alternatively, you can also put your build tarball file on a public web server and provide its URL to this option.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem launch my-cluster --snappydata-tarball="https://s3-us-east-2.amazonaws.com/mybucket/distributions/TIB_compute_1.2.0_linux.tar.gz"
The build file should be in .tar.gz format.
Specifying Properties

You can specify the configuration for the cluster via command-line options. Use --locator-conf to specify the configuration properties for all the locators in the cluster. Similarly, --server-conf and --lead-conf allow you to specify the configuration properties for the servers and leads in the cluster, respectively.
Following is a sample configuration for all three processes in a TIBCO ComputeDB cluster:

./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem --stores=2 launch my-cluster \
    --locator-conf="-peer-discovery-port=9999 -heap-size=1024m" \
    --lead-conf="-spark.executor.cores=10 -heap-size=4096m -spark.ui.port=3333" \
    --server-conf="-client-port=1530"
The utility also reads snappy-env.sh, if present in the directory where the helper scripts are located (a sketch of such a file follows the notes below).
- The earlier method of specifying the configuration properties by placing the actual configuration files in the directory, where helper scripts are available, is discontinued.
- Ensure that the configuration properties specified are correct. Otherwise, launching the TIBCO ComputeDB cluster may fail, but the EC2 instances would still be running.
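A minimal snappy-env.sh sketch; the variable and value shown are illustrative assumptions, not required settings:

# snappy-env.sh -- read by the EC2 scripts if present alongside them
# (illustrative: exports an environment variable for the cluster processes)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64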
Stopping the Cluster
When you stop a cluster, it shuts down the EC2 instances, and any data saved on the local instance stores is lost. However, the data saved on EBS volumes is retained, unless spot instances are used.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem stop cluster-name
Resuming the Cluster
When you start a cluster, it uses the existing EC2 instances associated with the cluster name and launches TIBCO ComputeDB processes on them.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem start cluster-name
The start command, or the launch command with the --resume option, ignores the --stores option and launches the TIBCO ComputeDB cluster on the existing instances. However, if configuration options are provided, they are read and processed, overriding the values that were provided when the cluster was previously launched or started.
Adding Servers to the Cluster
This is not yet supported using the script. You must manually launch an instance with the (cluster-name)-stores security group, and then use the launch command with the --resume option, as sketched below.
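A hypothetical sequence using the AWS CLI; the AMI ID is a placeholder, and the instance type should match that of the existing stores:

# Launch a new instance into the cluster's stores security group
aws ec2 run-instances --image-id <ami-id-of-existing-stores> --count 1 \
    --instance-type m4.large --key-name my-ec2-key \
    --security-groups my-cluster-stores

# Then let the script install and configure TIBCO ComputeDB on it
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem launch my-cluster --resume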
Listing Members of the Cluster
To get the first locator's hostname:
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem get-locator cluster-name
Similarly, use the get-lead command to get the first lead's hostname:
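./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem get-lead cluster-name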
Connecting to the Cluster
You can connect to the cluster with SSH using the login command. It logs you into the first lead instance, from which you can use SSH to connect to any other member of the cluster without a password. The TIBCO ComputeDB product directory is located at /opt/snappydata/ on all the members.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem login cluster-name
Destroying the Cluster
Destroying a cluster permanently deletes all the data on the local instance stores and on the attached EBS volumes.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem destroy cluster-name
This also deletes the security groups created for this cluster.
Starting Cluster with Apache Zeppelin
Optionally, you can start an instance of the Apache Zeppelin server with the cluster. Apache Zeppelin provides a web-based interactive notebook that is pre-configured to communicate with the TIBCO ComputeDB cluster. The Zeppelin server is launched on the same EC2 instance where the lead node is running.
./snappy-ec2 -k my-ec2-key -i ~/my-ec2-key.pem --with-zeppelin launch cluster-name
For a complete list of options provided by the script, run ./snappy-ec2. The options are also provided below for quick reference:
Usage: snappy-ec2 [options] <action> <cluster_name>

<action> can be: launch, destroy, login, stop, start, get-locator, get-lead, reboot-cluster

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -s STORES, --stores=STORES
                        Number of stores to launch (default: 1)
  --locators=LOCATORS   Number of locator nodes to launch (default: 1)
  --leads=LEADS         Number of lead nodes to launch (default: 1)
  -w WAIT, --wait=WAIT  DEPRECATED (no longer necessary) - Seconds to wait for nodes to start
  -k KEY_PAIR, --key-pair=KEY_PAIR
                        Name of the key pair to use on instances
  -i IDENTITY_FILE, --identity-file=IDENTITY_FILE
                        SSH private key file to use for logging into instances
  -p PROFILE, --profile=PROFILE
                        If you have multiple profiles (AWS or boto config), you can configure additional, named profiles by using this option (default: none)
  -t INSTANCE_TYPE, --instance-type=INSTANCE_TYPE
                        Type of server and lead instance to launch (default: m4.large). WARNING: must be 64-bit; small instances won't work
  --locator-instance-type=LOCATOR_INSTANCE_TYPE
                        Locator instance type (default: t2.medium)
  -r REGION, --region=REGION
                        Name of the EC2 region where instances are launched (default: us-east-1)
  -z ZONE, --zone=ZONE  Availability zone to launch instances in, or 'all' to spread stores across multiple zones (an additional $0.01/GB for bandwidth between zones applies) (default: a single zone chosen at random)
  -a AMI, --ami=AMI     Amazon Machine Image ID to use
  --snappydata-tarball=SNAPPYDATA_TARBALL
                        HTTP URL or local file path of the TIBCO ComputeDB distribution tarball with which the cluster will be launched (default: )
  --locator-conf=LOCATOR_CONF
                        Configuration properties for locators (default: )
  --server-conf=SERVER_CONF
                        Configuration properties for servers (default: )
  --lead-conf=LEAD_CONF
                        Configuration properties for leads (default: )
  -v SNAPPYDATA_VERSION, --snappydata-version=SNAPPYDATA_VERSION
                        Version of TIBCO ComputeDB to use: 'X.Y.Z' (default: LATEST)
  --with-zeppelin       Launch an Apache Zeppelin server with the cluster. It launches on the same instance where the lead node is running.
  --deploy-root-dir=DEPLOY_ROOT_DIR
                        A directory to copy into the root (/) directory on the first locator. Must be absolute. Note that a trailing slash is handled as per rsync: if you omit it, the last directory of the --deploy-root-dir path is created in / before copying its contents. If you append the trailing slash, the directory is not created and its contents are copied directly into / (default: none)
  -D [ADDRESS:]PORT     Use SSH dynamic port forwarding to create a SOCKS proxy at the given local address (for use with login)
  --resume              Resume installation on a previously launched cluster (for debugging)
  --root-ebs-vol-size=SIZE
                        Size (in GB) of the root EBS volume for servers and leads. TIBCO ComputeDB is installed on the root volume.
  --root-ebs-vol-size-locator=SIZE
                        Size (in GB) of the root EBS volume for locators. TIBCO ComputeDB is installed on the root volume.
  --ebs-vol-size=SIZE   Size (in GB) of each additional EBS volume to be attached
  --ebs-vol-type=EBS_VOL_TYPE
                        EBS volume type (e.g. 'gp2', 'standard')
  --ebs-vol-num=EBS_VOL_NUM
                        Number of EBS volumes to attach to each node as /vol[x]. The volumes are deleted when the instances terminate. Only possible on EBS-backed AMIs. EBS volumes are only attached if --ebs-vol-size > 0. Only supports up to 8 EBS volumes.
  --placement-group=PLACEMENT_GROUP
                        Which placement group to try and launch instances into. Assumes the placement group is already created.
  --spot-price=PRICE    If specified, launch stores as spot instances with the given maximum price (in dollars)
  -u USER, --user=USER  The SSH user you want to connect as (default: ec2-user)
  --delete-groups       When destroying a cluster, delete the security groups that were created
  --use-existing-locator
                        Launch fresh stores, but use an existing stopped locator if possible
  --user-data=USER_DATA
                        Path to a user-data file (most AMIs interpret this as an initialization script)
  --authorized-address=AUTHORIZED_ADDRESS
                        Address to authorize on created security groups (default: 0.0.0.0/0)
  --additional-security-group=ADDITIONAL_SECURITY_GROUP
                        Additional security group to place the machines in
  --additional-tags=ADDITIONAL_TAGS
                        Additional tags to set on the machines; tags are comma-separated, while name and value are colon-separated; ex: "Task:MySnappyProject,Env:production"
  --copy-aws-credentials
                        Add AWS credentials to the Hadoop configuration to allow Snappy to access S3
  --subnet-id=SUBNET_ID
                        VPC subnet to launch instances in
  --vpc-id=VPC_ID       VPC to launch instances in
  --private-ips         Use private IPs for instances rather than public ones if the VPC/subnet requires that
  --instance-initiated-shutdown-behavior=INSTANCE_INITIATED_SHUTDOWN_BEHAVIOR
                        Whether instances should terminate when shut down or just stop
  --instance-profile-name=INSTANCE_PROFILE_NAME
                        IAM profile name to launch instances under
  --assume-role-arn=ASSUME_ROLE_ARN
                        The Amazon Resource Name (ARN) of the IAM role to be assumed. This IAM role's credentials are used to launch the cluster. If you are using the switch-role functionality, this property is mandatory.
  --assume-role-timeout=ASSUME_ROLE_TIMEOUT
                        Timeout in seconds for the temporary credentials of the assumed IAM role; the minimum is 900 seconds and the maximum is 3600 seconds
  --assume-role-session-name=ASSUME_ROLE_SESSION_NAME
                        Name of the session in which this IAM role is assumed by the user
Known Limitations

- Launching the cluster on a custom AMI (specified via the --ami option) does not work if the user ec2-user does not have sudo permissions on that AMI.
- Support for the --user option is incomplete.
AWS Management Console
You can launch a TIBCO ComputeDB cluster on Amazon EC2 instance(s) using the TIBCO ComputeDB AMIs available on AWS Marketplace. For more information on launching an EC2 instance, refer to the AWS documentation. Before you begin, do the following:
- Ensure that you have an existing AWS account with required permissions to launch the EC2 resources.
- Create an EC2 Key Pair in the region where you want to launch the TIBCO ComputeDB cluster.
Deploying TIBCO ComputeDB Cluster with AWS Management Console
To launch the instance and start the TIBCO ComputeDB cluster on EC2 instance(s), do the following:
If you are launching the cluster via AWS Marketplace, jump to step 5.

1. Open the Amazon EC2 console and sign in using your AWS login credentials. The current region is displayed at the top of the screen.
2. Select the region where you want to launch the instance.
3. Click Launch Instance from the Amazon EC2 console dashboard.
4. On the Choose an Amazon Machine Image (AMI) page, search for and select the TIBCO ComputeDB AMI. For the 1.2.0 release, it is named in the form tibco-computedb_1.2.0-yyyymmdd-amznlnx_2018.03.
5. Optionally, further customize your instance(s) as described below before you launch them. Refer to the AWS documentation for more information.
    - For a setup across multiple EC2 instances, specify the appropriate number in the Number of instances field on the Configure Instance page. For example, to launch a TIBCO ComputeDB cluster with three servers, one locator, and one lead on separate instances, specify the number as 5. You can also run the locator and lead processes on a single EC2 instance, thereby reducing the count to 4.
    - On the Configure Security Group page, ensure that you open ports 22 (for SSH access to the EC2 instance) and 5050 (to access the TIBCO ComputeDB Monitoring Console) for the public IP address of your laptop or client terminal. For a setup on multiple instances, you must also open all traffic between the instances in this security group. You can do that by adding a rule with the group ID of this security group as the value for Source.
    - If you need to connect to the TIBCO ComputeDB cluster via a JDBC client application or tool, also open ports 1527 and 1528 in the security group for the public IP of the host where your application/tool runs. (A command-line sketch of these rules follows this list.)
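If you prefer the command line, the same ingress rules can be added with the AWS CLI. A sketch, assuming a security group named my-computedb-sg and a client IP of 203.0.113.10 (both placeholders):

# SSH access from your client host only
aws ec2 authorize-security-group-ingress --group-name my-computedb-sg \
    --protocol tcp --port 22 --cidr 203.0.113.10/32
# TIBCO ComputeDB Monitoring Console
aws ec2 authorize-security-group-ingress --group-name my-computedb-sg \
    --protocol tcp --port 5050 --cidr 203.0.113.10/32
# JDBC client access
aws ec2 authorize-security-group-ingress --group-name my-computedb-sg \
    --protocol tcp --port 1527-1528 --cidr 203.0.113.10/32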
6. You are directed to the last step, Review Instance Launch. Check the details of your instance, and click Launch.
7. In the Select an existing key pair or create a new key pair dialog box, select your key pair.
8. Click Launch. The Launch Status page is displayed.
9. Click View Instances. The dashboard that lists the EC2 instances is displayed.
10. Click Refresh to view the updated list and the status of the instance(s) you just created.
11. Once the status of an instance changes to running, connect to it via SSH. You require:
- The private key (.pem) file of the key pair with which the instance was launched.
- The public DNS or IP address of the instance.
- The username that is used to connect. It depends on the AMI you have selected. For example, it could be ec2-user for Amazon Linux AMIs or ubuntu for Ubuntu-based AMIs.
Refer to the AWS documentation for more information on accessing an EC2 instance.
The public DNS/IP of the instance is available on the EC2 dashboard > Instances page. Select your EC2 instance and search for it in the lower part of the page.
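For example, for an Amazon Linux AMI (the key path and DNS are placeholders for your values):

ssh -i ~/my-ec2-key.pem ec2-user@<public-dns-of-instance>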
TIBCO ComputeDB is already installed at /opt/snappydata in the launched EC2 instance(s). It also has Java 8 installed.
For a cluster across multiple EC2 instances, do the following:

- Set up passwordless SSH access across these instances.
- Provide EC2 instance information in TIBCO ComputeDB's conf files.

You can skip to the next step for a TIBCO ComputeDB cluster on a single EC2 instance.

After setting up the passwordless SSH access, provide the EC2 instance information in TIBCO ComputeDB's conf files. At a minimum, provide the private IP addresses of the EC2 instances in the appropriate conf files, viz. conf/locators, conf/servers, and conf/leads.
Sample conf files for a cluster with three servers, one locator, and one lead are given below. Here, the locator and lead processes are configured to run on the same EC2 instance.
cat /opt/snappydata/conf/locators
172.16.32.180

cat /opt/snappydata/conf/servers
172.16.32.181
172.16.32.182
172.16.32.183

cat /opt/snappydata/conf/leads
172.16.32.180
Go to the /opt/snappydata directory and run the following command to start your cluster. By default, a basic cluster is launched with one data server, one lead, and one locator.
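The cluster is started with the snappy-start-all.sh script shipped under sbin/ in the product directory:

cd /opt/snappydata
./sbin/snappy-start-all.sh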
After deploying TIBCO ComputeDB, follow the instructions here, to use the product from Apache Zeppelin.
Accessing TIBCO ComputeDB Cluster
Before you access the TIBCO ComputeDB cluster, you must configure the cluster's security group to allow connections from your client host on the required ports (for example, 1527 and 1528 for JDBC clients). If you do not know the IP address of your client host, you can open these ports to the world by specifying 0.0.0.0/0 as the Source against the port range in the security group, though this is not recommended: any unknown user on the internet can then connect to your cluster if security is not enabled on it. It is therefore strongly recommended to add specific IP addresses as the Source, in the format XXX.XXX.XXX.XXX/32, in your security group.
The quickest way to connect to your TIBCO ComputeDB cluster is to use the snappy shell utility packaged with the distribution. You can launch the snappy shell either from the same EC2 instance or from your laptop where you have TIBCO ComputeDB installed.
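On the EC2 instances, the product is installed at /opt/snappydata, so the shell can typically be launched with the snappy script under bin/:

cd /opt/snappydata
./bin/snappy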
Connecting to the cluster from the same EC2 instance:

Before connecting to the cluster, make sure the security group attached to this EC2 instance has ports 1527-1528 open for the public IP of the same EC2 instance.

- Launch the snappy shell.
- Connect to the cluster using its private IP (you can also use the public DNS/IP instead):

snappy> connect client '(private-ip-of-EC2-instance):1527';

To connect to a cluster running on multiple EC2 instances, you can use the private IP of the EC2 instance where either the locator or any of the servers is running.
Connecting to the cluster from your laptop (or any host outside the AWS VPC):

Before connecting to the cluster, make sure the security group attached to this EC2 instance has ports 1527-1528 open for the public IP of your laptop, that is, the host with TIBCO ComputeDB installed.

- Launch the snappy shell.
- Connect to the cluster using the public DNS/IP of its EC2 instance:

snappy> connect client '<public-ip-of-EC2-instance>:1527';

To connect to a cluster running on multiple EC2 instances, you can use the public IP of the EC2 instance where either the locator or any of the servers is running.