Create Auto-Scaling Container HPC with CfnCluster and Singularity

In my previous article, I used Singularity, a container platform specialized for HPC, on Amazon EC2, and introduced a procedure to install Singularity and to build and exec (execute) an OpenFOAM container. However, that environment is not suitable for submitting multiple jobs or for running jobs that use multiple cores, because it consists of a single EC2 instance.

CfnCluster is a useful tool for creating clusters of EC2 instances. With it, a cluster consisting of a Master node and Compute nodes with Auto Scaling can be created easily.
In this article, a cluster is launched with CfnCluster and configured to run Singularity. As an example, jobs that perform a parameter sweep with OpenFOAM are submitted to the cluster, and the results are visualized in a GUI. The AWS usage fee for this procedure is shown at the end.

Architecture diagram

The architecture diagram of the environment we'll build is shown below. Some resources not shown in the diagram, such as DynamoDB, are also launched by CfnCluster to manage the environment. The environment consists of three kinds of servers: PrePost, Master and Compute.

PrePost(OS: Ubuntu)

CfnCluster is installed on this EC2 instance. By using the $ cfncluster command here, we can create, delete and manage clusters. A GUI is also installed on this instance to visualize the results of the executed jobs.

Master(OS: Amazon Linux)

This EC2 instance is launched and managed by CfnCluster. A job scheduler is installed on it, so we can submit, list and delete jobs, and check the state of the Compute nodes here.

Compute(OS: Amazon Linux)

These EC2 instances belong to an Auto Scaling Group, so that they scale depending on the number of submitted jobs. With the configuration in this article, the group scales between 0 and 10 instances. CFD with OpenFOAM is performed on these instances.
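
As a rough mental model of this scaling range, the number of Compute instances can be thought of as the pending work divided by the capacity of one node, clamped to the range 0 to 10. The sketch below is only an illustration: the function name, the `slots_per_node` parameter and the one-job-per-slot rule are my assumptions, and CfnCluster's actual scaling logic is not reproduced here.

```python
import math

MIN_NODES = 0   # lower bound used in this article
MAX_NODES = 10  # upper bound used in this article


def desired_compute_nodes(pending_jobs, slots_per_node=4):
    """Estimate how many Compute nodes the queued jobs would need.

    Hypothetical helper: assumes each job occupies one slot and each
    node offers `slots_per_node` slots (4 is an assumed value).
    """
    if pending_jobs < 0 or slots_per_node <= 0:
        raise ValueError("pending_jobs must be >= 0 and slots_per_node > 0")
    needed = math.ceil(pending_jobs / slots_per_node)
    # Clamp the estimate to the configured minimum and maximum.
    return max(MIN_NODES, min(MAX_NODES, needed))


if __name__ == "__main__":
    for jobs in (0, 3, 720):
        print(jobs, "jobs ->", desired_compute_nodes(jobs), "nodes")
```

With an empty queue the estimate is 0 nodes, which is why no Compute instance is billed while nothing is submitted.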

CfnCluster configures Master and Compute to share a volume over NFS. The container that executes the OpenFOAM jobs, which is downloaded from Singularity Hub, is saved in the shared volume.

Jobs to submit

pitzDaily, one of the OpenFOAM tutorials, is executed while varying the boundary condition for its inlet flow. 720 cases are analyzed, the same as in this previous article about AWS Batch. Each component of the velocity on the boundary (x, y and z) is varied within a certain range. Utilizing MPI is left as future work.
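
To make the idea of the sweep concrete, here is a minimal sketch that enumerates every combination of the three velocity components. The value lists are hypothetical: the real ranges are not stated in this article, and I picked 10, 9 and 8 values only because their product is 720. The helper names are also mine.

```python
from itertools import product

# Hypothetical sweep values in m/s; the article does not state the real
# ranges. They are chosen only so that 10 * 9 * 8 = 720 cases.
UX_VALUES = [5.0 + 1.0 * i for i in range(10)]
UY_VALUES = [-2.0 + 0.5 * i for i in range(9)]
UZ_VALUES = [-1.75 + 0.5 * i for i in range(8)]


def build_cases():
    """Return one dict per case with a 1-based job ID and inlet velocity."""
    combos = product(UX_VALUES, UY_VALUES, UZ_VALUES)
    return [
        {"job_id": job_id, "inlet_velocity": velocity}
        for job_id, velocity in enumerate(combos, start=1)
    ]


def inlet_entry(velocity):
    """Format a velocity vector as an OpenFOAM uniform vector value."""
    ux, uy, uz = velocity
    return "uniform ({:g} {:g} {:g})".format(ux, uy, uz)


if __name__ == "__main__":
    cases = build_cases()
    print(len(cases), "cases")
    print(cases[0]["job_id"], inlet_entry(cases[0]["inlet_velocity"]))
```

Each case is independent of the others, so the cases can be submitted as separate single-core jobs without MPI.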

Let's try

Creation of the key pair

Log in to your AWS Management Console and choose your favourite Region. First, click "EC2" to create a key pair for SSH connections to the EC2 instances. Click "Key Pairs" and "Create Key Pair", enter a name for the key pair, and click "Create". Download your private key (.pem) and store it securely.

Launching PrePost instance

Use CloudFormation to create the VPC, the Subnet and the PrePost instance. The template is available here. Click "CloudFormation" in the AWS Management Console. The following page is shown if this is your first time using CloudFormation. Then click "Create New Stack". Click the "Choose File" button and choose the downloaded CloudFormation template, then click "Next". On the next page, enter your configuration.

Item | Detail
Stack name | Enter a CloudFormation stack name, e.g. singularity-on-cfncluster
InstanceTypeParameter | Choose the instance type for PrePost. "t2.micro" is enough for this procedure.
KeyPairName | Choose your key pair for SSH access to PrePost.
SourceIp | Enter the IP address to allow as the source of SSH. cf. http://checkip.amazonaws.com/

Acknowledge that IAM resources will be created by checking the box, and click "Create". CloudFormation then starts creating the environment. The PrePost instance is launched, and CfnCluster is installed on it automatically.

Creating a Cluster with CfnCluster

Here, we'll access the PrePost instance with SSH and create a cluster, consisting of Master and Compute nodes, by executing CfnCluster. The global (public) IP of your PrePost instance is shown as PrePostIp in the "Outputs" tab of the CloudFormation stack. Connect to this IP with SSH using the following login information.

Item | Value
Username | ubuntu
Key Pair | The key pair you configured when creating the CloudFormation stack

After logging in with SSH, open the CfnCluster configuration file with the following command.

$ vim .cfncluster/config

The same contents as this file should be shown. Strings in { }, such as {Region} or {AccessKey}, have to be replaced. Substitute the values shown in the "Outputs" tab of the CloudFormation stack.

Two kinds of clusters are defined as templates in the configuration file. The first one, t2_singularity, launches all nodes as t2.micro and is intended for testing with a limited number of jobs. The second one, c4_singularity, launches all nodes as c4.xlarge in a Placement Group and is intended for submitting many jobs. Singularity is installed on both clusters. Compute nodes auto-scale between 0 and 10 and are launched as spot instances.
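
To illustrate how the two templates relate to each other, the sketch below parses a small INI fragment and looks up a template by name, falling back to the default named by cluster_template when none is requested. The fragment is my reconstruction from the description above, not a copy of the actual file, and the key names follow the CfnCluster configuration format as I understand it, so please check them against your own .cfncluster/config. The function name `load_template` is hypothetical.

```python
import configparser

# Illustrative fragment only: reconstructed from the description in this
# article, not copied from the actual configuration file.
SAMPLE_CONFIG = """
[global]
cluster_template = t2_singularity

[cluster t2_singularity]
master_instance_type = t2.micro
compute_instance_type = t2.micro
initial_queue_size = 0
max_queue_size = 10
cluster_type = spot

[cluster c4_singularity]
master_instance_type = c4.xlarge
compute_instance_type = c4.xlarge
initial_queue_size = 0
max_queue_size = 10
cluster_type = spot
placement_group = DYNAMIC
"""


def load_template(config_text, template=None):
    """Return (name, settings) for `template`, or for the default template."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Fall back to the default template when no name is given.
    name = template or parser["global"]["cluster_template"]
    return name, dict(parser["cluster " + name])


if __name__ == "__main__":
    for requested in (None, "c4_singularity"):
        name, settings = load_template(SAMPLE_CONFIG, requested)
        print(name, settings["compute_instance_type"], settings["max_queue_size"])
```

Keeping both templates in one file means switching between the test cluster and the production-sized cluster is only a matter of which template name is passed at creation time.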

Now let's create a cluster as c4_singularity. Use the following command.

$ cfncluster create cluster01 -t c4_singularity

If you omit the -t option, the cluster is created as t2_singularity because the configuration file contains cluster_template = t2_singularity. When the creation is finished, CfnCluster prints the IP addresses of the Master node to standard output.

$ cfncluster create cluster01 -t c4_singularity
Beginning cluster creation for cluster: cluster01
Creating stack named: cfncluster-cluster01
Status: cfncluster-cluster01 - CREATE_COMPLETE
Output:"MasterPublicIP"="xxx.xxx.xxx.xxx"
Output:"MasterPrivateIP"="yyy.yyy.yyy.yyy"
Output:"GangliaPublicURL"="http://xxx.xxx.xxx.xxx/ganglia/"
Output:"GangliaPrivateURL"="http://yyy.yyy.yyy.yyy/ganglia/"
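
If you want to reuse these values in a script, for example to pick up MasterPublicIP for the SSH step, the Output lines can be parsed with a small regular expression. This is a minimal sketch that assumes the output format shown above; the function name and the example IP addresses are made up for illustration.

```python
import re

# Matches lines of the form Output:"Key"="Value"
OUTPUT_PATTERN = re.compile(r'^Output:"(?P<key>[^"]+)"="(?P<value>[^"]*)"$')

# Example text in the format shown above; the addresses are placeholders.
SAMPLE_OUTPUT = """\
Beginning cluster creation for cluster: cluster01
Creating stack named: cfncluster-cluster01
Status: cfncluster-cluster01 - CREATE_COMPLETE
Output:"MasterPublicIP"="203.0.113.10"
Output:"MasterPrivateIP"="10.0.0.5"
"""


def parse_outputs(text):
    """Collect the Output:"key"="value" lines of `text` into a dict."""
    outputs = {}
    for line in text.splitlines():
        match = OUTPUT_PATTERN.match(line.strip())
        if match:
            outputs[match.group("key")] = match.group("value")
    return outputs


if __name__ == "__main__":
    outputs = parse_outputs(SAMPLE_OUTPUT)
    print("ssh ec2-user@" + outputs["MasterPublicIP"])
```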

Job submission on Master

Please connect with SSH to the global (public) IP of Master with the following login information.

Item | Value
Username | ec2-user
Key Pair | The key pair you configured in the configuration file of CfnCluster

After logging in with SSH, execute the following commands.

$ git clone https://github.com/TakahisaShiratori/singularity-on-cfncluster.git
$ cd singularity-on-cfncluster
$ python submit_batch.py

submit_batch.py downloads the OpenFOAM container and submits 720 jobs while varying the boundary condition of pitzDaily. After that, it executes $ qstat to list the submitted jobs. The jobs take a little while to complete because Compute nodes are launched only after the jobs are submitted. If initial_queue_size is set to a non-zero value, Compute instances are already running and jobs start right away. If you want to visualize the results, I recommend executing the first command of the "Visualization of results" chapter while waiting. After the jobs end, you can see the directories where the results are stored by executing $ ls.
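
For readers who want to see what one submitted job might look like, here is a minimal sketch that builds the text of a batch script for a single case. It is not taken from submit_batch.py. It assumes an SGE-style scheduler (this article only shows that qstat is available), assumes simpleFoam as the solver for pitzDaily, and uses a placeholder image path; the function name is hypothetical as well.

```python
def build_job_script(job_id, image, solver="simpleFoam"):
    """Return the text of a batch script for one pitzDaily case.

    Hypothetical helper, not taken from submit_batch.py. It assumes the
    case files already exist in job_<id>/pitzDaily under the directory
    from which the job is submitted.
    """
    lines = [
        "#!/bin/bash",
        "#$ -N job_{}".format(job_id),  # job name (SGE directive)
        "#$ -cwd",                      # start in the submission directory
        "cd job_{}/pitzDaily".format(job_id),
        # Run the solver inside the container.
        "singularity exec {} {}".format(image, solver),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The image path is a placeholder for wherever the container is saved.
    print(build_job_script(1, "/home/ec2-user/openfoam.img"))
```

Because the container image sits on the volume shared over NFS, every Compute node can run the same script without installing OpenFOAM itself.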

Visualization of results

Let's install OpenFOAM and a GUI on PrePost to visualize the results. The output files can be read by mounting /home of Master over NFS. X2Go and Ubuntu MATE are used as the GUI in the following procedure. You can use any other GUI if you have a favourite of your own.

Please execute the following commands on PrePost.

$ wget https://raw.githubusercontent.com/TakahisaShiratori/singularity-on-cfncluster/master/install_openfoam_gui.sh
$ bash install_openfoam_gui.sh

The commands to install OpenFOAM, X2Go Server and Ubuntu MATE are written in install_openfoam_gui.sh. Wait until the installation ends.

After the installation, access PrePost with the X2Go client. This page, provided by CFD Direct, is useful for learning how to use the X2Go client. After connecting, install the tools needed for NFS mounts and create a mount point by executing the following commands in the terminal.

$ sudo apt-get -y install nfs-common
$ sudo mkdir /cluster01_home

Change the Security Group of PrePost so that it can mount the NFS export of Master. In the Management Console, you can find a Security Group with a name like cfncluster-cluster01-ComputeSecurityGroup-xxxxxxxxxxxxx. Add PrePost to this group. Then execute the following command after substituting the private IP of Master for {MasterPrivateIp}. /home of Master will be mounted on PrePost.

$ sudo mount -t nfs -o hard,intr,noatime,vers=3,_netdev {MasterPrivateIp}:/home /cluster01_home
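
A common mistake at this step is to paste the public IP of Master instead of the private one. The following sketch fills in the placeholder of the command above and rejects anything that is not a private address. The function name is hypothetical, and the check relies only on Python's standard ipaddress module.

```python
import ipaddress

# The mount command from this article, with its placeholder left in place.
MOUNT_TEMPLATE = (
    "sudo mount -t nfs -o hard,intr,noatime,vers=3,_netdev "
    "{MasterPrivateIp}:/home /cluster01_home"
)


def build_mount_command(master_private_ip):
    """Return the mount command with the private IP of Master filled in."""
    # ip_address() raises ValueError for strings that are not IP addresses.
    address = ipaddress.ip_address(master_private_ip)
    if not address.is_private:
        raise ValueError("expected a private IP address: " + str(address))
    return MOUNT_TEMPLATE.format(MasterPrivateIp=str(address))


if __name__ == "__main__":
    # 10.0.0.5 is a made-up example address.
    print(build_mount_command("10.0.0.5"))
```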

Create the directory $FOAM_RUN to run OpenFOAM. The output files are copied there for visualization. The following commands are an example that visualizes the result of Job ID 1.

$ mkdir -p $FOAM_RUN
$ sudo sh -c "cp -r /cluster01_home/ec2-user/singularity-on-cfncluster/job_* $FOAM_RUN"
$ sudo chown -R ubuntu:ubuntu $FOAM_RUN
$ cd $FOAM_RUN/job_1/pitzDaily
$ paraFoam

ParaView is launched if the commands worked. By configuring ParaView properly, the result is visualized as in the following screenshot.

Deleting the Environment

Let's delete the environment with the following procedure to save on AWS usage fees.

Undo the change of the Security Group for PrePost. The next step, deletion of the cluster, cannot be completed without this.

Delete the cluster by using the following command on PrePost.

$ cfncluster delete cluster01

Finally, delete the CloudFormation stack. PrePost (EC2), the Subnet, the VPC and all other resources are then deleted automatically.

AWS usage fee for this procedure

I carried out the procedure above in the Singapore Region (ap-southeast-1) because I don't have any other resources there, which makes it easy to show how much the procedure costs. The cost was 0.57 USD according to the AWS Management Console.

Summary

In this article, Singularity is used on a cluster created by CfnCluster. This architecture offers three benefits: (1) application mobility through containers, (2) Auto Scaling of Compute nodes depending on the number of jobs, and (3) parallel computing with MPI. As sample jobs, 720 cases of pitzDaily, one of the OpenFOAM tutorials, are executed. MPI is not used in this example; it is left as future work.