AWS, Immutant, TorqueBox, and Clustering - Part 1


categories:   AWS clojure torquebox immutant

note: these writeups assume you’ve got a working Clojure/JVM environment, know the basics of AWS (EC2, AMIs, S3), and have played with single-node Immutant

Amazon Web Services + Immutant and TorqueBox

Immutant and TorqueBox are completely awesome, as are the community support and responsiveness on the Freenode IRC channel #immutant. After lots of feedback, suggestions, and help, it seemed worthwhile to document the setup procedure for this stack running clustered on AWS.

Out of the box, Immutant is configured to use multicast for node discovery, but AWS does not support multicast. I wanted a setup that would allow me to dynamically fire up arbitrary worker nodes (Immutant/TorqueBox) that would participate in the cluster and register with a fronting Apache/mod_cluster load balancer.

Elastic IPs are limited on AWS, so I wanted to use as few as possible. In my setup, load balancers get an Elastic IP, as do the database master nodes. Immutant/TorqueBox nodes are created from AMIs I’ve built in advance and use whatever address AWS assigns. These AMIs dynamically pull configuration from my git repositories and set themselves up at boot time. I should probably look at Pallet, but I’ve just not had enough time, so I ended up cobbling together shell scripts that create and destroy nodes based on the AMI ID.
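
As a rough illustration of the kind of helper I mean (not my actual scripts), creating and destroying a node can boil down to a couple of AWS CLI calls, assuming the AWS CLI is installed and configured. The AMI ID, instance type, key pair, and security group below are placeholders:

# hypothetical create/destroy sketch -- IDs and names are placeholders
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type m1.small \
  --key-name my-keypair \
  --security-groups immutant-cluster \
  --count 1

# tear a node back down by instance ID
aws ec2 terminate-instances --instance-ids i-xxxxxxxx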

My Needs/Setup (Overview)

  • AWS instance running Apache/mod_cluster, with an AWS Elastic IP
  • An arbitrary number of Immutant/TorqueBox AWS instances participating in a cluster
  • A MongoDB instance with an Elastic IP
  • Lots of configuration checked into DVCS (git)

Immutant and AWS

I’ll start with the most fun and important part: the Immutant-on-AWS cluster configuration, since that’s probably what most people are interested in above all else. I’ll give my configurations for Apache/mod_cluster in a follow-up post.

Immutant has excellent tutorials and instructions, so if you’re completely new to Immutant, you’ll want to check those out first.

To get this working, you’ll be editing some XML. Sounds like fun, right? It’s not really that bad, but it’s XML nonetheless.

standalone-ha.xml

standalone-ha.xml is where most of the changes to the stock configuration will be made. First we’ll install Immutant from the command line, if you’ve not done so already:

# assumes Leiningen and the lein-immutant plugin
lein immutant install

From here we can take a look at the standalone-ha.xml file located in the ~/.lein/immutant/current/jboss/standalone/configuration/ directory. As part of my deployment process I have this file copied to the appropriate directory when my AMI comes up. I keep a copy of standalone-ha.xml in a git repo alongside other configurations and dev-ops type scripts. You’ll have to do something similar if you want to have the ability to arbitrarily bring up and shut down members of your cluster.
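
The boot-time copy itself doesn’t need to be anything fancy. Here’s a rough sketch of the idea, with a placeholder repo URL and paths (adjust to match your own layout):

#!/bin/bash
# hypothetical boot-time sketch: pull the config repo and drop
# standalone-ha.xml into place (repo URL and paths are placeholders)
set -e

CONFIG_REPO="git@github.com:youruser/devops-config.git"
CHECKOUT_DIR="/opt/devops-config"
IMMUTANT_CONF="$HOME/.lein/immutant/current/jboss/standalone/configuration"

if [ -d "$CHECKOUT_DIR/.git" ]; then
  (cd "$CHECKOUT_DIR" && git pull)
else
  git clone "$CONFIG_REPO" "$CHECKOUT_DIR"
fi

cp "$CHECKOUT_DIR/immutant/standalone-ha.xml" "$IMMUTANT_CONF/standalone-ha.xml"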

Based on the suggestion here of not binding TorqueBox’s public interface to 0.0.0.0, I altered the <interfaces> tag and changed the public interface sub-element from this:

<interface name="public">
  <inet-address value="${jboss.bind.address:127.0.0.1}"/>
</interface>

to this:

<interface name="public">
  <nic name="eth0"/>
</interface>

On my AMIs, eth0 carries the internal AWS IP address. Your instances will also need security group settings that allow the appropriate TCP and UDP traffic, but I’ll get to that in a follow-up post.
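
If you want to double-check which address that is on a given instance, you can inspect eth0 directly or ask the EC2 instance metadata service:

# show the address bound to eth0
ip addr show eth0

# or ask the EC2 instance metadata service for the internal IP
curl -s http://169.254.169.254/latest/meta-data/local-ipv4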

Next, we need to configure JGroups to use some method of TCP communications for broadcast and discovery. The default should look something like this:

{% gist 5616083 %}

We need to change the default-stack to tcp, and then modify the TCP stack. I removed the UDP configuration completely, but you can leave it alone if you want.

MPING will not work on AWS, but thankfully there are TCPPING and S3_PING. S3_PING is ultimately what you’ll want to set up if you want to add and remove nodes from your cluster without touching the configuration, but TCPPING is easier to set up and verify, so I’ll cover that first. For more JGroups info, check the protocol list.

You’ll want to replace the above JGroups configuration with the configuration below:

{% gist 5616174 %}

Of course, change ip.address.node.1 to the address bound to eth0 on your first cluster node, and ip.address.node.2 to the address bound to eth0 on the second.

Finally, we need to tell HornetQ to use our JGroups TCP configuration instead of UDP (the default).

We’ll be looking at the subsystem:

<subsystem xmlns="urn:jboss:domain:messaging:1.3">

In this subsystem, we need to change:

<broadcast-groups>
    <broadcast-group name="bg-group1">
        <jgroups-stack>${msg.jgroups.stack:udp}</jgroups-stack>
        <jgroups-channel>${msg.jgroups.channel:hq-cluster}</jgroups-channel>
        <broadcast-period>5000</broadcast-period>
        <connector-ref>netty</connector-ref>
    </broadcast-group>
</broadcast-groups>
<discovery-groups>
    <discovery-group name="dg-group1">
        <jgroups-stack>${msg.jgroups.stack:udp}</jgroups-stack>
        <jgroups-channel>${msg.jgroups.channel:hq-cluster}</jgroups-channel>
        <refresh-timeout>10000</refresh-timeout>
    </discovery-group>
</discovery-groups>

To:

<broadcast-groups>
  <broadcast-group name="bg-group1">
    <jgroups-stack>${jgroups.stack:tcp}</jgroups-stack>
    <jgroups-channel>${jgroups.channel:hq-cluster}</jgroups-channel>
    <broadcast-period>2000</broadcast-period>
    <connector-ref>netty</connector-ref>
  </broadcast-group>
</broadcast-groups>
<discovery-groups>
  <discovery-group name="dg-group1">
    <jgroups-stack>${jgroups.stack:tcp}</jgroups-stack>
    <jgroups-channel>${jgroups.channel:hq-cluster}</jgroups-channel>
    <refresh-timeout>10000</refresh-timeout>
  </discovery-group>
</discovery-groups>

Test It, Round 1

At this point, if you fire up Immutant on each of the nodes listed in the initial_hosts setting of the TCPPING JGroups configuration, you should see a message in the log file indicating that one node became the HASingleton master and the other did not. You’ll also see a cluster member count message:

18:42:57,478 INFO  [org.jboss.as.clustering] (MSC service thread 1-1) JBAS010238: Number of cluster members: 2
18:42:57,479 INFO  [org.projectodd.polyglot.hasingleton] (MSC service thread 1-1) inquire if we should be master (testapp.clj-hasingleton-global)
18:42:57,480 INFO  [org.projectodd.polyglot.hasingleton] (MSC service thread 1-1) Ensuring NOT HASingleton master (testapp.clj-hasingleton-global)
18:42:57,480 INFO  [org.projectodd.polyglot.hasingleton] (MSC service thread 1-1) Started HASingletonCoordinator

Above we see the log output of our non-master node. You can find the full logs in the ~/.lein/immutant/current/jboss/standalone/log directory.
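
If you want to watch these messages scroll by as nodes join, just tail the server log on each node (the file name below assumes the stock JBoss AS logging configuration):

# watch cluster membership and HASingleton messages as nodes come up
tail -f ~/.lein/immutant/current/jboss/standalone/log/server.log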

S3_PING, standalone-ha.xml

TCPPING is great for at least verifying that your AWS settings are correct and that JGroups is working properly. I spent a lot of time with a non-working S3_PING configuration that didn’t report any errors and showed no cluster communications. After many helpful suggestions from #immutant, I cranked up the logging levels and started iterating through possible problems. Eventually I got it working.

S3_PING is great if you want a dynamic AWS environment: no hard-coded IP addresses at all. You just configure an S3 bucket and get the AWS keys for a user with read/write/list privileges on that bucket. Since IP addresses can change on AWS, you’re really just asking for trouble if you rely on them. You could of course use Elastic IPs, but you do not have an unlimited number of those.

The change is quite simple. In the JGroups subsystem, replace:

<protocol type="TCPPING">
  <property name="timeout">30000</property>
  <property name="initial_hosts">ip.address.node.1[7600],ip.address.node.2[7600]</property>
</protocol>

With:

<protocol type="S3_PING">
  <property name="secret_access_key">TOPSYKRETS</property>
  <property name="access_key">TOPSYKRETS</property>
  <property name="location">some.s3.bucket.name</property>
</protocol>

I’m pretty sure you have to create the S3 bucket before using the configuration, so if you see any strange stuff in the logs, double-check your permissions.
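
If you have the AWS CLI handy, creating the bucket up front is a one-liner (the bucket name is the same placeholder used above):

# create the discovery bucket ahead of time; the IAM user whose keys
# go in the S3_PING config needs read/write/list on it
aws s3 mb s3://some.s3.bucket.name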

Everything should work as it did before, when JGroups was set to use TCPPING. To test, you could create jobs scheduled to run on only one node of the cluster, send messages to queues and topics, and check the contents of your distributed caches on each node using nREPL.

Jim Crossley put up a really great Overlay Screencast a couple of months ago that demonstrates Ruby and Clojure apps interacting. If you’ve not played with message queues or polyglot systems, it’s a good place to get started.

In the next AWS/Immutant post, I’ll provide details on my Apache/mod_cluster configuration for load balancing. I’ll also talk a bit about my Rails/Clojure interaction and how my AMIs are configured to pull configuration from GitHub.