17 March 2016

Written by Jon Bevan

Preamble

Recently I was working on a project where we had a JIRA add-on and a Fisheye/Crucible (Fecru) add-on that communicate via applinks, and the Fecru add-on was performing slowly.

A colleague had used Gatling to run performance tests before, and as a company we've started using Arquillian a lot more to run integration tests. I figured there must be a way to combine those two technologies with the Atlassian SDK so that a single command would start up JIRA & Fecru, run an Arquillian test (which would configure the apps and populate them with data) and then run the Gatling test, giving me nice JUnit output for my Bamboo build.

Here's what I learnt along the way!

1. Gatling tests aren't very straightforward to execute programmatically

I was hoping Gatling would have some lovely Java API that I could just call from within an integration test, but that was wishful thinking. The official documentation says to use a bundled shell script that builds up a classpath using a bunch of environment variables and the find command on *nix. Alternatively, I could use the Maven plugin to kick off the tests, but running Maven from within an integration test didn't feel quite right...

Side-note: If the Atlassian SDK allowed us more fine-grained control over integration tests, that would definitely have helped here. As far as I know, there isn't a pre-integration-test phase or a post-integration-test phase that would allow us to run things before/after the apps have started and before/after they get terminated.

2. CTRL-D does not send a signal

By this point I'd resigned myself to writing some kind of bash script to do the job and so my initial bash (pun intended) at a solution looked something like this:

#!/bin/bash
atlas-debug --instanceId jira -pl performance-tests -DskipTests >jira.log 2>&1 &
atlas-debug --instanceId fecru -pl performance-tests -DskipTests >fecru.log 2>&1 &

What's not to like?

When I ran this script each application would start up, and then immediately shut down without any prompting or interaction or anything. It turns out, after a bit of digging and manic Googling, that this is all to do with I/O streams.

For those who don't already know, on *nix systems, each process has three I/O streams by default: stdin, stdout and stderr. These are used to direct the input into that process and two kinds of output (regular output and error messages). When I run atlas-debug from my terminal/console, stdout gets set to my terminal and I see a whole load of Maven output about all the things it's downloading. Interestingly, stdin also gets set to my terminal - i.e. it reads whatever I type on my keyboard.

The funny >jira.log and 2>&1 stuff in the bash script above allows me to redirect the output of the atlas-debug process (both stdout and stderr) into a file called jira.log. The trailing & tells bash to run the preceding command in the background. Now, the important thing to note here is that when a script runs a command in the background, that command no longer has my terminal as its stdin: unless stdin is redirected explicitly, bash points it at /dev/null, so the first attempt to read from it hits end-of-file straight away.
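A small aside on those redirections, since the order is easy to get wrong: bash applies them left to right, so 2>&1 only follows stdout into the file if the file redirect comes first. The sketch below shows the difference using a throwaway function (emit and the temp files are made up for the demo, they are not part of my script):

```shell
#!/bin/bash
# Hypothetical helper: writes one line to stdout and one to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

log_a=$(mktemp)
log_b=$(mktemp)

# 2>&1 first: stderr is pointed at the current stdout (here, the command
# substitution), and only then is stdout moved to the file.
leaked=$(emit 2>&1 >"$log_a")

# File first: stdout goes to the file, then stderr is pointed at the same file.
emit >"$log_b" 2>&1

echo "escaped the first log: $leaked"
echo "lines in second log: $(wc -l <"$log_b" | tr -d ' ')"
```

In the first form the stderr line never reaches the log; in the second form both lines do.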

OK, enough rambling. What does that have to do with anything?

I mistakenly assumed that, like CTRL-C or CTRL-Z, CTRL-D sent a signal to the currently attached process (the one with stdin associated with my terminal) and that signal is what the process responded to in order to shut down. In particular I'm talking about atlas-debug here and the nice way it says to:

[INFO] Type Ctrl-D to shutdown gracefully
[INFO] Type Ctrl-C to exit

It turns out that CTRL-D is actually the magic key combination for the End Of Transmission (EOT) character, which is ASCII char 4 and Unicode U+0004.

Having run man ascii before I'm not sure how I missed that... </sarcasm>

Anyway, that means that CTRL-D doesn't send a signal at all. Instead, the process reading from the terminal sees a kind of "end of stdin" (end-of-file) condition. When I was running atlas-debug in the background (and therefore stdin was detached from my terminal) it hit the same "end of stdin" condition as soon as it tried to read, and so it shut JIRA and Fecru down the moment they'd finished starting up.
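You can see the same effect without any Atlassian tooling. In this sketch, wait_for_ctrl_d is a hypothetical stand-in for a program that keeps running until its stdin ends. Started with & from a script, it finishes immediately because its stdin is already at end-of-file:

```shell
#!/bin/bash
# Hypothetical stand-in for atlas-debug: "runs" until stdin reaches end-of-file.
wait_for_ctrl_d() {
    while read -r _line; do :; done
    echo "EOF on stdin, shutting down"
}

out=$(mktemp)

# No stdin redirect here, so the backgrounded command reads from /dev/null
# (this is what bash does for background commands when job control is off,
# which is the case inside a script).
wait_for_ctrl_d >"$out" &
wait "$!"

bg_message=$(cat "$out")
echo "$bg_message"
```

Run the same function in the foreground from a terminal and it sits there until you press CTRL-D on an empty line.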

While it is convenient to be able to gracefully shut down an Atlassian application with a couple of key presses, this implementation using CTRL-D means that it's not possible to send an interrupt signal to trigger a graceful shutdown, which makes automating the startup/shutdown of an application harder.

Side-note: Having just looked through the source of avst-app (Adaptavist's 'package manager' for Atlassian Applications) it appears you can do graceful shutdowns via Tomcat.

3. Screen is awesome

So now that I understood why I couldn't just run JIRA and Fecru as background processes, I turned my attention towards an application called screen that I'd heard of before but never actually used. Here are some excerpts from the man page:

Screen is a full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells)... When screen is called, it creates a single window with a shell in it (or the specified command) and then gets out of your way... All windows run their programs completely independent of each other. Programs continue to run when their window is currently not visible...

Sounds perfect for my use case here. A few command line options later I had modified my earlier script as follows:

#!/bin/bash
screen -d -m -s /bin/bash -L atlas-debug --instanceId jira -pl performance-tests -DskipTests
screen -d -m -s /bin/bash -L atlas-debug --instanceId fecru -pl performance-tests -DskipTests

Throw in a little extra grep magic to check the screen output log for "jira started successfully" and "fecru started successfully" and we've got something we can work with.

Once I knew the applications had started I could run atlas-mvn verify -pl performance-tests -DnoWebapp=true to run my Arquillian test that would populate both applications with data, swiftly followed by atlas-mvn gatling:execute -pl performance-tests to run my Gatling tests.

Side-note: I'm pretty sure a similar solution is possible using named pipes, but I couldn't quite get it working as I expected it to...
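For the curious, here is a minimal sketch of how the named-pipe idea can work in principle: give the background process a FIFO as its stdin and hold the write end open until you want it to stop. fake_app is a hypothetical stand-in for atlas-debug, and the file names are made up, so treat this as an illustration that is untested against the real SDK:

```shell
#!/bin/bash
workdir=$(mktemp -d)
mkfifo "$workdir/stdin.fifo"

# Hypothetical stand-in for atlas-debug: runs until stdin reaches end-of-file.
fake_app() {
    echo "started"
    cat >/dev/null      # blocks until every writer has closed the FIFO
    echo "shut down gracefully"
}

# stdin is redirected explicitly, so it is the FIFO rather than /dev/null.
fake_app <"$workdir/stdin.fifo" >"$workdir/app.log" &
app_pid=$!

# Hold the write end open on file descriptor 3 so the app does not see EOF yet.
exec 3>"$workdir/stdin.fifo"

# ... the tests would run here while the app stays up ...

# Closing the last writer is the scripted equivalent of pressing CTRL-D.
exec 3>&-
wait "$app_pid"

final_line=$(tail -n 1 "$workdir/app.log")
echo "$final_line"
```

The design point is that end-of-file on a FIFO only happens once all writers have closed it, which gives the script control over when the "CTRL-D" arrives.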

4. Gatling literally just implemented JUnit output

At this point I had a script that looked a little like this (I've omitted some of it for brevity):

#!/bin/bash
screen -d -m -s /bin/bash -L atlas-debug --instanceId jira -pl performance-tests -DskipTests
screen -d -m -s /bin/bash -L atlas-debug --instanceId fecru -pl performance-tests -DskipTests

# Omitted checks here for ensuring the processes are running and obtaining their process IDs

# Poll the screen log until both applications report that they have started
until grep -q "fecru started successfully" screenlog.0 && \
      grep -q "jira started successfully" screenlog.0; do
    sleep 5
done

atlas-mvn verify -pl performance-tests -DtestGroups=perf -DnoWebapp=true

atlas-mvn gatling:execute -pl performance-tests
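One thing the polling loop in that script does not do is give up: if an application never starts, a build would hang until something else kills it. A possible variation, sketched here with a hypothetical wait_for_log helper (the name, arguments and the 600-second figure are my own assumptions for illustration), adds a timeout:

```shell
#!/bin/bash
# wait_for_log FILE PATTERN TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Returns 0 once PATTERN appears in FILE, 1 if TIMEOUT_SECONDS pass first.
wait_for_log() {
    local file=$1 pattern=$2 timeout=$3 interval=${4:-5}
    local waited=0
    until grep -q "$pattern" "$file" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out waiting for '$pattern' in $file" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# In the script above this might be used as:
#   wait_for_log screenlog.0 "jira started successfully" 600 || exit 1
#   wait_for_log screenlog.0 "fecru started successfully" 600 || exit 1
```

The elapsed time is approximate because it only counts the sleeps, which is good enough for a startup check.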

But I had originally wanted to run this in a Bamboo build, so I needed the Gatling test output in a useful format. A quick GitHub search later and I found a closed issue that added JUnit output format for Gatling tests, based on the assertions specified for the simulation/test. Yay!

5. Being a good Bash citizen

Whilst developing the script I made extensive use of the -x flag in my bash script:

#!/bin/bash -x
...

I don't know of many ways to debug bash scripts, but this flag is great as it prints out all the lines that are being executed with variables substituted for their values so you can actually see what is being executed and where the script is bombing out or failing to do what you expect.
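To get a feel for what the trace looks like, here is a tiny sketch that runs a two-command script under -x in a child bash and captures only the trace (the variable name and message are arbitrary):

```shell
#!/bin/bash
# bash writes the -x trace to stderr, so capture stderr and discard the
# script's own stdout.
trace=$(bash -x -c 'name=world; echo "hello $name"' 2>&1 >/dev/null)
echo "$trace"
```

By default each traced command is printed on its own line with a leading +, and the echo line shows the expanded value rather than $name.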

Additionally, there are a few other options I used to make my script 'safer' to run - namely -u (treat unset variables as an error) and -o pipefail (make a pipeline return a failing exit code if any command in it fails, instead of only reporting the status of the last command).
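A quick way to see what those two options change is to run the same one-liners with and without them. Each case below runs in its own child bash so the options don't leak into the calling shell. DEMO_UNSET_VAR is a made-up name that is assumed not to be set in your environment:

```shell
#!/bin/bash
# Default behaviour: the pipeline's status is that of the last command (cat).
status_default=$(bash -c 'false | cat; echo $?')

# With pipefail the failing `false` is reported instead.
status_pipefail=$(bash -c 'set -o pipefail; false | cat; echo $?')

# With -u, expanding an unset variable aborts the script with a non-zero status.
if bash -c 'set -u; echo "value: $DEMO_UNSET_VAR"' >/dev/null 2>&1; then
    unset_result="ran to completion"
else
    unset_result="aborted"
fi

echo "default=$status_default pipefail=$status_pipefail unset=$unset_result"
```

This prints default=0 pipefail=1 unset=aborted.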

Finally I figured I should probably handle interrupts cleanly and shut down JIRA and Fecru if the script itself was terminated. It turns out this is fairly easy to do with the following snippet of code which I stuck at the top of my script:

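#!/bin/bash
# Options are set here rather than on the #! line, because Linux passes
# everything after the interpreter path to bash as one single argument
set -u -o pipefail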

# High default PIDs for the kill signal trap, these will be overwritten once JIRA and
# Fecru processes have been started
JIRA_PID=1000000
FECRU_PID=1000000

function killApps {
    echo "Interrupted, terminating JIRA and Fecru"
    kill -HUP ${JIRA_PID} ${FECRU_PID}
    exit 1
}

trap killApps SIGHUP SIGINT SIGTERM
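The sentinel PIDs do the job, but a variation that avoids signalling a PID the script never started is to leave the variable empty and check before killing. A sketch, with sleep standing in for a real application (APP_PID and cleanup are names made up for this example):

```shell
#!/bin/bash
APP_PID=""

cleanup() {
    # Only signal the process if a PID was recorded and it is still alive.
    if [ -n "$APP_PID" ] && kill -0 "$APP_PID" 2>/dev/null; then
        kill -HUP "$APP_PID"
        # Reap the child; its exit status just reflects the signal.
        wait "$APP_PID" 2>/dev/null || true
    fi
}

# Run cleanup on the usual termination signals, then exit with a failure code.
trap 'cleanup; exit 1' HUP INT TERM

# Hypothetical stand-in for a started application.
sleep 300 &
APP_PID=$!
```

kill -0 sends no signal at all; it only reports whether the process exists and can be signalled, which makes it a handy guard.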

