My adventures with tracking down a database pool leak
This page was first published on August 20th 2014 and was last updated on September 7th 2014. See the update below.
Occasionally I still work on the OpenAM instance of Kir-Dev because they are transitioning to a custom OAuth2 provider solution and they need support for its plugins I wrote.
Few words about my use-case
My latest addition was a bigger refactor and an OAuth2 AttributeMapper implementation. With this plugin users are able to sign in on federated sites which still uses OpenAM, through the new OAuth2 provider.
With the refactor, I introduced a bug which resulted outages caused by the exhaustion of the database connection pool on Tomcat (the container we use for OpenAM). Make things worse, it wasn’t a trivial leak: it happened quite often (few days) but not after some specified number of logins.
DoS it!
Okay, not exactly. I designed an experiment to generate thousands of log in operations with the help of Selenium, PhantomJS and JMeter. I learned a lot on the way - just for a hint from where I started: control Chrome windows with Selenium from a Ruby loop. I improved my setup from 15 ops/min to 50+ ops/min. I ended up with the following:
Run Selenium as standalone with PhantomJS driver
$ export PATH=$(echo $PATH):/home/balo/apps/phantomjs-1.9.7-linux-x86_64/bin
$ java -jar selenium-server-standalone-2.42.2.jar
Ruby script which simulates a simple login flow
# encoding: utf-8
require 'selenium-webdriver'
OPENAM_LOGIN_URI = "https://HOSTNAME/opensso/UI/Login?service=oauthchain"
CAPS = Selenium::WebDriver::Remote::Capabilities.phantomjs("phantomjs.cli.args" => ['--ignore-ssl-errors=true'])
def cleanup(driver)
driver.manage.delete_all_cookies
driver.quit
end
def login_process(driver)
driver.get OPENAM_LOGIN_URI
# OpenAM automatically redirects to the login form of the custom OAuth2 provider
user_field = driver.find_element id: "LoginForm_username"
user_field.send_keys "<<username>>"
pass_field = driver.find_element id: "LoginForm_password"
pass_field.send_keys "<<password>>"
pass_field.submit
sleep 1
# Handle the case when we have to give permission to the application
if driver.title.eql? "Authorize application"
driver.find_element(name: "authorized").click
end
wait = Selenium::WebDriver::Wait.new(timeout: 20)
begin
wait.until {
# OpenAM login url or in case of success, the protected app
driver.current_url.start_with? OPENAM_LOGIN_URI or driver.current_url.include? "myprotectedapp.com"
}
# On failure it will redirect to the OpenAM login url which displays the error
raise "gotcha: #{driver.title}" if driver.current_url.start_with? OPENAM_LOGIN_URI
ensure
cleanup(driver)
end
end
driver = Selenium::WebDriver.for :remote, url: "http://127.0.0.1:4444/wd/hub", desired_capabilities: CAPS
login_process driver
Call the script above with JMeter OS Process Sampler a few thousand times on 4 threads and wait for failed login operations
Command: /home/balo/.rbenv/versions/2.1.2/bin/ruby
Parameter: /home/balo/code/src/github.com/vbalazs/vir-auth/do-logins.rb
About the bug
I implemented my database connection class as an AutoCloseable. My mistake was that I wanted to be smart. I made lazy connection initialization (why?!) and connection validation to make sure I handle every possible corner case in the universe. A friendly suggestion: don’t do it.
I simplified the code, from now on it initialises the connection in the constructor and fail early if something went wrong.
Update (07 Sept 2014): @aldaris pointed out a bug I introduced with my fix. You can see his PR here. Thanks!