TvE 2100 » At 2100 feet above Santa Barbara
Bosque del Apache National Wildlife Refuge, Feb 2004, ©2004 Thorsten von Eicken 

Ruby-prof does not play nice with threads

Posted: January 16th, by tve

… or so it seems! I was trying to profile a worker running in BackgrounDRb and kept getting errors like the following almost as soon as I’d call RubyProf.start:


./script/../vendor/rails/activerecord/lib/../../activesupport/lib/active_support/inflector.rb:108: warning: ruby-prof: An error occured when leaving the method Inflector#singularize.
   Perhaps an exception occured in the code being profiled?

Each time the error would occur in a slightly different place. I finally disabled all but one worker (thread) and that seems to have fixed the problem.

Update: it seems that it is possible to use ruby-prof with threads, but it looks like one has to start the profiling before forking off the threads.

Verifying SSL certs when using Net::HTTP

Posted: December 25th, by tve

What good is an HTTPS web service if you’re not verifying the certificate presented by the web service? That’s what I was wondering when I started to use the Amazon Elastic Compute Cloud web service and used a sample that used simple HTTP. So I looked into the docs and quickly found that the trick is to set http.verify_mode = OpenSSL::SSL::VERIFY_PEER when creating the https connection object. Unfortunately that turned out not to be so simple, because the only effect I got is an error on every connection attempt telling me that the peer’s certificate cannot be validated. Very useful!

Black-necked Stilts, Devereux Slough, Santa Barbara, CA ©2005 Thorsten von Eicken

Of course my next step was a Google search, but after a long time all I found is that everyone turns the verification off! I then proceeded to look at the source code to figure out what is going on, and I finally gave up after a couple of hours. Finally, the Pragmatic Studio alumni mailing list came to the rescue: Devin Mullins gave me the critical tip that made it work: one has to give the certificate file a special name, duuuh. Here is in detail what I did to get it to work:

I have a file ‘cacert.pem’ that has all the root certs I care about. I ran the following command:

# openssl x509 -hash < cacert.pem
f73e89fd
-----BEGIN CERTIFICATE-----
MIICNDCCAaECEAKtZn5ORf5eV288mBle3cAwDQYJKoZIhvcNAQECBQAwXzELMAkG
A1UEBhMCVVMxIDAeBgNVBAoTF1JTQSBEYXRhIFNlY3VyaXR5LCBJbmMuMS4wLAYD
VQQLEyVTZWN1cmUgU2VydmVyIENlcnRpZmljYXRpb24gQXV0aG9yaXR5MB4XDTk0
MTEwOTAwMDAwMFoXDTEwMDEwNzIzNTk1OVowXzELMAkGA1UEBhMCVVMxIDAeBgNV
BAoTF1JTQSBEYXRhIFNlY3VyaXR5LCBJbmMuMS4wLAYDVQQLEyVTZWN1cmUgU2Vy
dmVyIENlcnRpZmljYXRpb24gQXV0aG9yaXR5MIGbMA0GCSqGSIb3DQEBAQUAA4GJ
ADCBhQJ+AJLOesGugz5aqomDV6wlAXYMra6OLDfO6zV4ZFQD5YRAUcm/jwjiioII
0haGN1XpsSECrXZogZoFokvJSyVmIlZsiAeP94FZbYQHZXATcXY+m3dM41CJVphI
uR2nKRoTLkoRWZweFdVJVCxzOmmCsZc5nG1wZ0jl3S3WyB57AgMBAAEwDQYJKoZI
hvcNAQECBQADfgBl3X7hsuyw4jrg7HFGmhkRuNPHoLQDQCYCPgmc4RKz0Vr2N6W3
YQO2WxZpO8ZECAyIUwxrl0nHPjXcbLm7qt9cuzovk2C2qUtN8iD3zV9/ZHuO3ABc
1/p3yjkWWW8O6tO1g39NTUJWdrTJXwT4OPjr0l91X817/OWOgHz8UA==
-----END CERTIFICATE-----

The trick is that the file basename must be the hash printed out by the above command plus an appended .0! So my next move was:

# mv cacert.pem f73e89fd.0

I then adjusted my code as follows:

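  require 'net/https'   # loads Net::HTTP together with its OpenSSL support

  # link is assumed to be a URI object for the service endpoint
  http = Net::HTTP.new(link.host, link.port)
  if link.scheme == 'https'
    http.use_ssl = true
    # fail the connection unless the server cert chains to a trusted root
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.ca_file = "#{RAILS_ROOT}/lib/ec2/f73e89fd.0"
  end
  http.start
  response = http.get(link.request_uri)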

This now works like a charm! Phew!

Update:

Upon further inspection, it looks like the above only works for the first certificate in my original cacert.pem file and I was lucky that that’s the one I need for EC2. What I need to do to use all the certs in the file is break it up, one cert per file, use openssl to figure out the filenames, and then use http.ca_path instead of http.ca_file to point to the directory with all the cert files.
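
The split described in this update could be scripted along the following lines. This is only a minimal sketch under a few assumptions: the helper name split_ca_bundle is made up for illustration, and it assumes that Ruby's OpenSSL::X509::Name#hash produces the same subject hash that your openssl command prints (older OpenSSL releases used a different hash, which Ruby exposes as hash_old).

```ruby
require 'openssl'
require 'fileutils'

# Matches one PEM-encoded certificate inside a larger bundle.
PEM_CERT = /-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----/m

# Hypothetical helper: writes every certificate found in bundle_pem to
# dir/<subject-hash>.<n>, the naming scheme OpenSSL uses for a CA directory.
# Returns the list of file names it wrote.
def split_ca_bundle(bundle_pem, dir)
  FileUtils.mkdir_p(dir)
  bundle_pem.scan(PEM_CERT).map do |pem|
    cert = OpenSSL::X509::Certificate.new(pem)
    hash = format('%08x', cert.subject.hash)
    # Bump the numeric suffix if two certs share the same subject hash.
    n = 0
    n += 1 while File.exist?(File.join(dir, "#{hash}.#{n}"))
    name = "#{hash}.#{n}"
    File.write(File.join(dir, name), cert.to_pem)
    name
  end
end
```

With the directory filled in this way, the connection code would set http.ca_path to that directory instead of setting http.ca_file.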

Course on Scalable Internet Services (in Ruby on Rails)

Posted: December 20th, by tve

Phew, I just finished teaching a graduate course on building scalable internet services at UCSB (Univ. of California At Santa Barbara). This was a very hands-on, learn-by-doing course with a significant project in Ruby on Rails! The goal was for the students to learn about all the technologies that go into a scalable internet service, specifically into dynamic web sites. The lectures provided background information to support the project and they explained technologies beyond the scope of the project.

Anna’s Hummingbirds, Joshua Tree, CA, Mar 2005, ©2005 Thorsten von Eicken

The project consisted of building a transactional dynamic web site in Ruby on Rails and running on Amazon’s Elastic Compute Cloud (EC2). Each site had to hold >100’000 database records that could be searched and explored, have user accounts, and include some form of transaction, such as a shopping cart check-out.

Each project then had to be deployed on multiple servers on EC2 and the groups had to use httperf to demonstrate that they could scale the performance of their site by running a front-end load balancer server, a database server, a memcached server, and up to 10 application servers. All this had to fit into a 10-week quarter, with none of the students knowing either Ruby or Rails at the outset!

Note that the emphasis of the course was on the scalability aspect of the sites and not on the web design or feature-set. Thus it was more important to understand the performance characteristics and optimize the core of the site than to have the most eye-candy. (Although eye-candy is always appreciated, of course…)

Please check out the course wiki for more information, including all course materials!

Accessing inner Java classes via Rjb

Posted: December 3rd, by tve

The entry below is copied straight from another blog. I ran into the problem described therein a few minutes ago and found this solution via Google’s cache. The origin server doesn’t respond, so who knows, this may be lost as soon as it pops out of the cache, so I am duplicating it here. I also can’t tell who wrote it, just that it came from http://doodle.barelylegible.com/blog/?cat=5. So here we go:

== RubyJavaBridge is nice == Sunday, February 12th, 2006

I’ve been playing with the RubyJavaBridge that was talked up at the last Ruby get-together and so far it is indeed very swank.

I couldn’t resist investigating the issue Les had with accessing static inners and I think I found the syntax. Accessing inner classes (static or not) can look a little wonky but it is doable. Statics are loaded like any other class, but their pathname is ‘OuterClass$StaticInnerClass’. The nonstatic inner classes are a tiny bit trickier. Import like the static, with ‘OuterClass$Inner’; now you have the inner class, but the trick is in instantiating an instance: you must provide an OuterClass instance as the first argument to the constructor (thus revealing a little behind the curtain of java the implicit access an inner has to its outer’s methods and data):

Outer = Rjb::import('Outer')
Inner = Rjb::import('Outer$Inner')
StaticInner = Rjb::import('Outer$StaticInner')

outer = Outer.new
inner = Inner.new(outer)
staticInner = StaticInner.new

I have full sources and the example output below.

I have so many ways I want to use this bridge. This is the key I need to access the AS400 from Ruby. Tasty tasty.

[Sorry, the “full sources and examples” are not in the Google cache…]

Optimizing a Rails application, part 1

Posted: December 2nd, by tve

I’m working on a really cool Rails application called AWS-Console which manages servers on Amazon’s Elastic Compute Cloud. With the click of a button, we can instantiate new servers which fire up within a couple of minutes. We can even take an svn repository of a Rails app and launch it fully automatically; it takes about 10 minutes to come up. Our test uses mephisto: want 10 instances of mephisto running? Click here…

So I was measuring the performance of the AWS-Console site and it turned out to be abysmal! Like 1 request per second max, running apache+mongrel_cluster on a 2GHz/2GB box. That was rather disappointing. So I rolled up my sleeves to try and figure out what is going on, and this is the tale of my first Rails performance adventure!

Sandhill Cranes, Bosque del Apache, NM, Feb 2004, ©2004 Thorsten von Eicken

The goal

Up to now we’ve been exclusively focused on functionality and have not cared a bit about performance. But at some point one does have to make sure that performance is in the ballpark. I’ve read Stefan Kaes’ excellent blog with his many suggestions on how to improve performance. So I have about 20 things in mind that I could “fix” and hope that they improve performance. But I don’t like working blindly, so I decided to follow the conventional procedure:

  1. benchmark the application to get baseline measurement(s)
  2. profile the application to figure out where the time is going
  3. optimize the bottlenecks using Stefan’s tips or things I figure out on my own
  4. re-benchmark the application to see whether I actually improved things
  5. go back to step 1 until satisfied
  6. learn from what I did so I don’t have to do it all over for the next app/feature

I’m planning to write up the whole story here; this is just the first part. I hope this will be useful to others and I hope it will jog my memory the next time I’m going through this :-).

Part 1: benchmarking

For the benchmarking I am using httperf, which is an excellent program for applying realistic load to a web site. The reason I like httperf is that it decouples request generation from server responses, which means that it can continue opening new concurrent connections to the server no matter how fast the server responds. This is how it happens in real life: users come to the site no matter how slow the server is, and only as they browse around do they get slowed down by the server’s response time. The bottom line is that this allows httperf to overload the server and really drive it against the wall. Httperf can also take little scripts of URLs which describe a flow through the site: it will open a connection and then “walk through” the URLs one at a time, and if the site has images, it can request the images for a page in a burst. So all in all it’s a good and realistic benchmarking tool.

The downside of httperf is that the URL scripts are very primitive, and in particular if the site requires authentication, then it’s tedious to have httperf log in as a different user every time it opens a connection.

Creating a workload

To get started I downloaded httperf from HP’s site and installed it. I don’t remember the details, but it seemed pretty straightforward.

The key to using httperf as described above, where it walks through a sequence of URLs, is the --wsesslog workload generator. If you want to play with it, you will need to check out the man page for all the details, but here is what I did. The first thing was to create a file with the workload. My first test of the AWS-Console looked as follows:

/
/sessions/new
  /javascripts/prototype.js?1160194386
  /javascripts/effects.js?1160194386
  /javascripts/dragdrop.js?1160194386
  /javascripts/controls.js?1160194386
  /javascripts/application.js?1154539549
  /javascripts/niftycube.js?1160194386
  /stylesheets/yui/reset.css?1157476643
  /stylesheets/yui/fonts.css?1157693397
  /stylesheets/yui/grids.css?1160194386
  /stylesheets/syslitics.css?1162714447
  /stylesheets/niftyCorners.css?1160194386
  /images/loading.gif?1154732591
/favicon.ico
/sessions method=POST contents='email=test@test.com&password=test'
/
/ec2_instances
/ec2_images
/ec2_instances
/ec2_images
/ec2_instances
/ec2_images
/ec2_instances
/ec2_images

This file fetches “/”, which redirects to “/sessions/new” and which is followed by fetching all the javascripts and stylesheets referenced by the response. Then comes the favicon and that is followed by a POST of the login information. Then it fetches “/” because the login redirects there, and then a bunch of fetches of the instances and images pages.

The way I came up with this list was to start a fresh browser, navigate to the site, and then look at the server log in /var/log/httpd/access_log (or similar) and grab all the URIs listed there. I had to make up the contents of the POST myself, but it’s simply a URL-encoded string of the various form fields.
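
Pulling the URIs out of the access log can be automated. The following is a rough sketch, not the procedure used for the file above: the helper name, the asset prefixes and the FILL_ME_IN placeholder are all assumptions, and it expects Apache's common or combined log format with the request line in double quotes. Asset requests are indented so that httperf fetches them as a burst, as in the workload file shown earlier.

```ruby
# Matches the quoted request in an Apache common/combined log line,
# e.g. "GET /sessions/new HTTP/1.1", capturing the method and the URI.
REQUEST = %r{"(GET|POST) (\S+) HTTP/[\d.]+"}

# Assumption: anything under these prefixes is a page asset that a browser
# fetches in a burst, so it gets indented in the workload file.
ASSET_PREFIXES = %w[/javascripts/ /stylesheets/ /images/].freeze

# Hypothetical helper: turns access-log lines into one httperf session.
def wsesslog_from_access_log(lines)
  entries = lines.map do |line|
    match = REQUEST.match(line)
    next unless match # skip lines that are not GET/POST requests

    method, uri = match.captures
    entry = ASSET_PREFIXES.any? { |prefix| uri.start_with?(prefix) } ? "  #{uri}" : uri
    # POST bodies are not logged, so leave a placeholder to fill in by hand.
    method == 'POST' ? "#{entry} method=POST contents='FILL_ME_IN'" : entry
  end
  entries.compact.join("\n")
end
```

Something like puts wsesslog_from_access_log(File.readlines('/var/log/httpd/access_log')) would then print a first draft of the workload, with the POST contents still to be written by hand.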

Note that httperf does handle cookies in a simple but effective manner. At the first request, the site will return a session cookie to httperf which the latter dutifully presents with all subsequent requests. So a site where the user has to login before proceeding actually does work correctly. Nice!

Then I tested this using httperf with the following command line:

httperf --hog --server test.aws-console.com --wsesslog=1,0,aws-test1 \
--session-cookie --ssl --print-reply

The --hog option has to do with sockets and is important but not interesting; --session-cookie enables cookie handling as described above; --server is the address of the server; --print-reply prints out all the responses from the server so I can check that the proper pages get returned and not some errors; --ssl uses https (aws-console is an SSL site); and --wsesslog references the workload file. The three comma-separated --wsesslog values tell httperf to run through the workload file once (1 session), to wait 0 seconds between URLs (this delay simulates user think time, which I’m not interested in), and to read the workload from the file aws-test1. Printing out the replies means lots of stuff to scroll through, but it allowed me to check that the login and the other page fetches worked properly.

Applying some load

Then a quick test running 30 sessions and starting a new session every two seconds:

httperf --hog --server www.aws-console.com --wsesslog=30,0,aws-test1 --session-cookie --rate 0.5

To produce a graph, I would vary --rate from about 0.1 to 5 (ramp starting a new session from one every 10 seconds up to 5 per second). Good numbers for your app will depend on the performance that you see.
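
Such a sweep is easy to script. Here is a minimal sketch that only builds the command lines, reusing the options from the run above; the helper name httperf_sweep and the particular list of rates are arbitrary picks for illustration, and the line that would actually launch httperf is left commented out.

```ruby
# Hypothetical helper: returns one httperf argument list per session rate.
def httperf_sweep(server, workload, rates, sessions: 30)
  rates.map do |rate|
    ['httperf', '--hog', '--server', server,
     "--wsesslog=#{sessions},0,#{workload}",
     '--session-cookie', '--rate', rate.to_s]
  end
end

# Assumed rate steps, from one new session every 10 seconds to 5 per second.
rates = [0.1, 0.2, 0.5, 1, 2, 5]
httperf_sweep('www.aws-console.com', 'aws-test1', rates).each do |cmd|
  puts cmd.join(' ')
  # system(*cmd) # uncomment to run each measurement in turn
end
```

Each run's reply rate can then be read off the httperf summary and plotted against the session rate.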

Well, the result I got was a whopping 1.2 replies per second on average!!! Given that almost half the requests are for static files (the style sheets and JavaScript files) which are served blindingly fast by Apache, I can only say “abysmal!”

With this poor performance there really isn’t much point in generating a graph that shows reply rate as load is increased or load vs. response time.

One thing I did do was add more sessions to my description file so that I can exercise different use cases (flows through the site). I thought of automatically generating variants of the workload file where I change the login id and perhaps some other POST parameters, but I haven’t implemented that yet.
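
Generating such variants might look roughly like the sketch below. It is not something that was run against the site: the helper name sessions_for is invented, the form field names are taken from the workload file above, and it relies on httperf treating a blank line as the separator between sessions (worth double-checking in the man page).

```ruby
require 'uri'

# Hypothetical helper: builds workload text with one session per user.
# Every session logs in with that user's credentials, then visits pages.
def sessions_for(users, pages)
  sessions = users.map do |email, password|
    # URL-encode the form fields the same way a browser would.
    form = URI.encode_www_form({ 'email' => email, 'password' => password })
    ["/sessions method=POST contents='#{form}'", *pages].join("\n")
  end
  sessions.join("\n\n") # blank line between sessions
end

users = [['test1@test.com', 'test'], ['test2@test.com', 'test']]
puts sessions_for(users, ['/', '/ec2_instances', '/ec2_images'])
```

Note that a password containing a single quote would break the quoting of contents='…', so this sketch assumes test accounts with simple passwords.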

Note that httperf has a “-v” option which is helpful in seeing progress. One can specify a very large number of sessions in --wsesslog and then watch httperf print out its requests/sec measurements every 5 seconds and after one to two dozen measurements, hit ctrl-C to get the overall stats.

Figuring out where the time is going

So, why doesn’t it do more requests per second than it does? That’s an excellent question for part 2, to appear soon!

RadRails no-go

Posted: August 25th, by tve

Up to now I’ve been using JEdit as my editor when programming RoR apps. Nice, but not quite there. Also, it would be nice to have an IDE, methinks. So I finally gave RadRails a spin. This is on a Windows box, so no TextMate… I’ll switch to a Mac when Vista forces me to upgrade one way or the other. Tangent. Back to what I found out after two days of trying RadRails.

The first thing that drove me nuts is that RadRails doesn’t support drag&drop cut&paste of text. I’m so used to selecting text and then dragging the selection in order to move the text that I just can’t do without. I can’t believe that it’s not supported, it just can’t be true, yet I haven’t found a way to “enable” such a feature. Please let me know if there is a solution!

Then I found out that I don’t seem to be able to split the editor window, e.g. to see two files at the same time. It’s easy to switch between the most recent few files using the tabs at the top, but I often really want to see two or three files tiled vertically on top of one another. Ouch!

I love being able to stop & restart the WEBrick server at the bottom, and the tailing of the log file is also often handy. But I don’t see any instructions on how to set things up so I can run rails in the debugger and set breakpoints. Maybe that’s not possible, dunno. So I haven’t seen any way to turn all these nifty debugging buttons from being teasers to becoming actually useful.

I’m coming to the conclusion that at this point RadRails is a no-go and am switching back to JEdit. Maybe in a few months RadRails will have improved sufficiently to give it another spin…

Rails list / show / edit variations

Posted: August 17th, by tve

Having tried the standard Rails scaffold and now the new Streamlined scaffold, I feel like I’m back to square one in terms of designing the interaction model for my app. It seems it all boils down to how one handles list / show / edit, in particular where AJAX edit in place is possible and where it isn’t. There are only a few combinations possible, but unfortunately both the standard Rails scaffold and Streamlined implement only one of them, which makes it difficult to move stuff around or to build an app that uses more than one model of interaction.

The standard Rails scaffold doesn’t use AJAX at all. Its model is as follows:

  • List - Table with 1 row per object, buttons for show / edit / delete per row, new (create) button at end of table
  • Show - Table with 1 row per attribute, buttons for back and edit at the bottom
  • Edit - Form table similar to show, but with editable forms, buttons for save and cancel at the bottom

Streamlined takes a different approach and basically operates entirely from within the list view. Show and edit are handled within pop-up layers:

  • List - Table with 1 row per object, buttons for show / edit / delete per row, new button at end of table
  • Show - Pop-up table with 1 row per attribute, edit button at end, and “window” close/minimize/maximize buttons in the pop-up window title bar
  • Edit - Uses same pop-up window as show and transition between show and edit is smooth (but does require server request), buttons for save and show at the bottom.

A nice aspect of the Streamlined show/edit pop-up windows is that it is possible to show multiple elements at the same time.

The ajax scaffold seems to be similar to Streamlined in that it operates entirely out of the list view. I haven’t tried it, so I’m not 100% sure. From what I can gather it operates as follows:

  • List - Table with 1 row per object, buttons to edit and delete, button for new (create) at the top.
  • Show - no such view, it’s all shown in the list table
  • Edit - Creates an in-place form by increasing the vertical size of the row and placing a form into the row with buttons for save and cancel.

The model I’d like to have at the moment is different from the above. The primary reason is that I have a lot of information for each object, including images, thus making it impractical to show everything in a list view. What I’d like is:

  • List - Table with 1 row per object, showing only selected attributes, click on row to show, button at top for new
  • Show - Page with 1 row per attribute, click on any attribute to switch to edit, button at top for delete
  • Edit - Seamless in-place transition between show and edit, buttons for save and cancel at the bottom

The second reason I’d like to have a full page for show, instead of the pop-up that Streamlined creates, is that I have a bunch of many-to-many relationships for which I’m still trying to find a good editing model. I am not using the Rails HABTM at all and am instead creating explicit models for the relationships because each of these relationships has attributes of its own. What I’d like to happen with these relationships in the views of the entities that are being related is the following:

  • List - There is no list view for a relationship on its own, but there is a list partial that can be included in a cell in the list view of either of the entities being related
  • Show - There is no show view for a relationship on its own, but there is a show partial that is included in the show view of either of the entities being related. The show partial is really more like a list view in that it shows a table with one row per relationship.
  • Edit - Again, there is no edit view for a relationship on its own, but there is an edit partial very similar to the show partial that is intended to be included in the edit view of either of the entities being related. The edit partial allows inline editing of the relationship attributes, and it has a delete button for each row. At the bottom it has a drop-down select box to add new relationships.

Ok, I really need to add some pictures to explain this. But actually the real point I want to make is not that the way I’d like to structure the list / show / edit interaction is better, but that I need the flexibility. So what I’d like is for the scaffold to offer more flexibility and generate more possibilities for me: don’t try to lock me into one interaction model. Either give me a bunch of options to pass to the generator or generate a whole bunch of partials that I can combine in different ways.

Installing Ruby on Rails on a Rimuhosting Fedora Core machine

Posted: August 7th, by tve

Here I go for my second Ruby on Rails install. This time it’s on a “virtual private server” at Rimuhosting.

Installing a whole bunch of ruby stuff:

yum install ruby ruby-libs ruby-mode ruby-rdoc ruby-irb ruby-devel ruby-docs

Installing gems (check for the latest version on RubyForge):

wget http://rubyforge.org/frs/download.php/11289/rubygems-0.9.0.tgz
tar zxvf rubygems-0.9.0.tgz
cd rubygems-0.9.0
ruby setup.rb

I installed rails:

gem install rails --include-dependencies

Now MySQL, which was installed but needed a bit of config:

chkconfig --add mysqld
chkconfig --level 3 mysqld on
/etc/init.d/mysqld start
/usr/bin/mysqladmin -u root password 'somethingsmart'
/usr/bin/mysqladmin -u root -h graphs.voneicken.com password 'somethingsmart'

Now we’re ready for the ruby mysql interface. First install the mysql-devel package (Rimu uses apt on FC5, not yum):

apt-get install mysql-client

and then the ruby interface itself:

gem install mysql -- --with-mysql-dir=/usr/lib/mysql --with-mysql-config=/usr/bin/mysql_config

Now we’re ready for mongrel!

gem install mongrel

did its job without complaining. What a relief!

AJAX toolkits for use with Ruby

Posted: July 5th, by tve

This is a set of random notes taken while looking at various AJAX toolkits for use with Ruby/Rails.

  • Dojo looks rather nice with high-level widgets. Apparently it’s pretty large. Got layouts, tables, rich edit fields, etc. How to integrate with Ruby?
  • Script.aculo.us doesn’t seem to do much, if anything, that the stuff built into Rails (Prototype) doesn’t already do.
  • DHTMLgoodies has lots of stuff including a nice unobtrusive sortable table. It appears to be more of a collection of scripts than a toolkit per se.
  • Another sortable table at Pascarello.com
  • Adobe spry might be interesting, seems too early to really tell.
  • Yahoo has a UI widget set with a number of interesting things.
  • Google has a web toolkit which is a “Java software development framework that makes writing AJAX applications like Google Maps and Gmail easy for developers who don’t speak browser quirks as a second language”.
  • Another sorted table can be found here.
  • Rico is another open source javascript library and has an asynchronous loading scrollable and sortable table.
Meta pages about AJAX:

Ruby on Rails online docs

Posted: June 2nd, by tve

Here are some quick pointers to references I use a lot:

http://wiki.rubyonrails.org/rails