Generating Signed URLs to Access Amazon S3

One of the cool aspects of Amazon S3 is that public files stored on S3 can be accessed directly from a web browser. What I find even cooler is that it is possible to create signed URLs to private files stored on S3 that enable access for a time period that is “baked into” the signature. Say I have a bunch of private files and want to give you access for 7 days: I can easily produce URLs to these files that give you exactly that.
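For illustration, such a signed URL for the query-string authentication scheme looks roughly like this (the bucket, key, and parameter values here are made up):

https://s3.amazonaws.com/mybucket/private/report.pdf?Signature=vjbyPxybd%2FaNmGa%2ByT272YEAiv4%3D&Expires=1197590000&AWSAccessKeyId=0PN5J17HBGZHT7JJ3X82

Anyone holding this URL can fetch the file until the Expires timestamp passes; changing any part of the URL invalidates the Signature.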

Something that is not immediately obvious from the Amazon documentation is that it is also possible to store files on S3 using signed URLs. In other words, I can create a URL that allows someone to store a file at that URL for a pre-defined time period. Why would I want to do this? Well, I actually use it a lot not to give someone else access but to put files onto S3 myself using curl, a standard command-line HTTP client. The reason for using curl to put files onto S3 is performance and retries: curl will max out the available bandwidth, it has retry support built in, and it will happily upload a huge file right up to S3's 5GB object limit. So when I write a script to upload a bunch of stuff to S3, I often prefer curl over one of the Ruby S3 libraries.

The main annoyance in using curl for me has been the need to generate signed URLs: I’ve had to load a whole Amazon S3 library in order to use just 1% of it to generate the signatures. Yesterday I finally rolled up my sleeves and extracted a bunch of code from one of the libraries to produce a 50-line class that generates signed URLs and is stand-alone, i.e., it doesn’t require additional HMAC or other libraries (it uses OpenSSL directly, which ships with Ruby).

So without further ado, here is the class:

require 'openssl'
require 'cgi'
require 'base64'

## The S3Sign class generates signed URLs for Amazon S3
class S3Sign
  
  def initialize(aws_access_key_id, aws_secret_access_key)
    @aws_access_key_id = aws_access_key_id
    @aws_secret_access_key = aws_secret_access_key
  end

  # builds the canonical string for signing.
  def canonical_string(method, path, headers={}, expires=nil)
    interesting_headers = {}
    headers.each do |key, value|
      lk = key.downcase
      if lk == 'content-md5' or lk == 'content-type' or lk == 'date' or lk =~ /^x-amz-/
        interesting_headers[lk] = value.to_s.strip
      end
    end
    
    # these fields get empty strings if they don't exist.
    interesting_headers['content-type'] ||= ''
    interesting_headers['content-md5'] ||= ''
    # just in case someone used this.  it's not necessary in this lib.
    interesting_headers['date'] = '' if interesting_headers.has_key? 'x-amz-date'
    # if you're using expires for query string auth, then it trumps date (and x-amz-date)
    interesting_headers['date'] = expires if not expires.nil?
  
    buf = "#{method}\n"
    interesting_headers.sort { |a, b| a[0] <=> b[0] }.each do |key, value|
      buf << ( key =~ /^x-amz-/ ? "#{key}:#{value}\n" : "#{value}\n" )
    end
    # ignore everything after the question mark...
    buf << path.gsub(/\?.*$/, '')
    # ...unless there is an acl or torrent parameter
    if    path =~ /[&?]acl($|&|=)/     then buf << '?acl'
    elsif path =~ /[&?]torrent($|&|=)/ then buf << '?torrent'
    end
    return buf
  end
  
  def hmac_sha1_digest(key, str)
    #STDERR.puts "SIGN: #{str}"
    OpenSSL::HMAC.digest(OpenSSL::Digest::SHA1.new, key, str)
  end
  
  # encodes the given string with the aws_secret_access_key, by taking the
  # hmac-sha1 sum, and then base64 encoding it. then url-encodes for query string use
  def encode(str)
    CGI::escape(Base64.encode64(hmac_sha1_digest(@aws_secret_access_key, str)).strip)
  end
  
  # generate a url to put a file onto S3
  def put(bucket, key, expires_in=0, headers={})
    return generate_url('PUT', "/#{bucket}/#{CGI::escape key}", expires_in, headers)
  end
  
  # generate a url with the appropriate query string authentication parameters set.
  def generate_url(method, path, expires_in, headers)
    #log "path is #{path}"
    expires = expires_in.nil? ? 0 : Time.now.to_i + expires_in
    canonical_string = canonical_string(method, path, headers, expires)
    encoded_canonical = encode(canonical_string)
    arg_sep = path.index('?') ? '&' : '?'
    return path + arg_sep + "Signature=#{encoded_canonical}&" + 
           "Expires=#{expires}&AWSAccessKeyId=#{@aws_access_key_id}"
  end

end
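To make the signing concrete, here is roughly what the class signs for a PUT with an x-amz-acl: public-read header, as in the curl example further down (assuming bucket mybucket, key file.bin, and expiry timestamp 1197590000; the two blank lines are the empty content-md5 and content-type slots):

PUT


1197590000
x-amz-acl:public-read
/mybucket/file.bin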

The method that generates a signed HTTP PUT URL is S3Sign#put, and you can easily add methods for other S3 requests along the same lines, as the sketch below shows.
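For example, a GET variant for the 7-day scenario from the introduction follows the exact same pattern (this sketch is my addition, not code extracted from the library):

  # generate a url to fetch a file from S3 (mirrors the put method above)
  def get(bucket, key, expires_in=0, headers={})
    return generate_url('GET', "/#{bucket}/#{CGI::escape key}", expires_in, headers)
  end

Calling get('mybucket', 'private/report.pdf', 7*24*3600) on an instance then yields a URL that is valid for 7 days. Now to the curl part: here is a fragment to upload a file using curl and make it publicly accessible on S3: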

@@s3_sign = S3Sign.new(..., ...) # insert your S3 credentials

def s3_curl_put(filepath, bucket, uri)
  # the headers passed to curl must match the headers that were signed
  headers = { 'x-amz-acl' => 'public-read' }
  url = @@s3_sign.put(bucket, uri, 600, headers) # URL valid for 10 minutes
  heads = headers.map{|k,v| "-H '#{k}: #{v}'"}.join(' ')
  cmd = "curl #{heads} -s -f --upload-file '#{filepath}' 'https://s3.amazonaws.com#{url}'"
  ret = `#{cmd}`
  code = $?.exitstatus
  exit code unless code == 0
end

Obviously you will want to customize the error handling to suit your needs, and often additional options to curl are helpful.
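For example, to lean on curl’s built-in retries mentioned earlier, the command in s3_curl_put can be extended like this (the retry count and delay are just a starting point, not values from my production setup):

cmd = "curl --retry 5 --retry-delay 10 #{heads} -s -f " +
      "--upload-file '#{filepath}' 'https://s3.amazonaws.com#{url}'"

--retry makes curl retry on transient errors and --retry-delay spaces the attempts out; just make sure the expiry passed to S3Sign#put is long enough to cover the retries.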

With all this in place, uploads to S3 from an EC2 instance zip along at 10 MBytes/sec, and uploads of multi-GB files are no problem (except that they do take a while…). We recently put an upload server into production that allows web browsers to upload files to S3 under the control of a separate web server. It uses Mongrel and Merb, but when it comes to uploading to S3 I use curl, and that did help performance. I hope the above code fragments can help you improve your S3 uploads too.