Tuesday, June 16, 2009

Railscasts color theme for Emacs

I definitely like Railscasts — free and very interesting Ruby on Rails screencasts. One of the things that is really awesome about them is the TextMate color theme Ryan Bates uses, so I have always wanted an Emacs version. And today I finally created it for myself. Meet color-theme-railscasts!

Thursday, May 7, 2009

Caching rarely updated data with Rails

Let's say you have an external query for some data in your Rails application, like search results from Yahoo or Google, or some feed, and it could take anywhere from 1 to 5 seconds to fetch. Of course, it makes sense to cache it, especially if the pages that use this external data are accessed frequently. One obvious solution would be to fragment cache it (using the memcache store), but that may not be optimal if some of the following cases apply:

  • you need to cache a LOT of data, like millions of results, and it would take up too much memory;
  • you only need to update this data like once per month, so it doesn't make sense to keep it hanging in memory for that long;
  • you don't want this data to be displaced with something else when memcached runs out of memory or lose everything when you're restarting memcached;
  • you want to be able to change the look and feel of your external data on the page without expiring all your fragment cache related to this data.

That was exactly the case in our project, so I came up with a very simple but effective solution: store everything in the filesystem cache and fetch results using Rails cache mechanism wrapped in a small class.

First, we need to be able to auto-expire file cache entries. It can be easily achieved by extending ActiveSupport::Cache::FileStore:

class FileStoreWithExpiration < ActiveSupport::Cache::FileStore
  # Same as FileStore#read, but treats an entry older than
  # options[:expires_in] as missing, so fetch will regenerate it.
  def read(name, options = nil)
    expires_in = options.is_a?(Hash) && options.has_key?(:expires_in) ? options[:expires_in] : 0
    file_path = real_file_path(name)
    # A stale file counts as a cache miss.
    return if expires_in > 0 && File.exists?(file_path) && File.mtime(file_path) < (Time.now - expires_in)
    super
  end
end
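
Before plugging it in anywhere, here is a quick illustration of the intended behavior, say from script/console (the store path and key are made up for the example):

store = FileStoreWithExpiration.new("tmp/cache/demo")
store.write("greeting", "hello")

store.read("greeting", :expires_in => 1.hour)  # => "hello" while fresh
# Once the file's mtime is more than an hour old, the same call returns
# nil, so a surrounding fetch block will recompute and re-cache the value.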

Then we just plug this store into our external data fetch wrapper:

class ExternalSearch
  RESULTS_CACHE = FileStoreWithExpiration.new("tmp/cache/external_searches")

  def initialize(query)
    @query = query
  end

  def results
    # fetch either returns the cached value or runs the block,
    # caches its result and returns it.
    RESULTS_CACHE.fetch(cache_key, :expires_in => 30.days) do
      parse_search_results("http://example.com/fetch.xml?query=#{URI.encode(@query)}")
    end
  end

  def parse_search_results(source)
    # Fetch and parse the external data here
  end
end

Now we only need to build a good cache key for our @query. As always, we have to keep in mind that having too many files in one directory can be a performance killer, so a good approach is to take a hash function of the search key and use its first few characters as a filesystem hierarchy:

  def cache_key
    # Digest::MD5 ships with Ruby; the first two pairs of hex digits
    # become directory levels, e.g. "ab/cd/..." (illustrative, not a real digest).
    key = Digest::MD5.hexdigest(@query)
    "#{key[0..1]}/#{key[2..3]}/#{key[4..-1]}"
  end

Then the resulting class can be used as shown below:

data = ExternalSearch.new('some keywords').results

All subsequent queries for 'some keywords' will be served from the cache for 30 days after this first call.
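
If you need to refresh a particular query before the 30 days are up, the entry can also be dropped by hand; delete is inherited from the standard FileStore (this console snippet is just an illustration):

# Remove the cached entry for one query; the next call to #results
# will hit the external service again and re-cache the answer.
search = ExternalSearch.new('some keywords')
ExternalSearch::RESULTS_CACHE.delete(search.cache_key)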

Sunday, February 1, 2009

Installing Sphinx on OS X Leopard

Sphinx can be a pain to install on OS X if you need the bleeding edge version. At the moment, MacPorts only has 0.9.8 (the latest stable) in its repository, and 0.9.9-rc1 needs to be built from source. Unfortunately, it requires iconv in order to build correctly and sees neither the system iconv nor libiconv from ports. Clinton R. Nixon from Vidget Labs offers his own solution involving a source installation of libiconv. It works very well, but the problem can also be solved by configuring Sphinx with additional parameters.

So, if you see something like this in the make output:

Undefined symbols:
  "_iconv_close", referenced from:
      xmlUnknownEncoding(void*, char const*, XML_Encoding*)in libsphinx.a(sphinx.o)
  "_iconv", referenced from:
      xmlUnknownEncoding(void*, char const*, XML_Encoding*)in libsphinx.a(sphinx.o)
  "_iconv_open", referenced from:
      xmlUnknownEncoding(void*, char const*, XML_Encoding*)in libsphinx.a(sphinx.o)

Then use the following configure parameters (you can change the prefix, of course):

CPPFLAGS="$CPPFLAGS -I/opt/local/include" \
LIBS="$LIBS -L/opt/local/lib" \
./configure --with-mysql=/opt/local/lib/mysql5/ \
--prefix=/opt/sphinx-0.9.9-rc1

After that, run make and make install as usual.

P.S. I didn't try it with system iconv, only with libiconv from ports, as I already had it installed.

Saturday, December 6, 2008

do { } while(0)

I was reading about hash table organization this morning and checked how Ruby implements its internal symbol table. While investigating the Ruby 1.9 source code, I found st.c — a general purpose hash table package by Peter Moore. The code is pretty clear: all the standard hash table operations, some Ruby-specific preprocessor directives, and a couple of interesting things I hadn't encountered before.

Some hash table operations in st.c, like finding and adding an entry, are defined as macros rather than functions (probably to avoid function call overhead):

do {\
    st_table_entry *entry, *head;\
    if (table->num_entries/(table->num_bins) > ST_DEFAULT_MAX_DENSITY) {\
        rehash(table);\
        bin_pos = hash_val % table->num_bins;\
    }\
    \
    entry = alloc(st_table_entry);\
    \
    entry->hash = hash_val;\
    entry->key = key;\
    entry->record = value;\
    entry->next = table->bins[bin_pos];\
    if ((head = table->head) != 0) {\
        entry->fore = head;\
        (entry->back = head->back)->fore = entry;\
        head->back = entry;\
    }\
    else {\
        table->head = entry->fore = entry->back = entry;\
    }\
    table->bins[bin_pos] = entry;\
    table->num_entries++;\
} while (0)

So far so good: the hash value modulo the number of bins gives us a fine distribution over the bins, and each bin is organized as a linked list. But what is that do { ... } while(0) thing? It looks strange at first sight: the code is executed only once, so why not just enclose it in a { ... } block, or leave it as is?

Fortunately, I found a very good explanation of this preprocessor trick in this FAQ entry. When the macro is substituted literally by the preprocessor, it behaves like a single, atomic statement:

if (something)
  do {
     ...
  } while(0); /* a plain {...} block followed by the macro call's semicolon would orphan the else below */
else
  do_something();

The do {...} while(0) construct not only creates a scope for local variables but can also be used anywhere a single statement is expected, without an explicit { ... } around it. Nice technique!

Monday, November 17, 2008

Small surprise from Git 1.6

After installing Git 1.6.0.4 on my Leopard box, I noticed that the numerous git-* executables are finally gone in favor of a single git binary. I like it, as I never liked the qmail style of having a separate executable for every operation. By the way, if you're on a Mac, don't forget to turn on the bash_completion variant for git-core:

sudo port install git-core +doc +bash_completion

Also set up aliases (git st really saves your fingers), colors and personal data:

git config --global alias.st status
git config --global alias.ci commit
git config --global alias.co checkout
git config --global alias.br branch
git config --global color.ui auto

git config --global user.name "Your name"
git config --global user.email your-email@your-domain.com

Finally, some .bash_profile goodness to show the current branch in your bash prompt (and add colors):

PS1='\[\033[01;32m\]\h\[\033[01;34m\] \w\[\033[31m\]$(__git_ps1 "(%s)") \[\033[01;34m\]$\[\033[00m\] '

Wednesday, November 12, 2008

Rails and Amazon EC2

While we were planning the launch of CookEatShare, the question of good Rails hosting was heavily discussed. We considered MediaTemple, Amazon EC2, and a couple of other solutions (hard to remember right now, possibly Slicehost). Finally, we chose EC2.

Now that I have a lot of experience deploying to EC2, let's try to answer the question: is Amazon EC2 good for Rails applications?

OK, first let's see what EC2 is. The official definition is "a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers." But what does that mean to an application maintainer?

Practically, it's an AWS (Amazon Web Services) account with a couple of associated keys that allow you to launch and shut down instances. You can treat an instance as a physical box with given characteristics, with one nuance: if your instance fails (or you terminate it manually), all your data is lost. That is why EC2 is positioned and used more as a platform for batch tasks such as video processing or distributed calculations. It may not sound appropriate for a web application, but let's turn to the bright side.

As an advantage, that non-persistence teaches discipline. When you know that your data will be lost if (I would even say 'when', not 'if') something goes wrong, you'll definitely have backups. And you'll pay more attention to your deployment scheme, especially to making it easy to add instances later (we haven't scaled to a second instance yet, but I can't wait to try!). And it's fun to have a Capistrano task that builds nginx from source and compiles the fair proxy balancer in!
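
For the curious, here is a minimal sketch of what such a task might look like under Capistrano 2; the versions, URLs and paths are placeholders, not our actual setup:

# Illustrative only: build nginx with the fair balancer module on the
# web server. Every version, URL and path below is a made-up example.
namespace :nginx do
  task :build, :roles => :web do
    run "wget -q http://sysoev.ru/nginx/nginx-0.6.32.tar.gz -O /tmp/nginx.tar.gz"
    run "git clone git://github.com/gnosek/nginx-upstream-fair.git /tmp/nginx-upstream-fair"
    run "cd /tmp && tar xzf nginx.tar.gz"
    run "cd /tmp/nginx-0.6.32 && ./configure --prefix=/opt/nginx --add-module=/tmp/nginx-upstream-fair && make && sudo make install"
  end
end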

To guard the sensitive data (the database, and user-uploaded content in case you don't store it on S3), the Amazon EBS service can be useful. It's just a block device that can be attached to an instance and mounted as a usual partition. It's persistent and pretty robust, but since we started to use EBS, our MySQL server has behaved strangely from time to time. Whether this is an Amazon problem or some bug in our configuration, I still don't know. Anyway, I keep trying to fix it.

Instances are pretty stable: our previous one worked for almost a year (with one hangup that was solved by rebooting) until I manually terminated it after moving to a new one (due to a deployment change and a distribution upgrade).

The disadvantages are not very numerous so far: the email rejection problem (see below), the strange EBS behavior with MySQL (again, the real cause is still unknown), the extra effort required to organize the deployment, and the fact that not all Linux distributions are officially supported. That's all I can remember right now.

So, EC2 can be a decent platform for Rails application deployment. Yet I wouldn't recommend it unless you back it up with an application restore scheme that lets you recover quickly in case of an instance failure.

Useful facts:

  • if you reboot an instance, the data won't be lost; only termination leads to data loss;
  • keeping configuration files (like app server settings, mysql configs, mail server settings) in the application itself is a very good idea, especially if you need to launch another instance quickly (see the sketch after this list);
  • sending out email from EC2 can be tricky: most servers will reject your email because of blacklisted IPs. This excellent article really helped me set up our mail infrastructure;
  • it's better to start with an Elastic IP to make moving from instance to instance seamless; otherwise you will be assigned a dynamic IP that cannot be transferred to another instance.
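
To give an idea of what keeping configs in the application can look like in practice, here is a hedged Capistrano 2 sketch; the file names and system paths are invented for the example:

# Illustrative task: after each deploy, link server configs kept under
# config/server/ in the repository into their (hypothetical) system locations.
namespace :deploy do
  task :link_configs, :roles => :app do
    run "sudo ln -sf #{current_path}/config/server/nginx.conf /opt/nginx/conf/nginx.conf"
    run "sudo ln -sf #{current_path}/config/server/my.cnf /etc/mysql/my.cnf"
  end
end

after "deploy:symlink", "deploy:link_configs"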

EC2 solutions:

  • Ubuntu images for EC2 — probably the best Ubuntu AMIs available;
  • EC2 on Rails — very good deployment solution, my own deployment plugin was heavily inspired by this work;
  • Rubber — another decent alternative; it was tricky to follow and I don't need a multi-instance setup at the moment, yet it's very interesting;
  • Deployer — my own small solution extracted from CookEatShare. It's not general, and I doubt that it works out of the box on an arbitrary instance, but I'm going to put some care into it in the future; please fork it if you want to use it and make improvements.

Friday, September 19, 2008

Turning off email delivery for test users

For some reason, I happen to like mail server configuration. Maybe it's just Postfix; I haven't tried anything else in ages. That is why the following task seemed interesting to me.

Let's say we have two sites: the first is a beta with the latest features under testing, and the second is the production one. We don't want our e-mails to be sent from the beta, except for some users who test the site. But we can't just remove the other users, because good infrastructure is critical for the beta site (and it's fun to test with a recent copy of the production data). One good approach is to change the users' emails to point somewhere like username@example.com, thus guaranteeing that those emails won't be delivered to a real person. But we can actually go further by telling Postfix to discard all messages for anyone@example.com without even trying to deliver them.
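
The rewriting step itself can be a one-off task. Here is a hedged sketch, assuming a typical User model with an email column; the task name and tester addresses are made up:

# Hypothetical rake task for the beta box: point every non-tester
# address at example.com so the Postfix rule below swallows the mail.
namespace :beta do
  task :anonymize_emails => :environment do
    testers = ['alice@ourdomain.com', 'bob@ourdomain.com'] # real testers keep their addresses
    User.find(:all).each do |user|
      next if testers.include?(user.email)
      user.update_attribute(:email, "user-#{user.id}@example.com")
    end
  end
end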

The Postfix part is easy to do with header checks. So, here's a little snippet.

In /etc/postfix/main.cf:
header_checks = pcre:/etc/postfix/header_checks

In /etc/postfix/header_checks:
/^To:.*@example.com/ DISCARD
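
A nice property of the pcre (and regexp) table types is that they don't need a postmap run: Postfix reads them directly, so after editing both files a plain postfix reload is enough to activate the check.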