You’ve probably been told at some point to separate every piece of your application into services. A service represents some interaction with the data – like adding a product to a basket, generating an invoice, or preparing files for download, you get the idea…
As your app grows, you create more and more services, yet anyone should still be able to understand what the app is doing just by looking at the services’ filenames (think of listing the contents of app/services).
Services are a solution for logical code separation; doing so simplifies working in a team and streamlines maintenance. You can thoroughly test a service without much effort.
Services are often abstractions for critical parts of an application – actions that are essential to a user, that are performed many times a second, actions that should be quick. Therefore, you might start thinking about optimizing these elemental services. Assuming you have written marvelous specs for the service, you can tweak the service’s implementation or even rewrite it completely (in SQL, or PL/pgSQL to be precise).
These are my thoughts on the problem of efficient Rails services. This post is not a real tutorial, but a set of guidelines and recommendations based on my experience. Even though I’m talking about Rails and PostgreSQL, the presented concepts can be expressed in any language and in most databases1.
For starters, you should be looking at the development logs. When an action takes 100ms or more to perform, that’s when you want to start optimizing. I’m simplifying here – you should try to write efficient code from the beginning and always aim for the shortest processing times. Shorter processing times allow handling more requests per second, which means you need fewer workers (that is, processes or new machines) – you can postpone scaling resources, because the resources you already have might be enough.
So you are watching the logs, fighting N+1 queries, using .pluck() instead of retrieving whole models, adding more restrictive .where() clauses, or doing minor code hacks. Still, most of the work is done on the Ruby side.
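For illustration, here is what such a micro-optimization looks like in practice. The Order/Product naming is hypothetical and the ActiveRecord calls are shown as comments; the point is that `pluck` hands you raw rows instead of full model instances, which Ruby can then fold cheaply:

```ruby
# Instead of instantiating a model per row only to read two columns:
#   total = order.lines.sum { |l| l.product.price_cents * l.quantity }  # N+1
# let the database return bare values in one query:
#   rows = order.lines.joins(:product)
#               .pluck("products.price_cents", "order_lines.quantity")

# What comes back is plain arrays, processed without any model allocation:
rows  = [[1999, 2], [500, 1]]            # stand-in for the pluck result
total = rows.sum { |price, qty| price * qty }
# total == 4498 (cents)
```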
Do not let yourself believe in Rails magic – there is no magic – it is only code.
To get something from the database, a query must be prepared – built from ActiveRecord or Arel pieces – then it is sent to the database server, the server does some processing and retrieval, and the results are returned to the Ruby side. Ruby builds an array, hash, or model out of the results – it must allocate memory, create objects, maybe run garbage collection at the most inconvenient moment… And everything Ruby does is slow.
As much as I love Ruby, I must admit Ruby is slow. It is fast enough for most of its applications – after all, you do not write high-performance games or scientific HPC stuff in Ruby. Nonetheless, Ruby got its popularity through Rails, which can process thousands of requests per second, even though Ruby is slow. Rails applications are often that quick because of caching (let’s postpone that talk for another time). However, some actions cannot be cached because they are volatile – they expire caches as they modify or introduce new data – and that is where the slowness of the Ruby on Rails stack can be observed.
You have minimized the data received and processed on the Ruby side, where your business logic lies. You probably did this by issuing many complex queries to the database. Is it enough? Are you sure you always get sub-1ms pings to the database?
Let me briefly introduce my commercial work-in-progress project – PhotoSeller is an online application for… selling photographs. It specializes in distributing pictures to the participants of mass sporting events (triathlons, marathons and the like). Even though it is flexible and quite complex inside, it is very simple for the end user. As you can imagine, the critical parts are finding an image, adding it to the basket, and checking out. This brief description should be enough as an abstract for the case study.
Oh wait, adding an image to the basket is not as simple an operation as you might have thought. An image may contain some participants (zero or more), and it is part of an album. The album is associated with many different pricelists (like private usage, commercial usage, prints) – a pricelist contains a gradation of prices for a given image count, depending on who is tagged in the image. Actually, it is a bit more complicated.
To make things work, I have to group the positions in a basket by the selected pricelist and by the intersection of competitors on the images. For that purpose, BasketServices::AddImage was created. Tests were written, and after some time the following, quite handsome, code was created:
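The original listing is too long to reproduce, but its heart is the grouping step described above. Here is a simplified, self-contained sketch of that step – the data shapes are hypothetical, pared down to a pricelist id and tagged competitor ids per position:

```ruby
# Basket positions, reduced to the fields that matter for grouping:
positions = [
  { image_id: 1, pricelist_id: 10, competitor_ids: [100, 101] },
  { image_id: 2, pricelist_id: 10, competitor_ids: [101] },
  { image_id: 3, pricelist_id: 20, competitor_ids: [102] },
]

# Group the positions by their selected pricelist...
by_pricelist = positions.group_by { |pos| pos[:pricelist_id] }

# ...then, within a group, find the competitors common to all images –
# the price tier for the group depends on this intersection:
shared = by_pricelist[10].map { |pos| pos[:competitor_ids] }.reduce(:&)
# shared == [101]
```

In the real service, each of these steps maps onto one or more database queries, which is where the query count discussed below comes from.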
You don’t really need to understand what the code does, but how it does things matters. Adding an image to the basket is a super critical operation in the context of the application. You can cache image searches, but you cannot cache volatile basket operations.
The presented iteration of the code was already somewhat optimized – a few hours of testing and development were required to produce it. Even so, it takes 6 to 11 queries to perform and results in 30-60ms of wasted time.
It might not sound like particularly slow code, but let me put it in other terms – only 15-30 users per second can add an image to their baskets. That matters when you want to handle thousands of requests per second on a single server. Obviously, not every user wants to add an image – some are doing other actions – but those who are adding images are blocking the others (adding more workers is not a real solution to slow code).
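That capacity estimate follows directly from the latency – a worker handling one request at a time converts per-request milliseconds into requests per second:

```ruby
# 1000 ms per second divided by the per-request latency:
best  = (1000.0 / 30).round   # ~33 requests/s when each takes 30 ms
worst = (1000.0 / 60).round   # ~17 requests/s when each takes 60 ms
```

Give or take rounding, that is the 15-30 users per second figure.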
After some thinking, I decided further optimization on the Ruby side was not worth it – Ruby is still Ruby, and Rails is still Rails – some things cannot be made quicker. So how do we serve more requests per second, without lame horizontal scaling?
SQL is the answer. Finally, after such a long introduction.
Everyone writing any data-oriented application has used SQL in some form. Even though you may like your ORM, sometimes you need to do complex joins or groupings yourself, because an ORM is just a piece of code without the capability of the complex human brain.
I was a SQL-skeptic.
I have known the basics of SQL for 10 years now (you know, PHP times and stuff); later on I had some database courses at university (it was all about Oracle, so, hmm, I did not care much). Anyway, I have never been keen on SQL – I found it rather obscure, old-fashioned and unnecessary, but I knew that relational databases are not dumb storage, but sophisticated systems. Surprisingly, with sophistication comes efficiency and speed. Databases are made to be quick with data – the internal representation of the data is heavily optimized, and the SQL language was designed with efficiency and expressiveness in mind.
My idea for optimizing BasketServices::AddImage was to rewrite it so it could run on the database server instead of inside Rails. I created a basket_add_image function in the PL/pgSQL language. PL/pgSQL is a merge of the declarative nature of SQL with the imperative nature of more common languages – if you know some SQL, you will find PL/pgSQL easy enough.
After some hours (it took considerably longer than the Ruby version) the following function was written (just scan it; leave thorough reading for some boring day):
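The full function is specific to PhotoSeller’s schema, so instead of reproducing it, here is a minimal skeleton showing the shape such a function takes – the table and parameter names are made up:

```sql
-- Skeleton only: the real basket_add_image also resolves pricelists and
-- groups positions by intersecting competitors before inserting.
CREATE OR REPLACE FUNCTION basket_add_image(p_basket_id integer,
                                            p_image_id  integer)
RETURNS integer AS $$
DECLARE
  v_position_id integer;
BEGIN
  -- all lookups, grouping and validation happen here, server-side...
  INSERT INTO basket_positions (basket_id, image_id)
       VALUES (p_basket_id, p_image_id)
    RETURNING id INTO v_position_id;

  RETURN v_position_id;  -- a single round trip returns the result
END;
$$ LANGUAGE plpgsql;
```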
The Ruby version of the service can then be reimplemented as a thin layer between Rails and the database:
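A sketch of what that thin layer can look like – the class shape is illustrative; the connection is injected so the class stays testable, and Integer() keeps the interpolation safe:

```ruby
module BasketServices
  class AddImage
    # conn is anything responding to select_value,
    # e.g. ActiveRecord::Base.connection in a real app.
    def initialize(conn)
      @conn = conn
    end

    # All business logic lives in the basket_add_image SQL function;
    # Ruby only issues a single statement and returns its result.
    def call(basket_id, image_id)
      @conn.select_value(
        "SELECT basket_add_image(#{Integer(basket_id)}, #{Integer(image_id)})"
      )
    end
  end
end
```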
Even though the code is much longer now, it is still quite readable2. Was it worth it?
To be honest, developing this code was not a bad experience – since you have already written specs for the Ruby version, you just write the SQL code until it passes the tests.
It now takes 0.5-2ms to add an image to the basket, and you only need to issue a single SQL statement from Rails! You can serve a thousand requests per second with this code3 – over a 3000% improvement on the Ruby version – woohoo!
I’m far from being an SQL expert – I’m sure someone could further optimize my SQL code, so it could be sped up even more. Ergo, learn some SQL!
Still not convinced? I know it takes some time to move out of the Rails world into the SQL world. Does it look complicated? It is not – every imperative language (think Ruby) is similar, so you already know the imperative part of PL/pgSQL, and you know some SQL. You can always help yourself with the SQL generated by Rails – just look through the logs or the rails console output.
Right now the problem with the SQL code is that you have to load it into the database manually. You could move it into an ActiveRecord migration, but that is overkill – you would need to create a new migration for every change in the code.
I found the squirm_rails gem – it loads SQL functions from a dedicated file alongside your migrations. I have been using it for some time, but if you think about it, assuming you have rewritten multiple services, that single file would contain quite a lot of unrelated code.
In the introduction I declared that merely listing the app’s service files should tell what the app is all about. So let’s store the SQL implementations in app/services – for every function, let’s create a SQL file.
Some might think that storing SQL code in app/ is wrong, but I think everything that makes up the core of the app should lie there.
Create a config/initializers/sql_services.rb initializer to load the SQL sources into the database when you start your app:
Now you can work with your service’s SQL code as easily as with Ruby code.
The script is not perfect – it loads the functions into the database on every Rails initialization, even if the SQL has not changed. The drawback is a minor slowdown when starting a multiprocess server – that leaves room for improvement.
You now know how to speed up the core services of your application by reimplementing them in PL/pgSQL, while still maintaining a clean organization and separation of code.
Thank you for reading – I’ll be very happy if you find this post useful in any way. I just wanted to share some of my experiences, as I think many others are making the same mistakes I did. Feel free to share and comment.