Sohan's Blog

Living the Developer's Life

Will You Put That Cell Phone Away?

I know it’s your phone and you can do anything you want. But maybe it’s worth holding back at times. There’s more to life than our screens.

I’m keeping it out of my hand, off the table and off the bed, in silent mode, all day long. I’m asking for permission to use it if I have to while I’m with others. Instead of spending that time on screens, it’s been very rewarding to play with my little son, with all the attention on him. My little guy sure senses the attention, and you can tell it just from looking at his eyes.

Algorithms Need Better UI

Since 2006, when I became a professional software engineer, I have spent a fair amount of time reading books and online articles about software. Most topics were around Object Orientation, Web technologies, Software design, Testing, Automation, Agile and Lean, etc. Last year, I decided to take a break from such topics on purpose. So, I cleared all my RSS subscriptions.

Instead, I thought I would revisit some of the algorithms that I learned during my undergraduate courses. It’s been a few years since I studied algorithms, and I thought I was now seasoned enough to appreciate and solve some of the algorithm problems better than in the past. So, I started with dynamic programming problems such as the Longest Common Subsequence (LCS) and the Knapsack problem, and found this solution on Wikipedia:

Source code for finding the length of the Longest Common Subsequence
function LCSLength(X[1..m], Y[1..n])
    C = array(0..m, 0..n)
    for i := 0..m
       C[i,0] = 0
    for j := 0..n
       C[0,j] = 0
    for i := 1..m
        for j := 1..n
            if X[i] = Y[j]
                C[i,j] := C[i-1,j-1] + 1
            else
                C[i,j] := max(C[i,j-1], C[i-1,j])
    return C[m,n]

As a software developer, when I’m reading an article that has accompanying source code, my eyes automatically jump to the code, skipping any textual blurb. It was the same in this case, and I must say, I noticed a few things here:

  1. The single-letter variable names need some love.
  2. The code can use some logical grouping and naming to easily communicate what’s happening here.

I found this to be a UI problem. The algorithm itself is quite complicated for an average person to understand. However, we can probably reduce some of the noise with better naming/grouping. Here’s an alternate version of the same code.

Modified source code, with descriptive names
function LCSLength(sequence1[1..sequence1Size], sequence2[1..sequence2Size])

    table = GetTableWithZerosInFirstRowAndColumn(sequence1Size, sequence2Size)

    for sequence1Index := 1..sequence1Size
        for sequence2Index := 1..sequence2Size

            if sequence1[sequence1Index] = sequence2[sequence2Index]
              IncrementLength(table, sequence1Index, sequence2Index)
            else
              UseCurrentLength(table, sequence1Index, sequence2Index)

    return table[sequence1Size, sequence2Size]

function GetTableWithZerosInFirstRowAndColumn(columns, rows)
  table = array(0..columns, 0..rows)

  InitializeFirstRowWithZeros(table)
  InitializeFirstColumnWithZeros(table)

  return table

function InitializeFirstRowWithZeros(table[columns x rows])
  for columnIndex := 0..columns
       table[columnIndex, 0] = 0

function InitializeFirstColumnWithZeros(table[columns x rows])
  for rowIndex := 1..rows
       table[0, rowIndex] = 0

function IncrementLength(table, columnIndex, rowIndex)
    table[columnIndex,rowIndex] := table[columnIndex-1,rowIndex-1] + 1

function UseCurrentLength(table, columnIndex, rowIndex)
    leftCell = table[columnIndex-1,rowIndex]
    topCell = table[columnIndex,rowIndex-1]
    table[columnIndex,rowIndex] := max(leftCell, topCell)

I know this modified code is verbose, but I find it self-explanatory. Using descriptive names is not a new concept in software engineering at all. I wish our algorithm books came with annotated source code like this, readable by humans.
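
To make it concrete, here’s my own quick Ruby translation of the descriptive version above. This is just a sketch, not from the Wikipedia article; the helper name and the sample inputs are purely illustrative:

def lcs_length(sequence1, sequence2)
  table = table_with_zeros(sequence1.size, sequence2.size)

  (1..sequence1.size).each do |index1|
    (1..sequence2.size).each do |index2|
      if sequence1[index1 - 1] == sequence2[index2 - 1]
        # extend the common subsequence found so far
        table[index1][index2] = table[index1 - 1][index2 - 1] + 1
      else
        # carry over the best length seen so far
        table[index1][index2] = [table[index1][index2 - 1], table[index1 - 1][index2]].max
      end
    end
  end

  table[sequence1.size][sequence2.size]
end

# builds a (size1 + 1) x (size2 + 1) table filled with zeros,
# which covers the first row and first column initialization
def table_with_zeros(size1, size2)
  Array.new(size1 + 1) { Array.new(size2 + 1, 0) }
end

lcs_length('XMJYAUZ'.chars, 'MZJAWXU'.chars) # => 4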

Let’s use uglifiers and minifiers to do the machinification for us.

AngularJS Is Very Productive, and Cool Too!

It has a very steep learning curve, but yields a superb productivity boost once you’ve learned it. Check out my demo of the wizard that we’ll discuss next.

AngularJS works by extending HTML to produce declarative UI code, eliminating the need for a lot of boilerplate. For example, the mental model of a wizard can be expressed using the following HTML:

<wizard title="Flight Search">

  <step title="Search">

  </step>


  <step title="Select a flight">

  </step>


  <step title="Select a return flight">

  </step>


  <step title="Checkout">

  </step>

  <step title="Confirm purchase">

  </step>

  <step title="Receipt">

  </step>

</wizard>

With AngularJS, one can write exactly this markup with the help of two custom directives, wizard and step. This declarative UI code makes it very easy to read. In addition, the two-way data binding capabilities of AngularJS make it very productive, as we don’t need to write a bunch of references to the DOM nodes and re-render them as the data changes. For a working example, check the source code of the demo and, if you are like me, you’ll love to see how simple it is.

Released Streamy_csv Gem

Following the previous post, I decided to spin off a little Ruby gem for you folks. Get streamy_csv, and write only your application code while it does the boilerplate work for you.

In a nutshell, with this gem in your application, all you need to do is this:

class ExportsController < ApplicationController

  def index

    stream_csv('data.csv', MyModel.header_row) do |rows|
      MyModel.find_each do |my_model|
        rows << my_model.to_csv_row
      end
    end

  end
end
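
Assuming a Bundler-based Rails app, you’d pull the gem in through your Gemfile and run bundle install:

gem 'streamy_csv'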

Find more at https://github.com/smsohan/streamy_csv

Generating and Streaming Potentially Large CSV Files Using Ruby on Rails

Most applications I’ve worked on at some point required an ‘Export’ feature so people would be able to play with the data using the familiar Excel interface. I’m sharing some code here from a recent piece of work that did the following:

Generate a CSV file for download with up to 100,000 rows in it. Since the contents of the file depend on some dynamic parameters, and the underlying data is changing all the time, the file must be generated live. Generating a large file takes time, and the load balancer will drop the connection if it takes more than a minute. In fact, as a consumer I would myself be frustrated had it taken even a minute to see something happening. This problem naturally calls for a streaming solution.

For a familiar example, let’s say we are downloading a CSV file containing transactions from an online store for the accounting folks. Let’s say the URL is as follows:

http://transactions.com/transactions.csv?start=2013-01-01&end=2013-04-30&type=CreditCard&min_amount=400

So, this would download a file containing the transactions from January to April of 2013 where a credit card was used for a purchase over $400. Here goes the code example, with inline comments describing the interesting parts.

app/models/transaction.rb
require 'csv'

class Transaction < ActiveRecord::Base
  belongs_to :store
  attr_accessible :time, :amount

  def self.csv_header
    #Using ruby's built-in CSV::Row class
    #true - means it's a header
    CSV::Row.new([:time, :store, :amount], ['Time', 'Store', 'Amount'], true)
  end

  def to_csv_row
    CSV::Row.new([:time, :store, :amount], [time, store.name, amount])
  end

  def self.find_in_batches(filters, batch_size = 1000)
    #find_each will batch the results instead of getting all in one go
    where(filters).find_each(batch_size: batch_size) do |transaction|
      yield transaction
    end
  end

end

Given this Transaction model, the controller can call these methods and set the appropriate HTTP headers to stream the rows as they are generated, instead of waiting for the whole file to be generated. Here’s the example controller code:

app/controllers/transactions_controller.rb
class TransactionsController < ApplicationController

  def index

    respond_to do |format|

      format.csv { render_csv }

    end

  end

  private

  def render_csv
    set_file_headers
    set_streaming_headers

    response.status = 200

    #setting the body to an enumerator, rails will iterate this enumerator
    self.response_body = csv_lines
  end


  def set_file_headers
    file_name = "transactions.csv"
    headers["Content-Type"] = "text/csv"
    headers["Content-disposition"] = "attachment; filename=\"#{file_name}\""
  end


  def set_streaming_headers
    #nginx doc: Setting this to "no" will allow unbuffered responses suitable for Comet and HTTP streaming applications
    headers['X-Accel-Buffering'] = 'no'

    headers["Cache-Control"] ||= "no-cache"
    headers.delete("Content-Length")
  end

  def csv_lines

    Enumerator.new do |y|
      y << Transaction.csv_header.to_s

      #ideally you'd validate the params, skipping here for brevity
      Transaction.find_in_batches(params){ |transaction| y << transaction.to_csv_row.to_s }
    end

  end

end

As you can see in this example, it’s pretty straightforward once you put the pieces together. These streaming headers work with most servers, including Passenger, Unicorn, etc., but WEBrick doesn’t support streaming responses. It took me some time to figure out the headers and the enumerator part, but since then it’s been working beautifully for us. Hope it helps someone with a similar need.
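
If the enumerator part feels magical, here’s a tiny, self-contained plain-Ruby illustration (my own, not part of the app above): the block body only runs as the consumer pulls the next chunk, which is exactly what lets Rails stream the response lazily.

lines = Enumerator.new do |yielder|
  yielder << "header\n"
  (1..3).each do |i|
    puts "generating row #{i}"  # runs only when this row is actually pulled
    yielder << "row #{i}\n"
  end
end

# The consumer pulls one line at a time; Rails iterates response_body the same way
lines.each { |line| print line }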

Simplicity and Client-Side MVC

After spending about 6 months on this new project using BackboneJS, and some hours learning AngularJS and EmberJS, my realization at this point is:

Use Client-Side MVC very Selectively.

Sometimes a single page of your app needs to offer a lot of interactions, each scoped to only a small part of the page. In such cases Client-Side MVC offers some neat features. I’ll try to share my perspective with some concrete examples where I’d say yes or no to Client-Side MVC.

  1. Build a Calendar page - Yes.
  2. Build a Master/Detail view - No.
  3. Build a Credit Card Payment Form - No.
  4. Build a Story Wall like Trello - Yes.
  5. Build an Airport Departures/Arrivals display - No.
  6. Build a Search form - No.

As you see here, I suggest using it only when a lot of client-side interactions can happen with few server-side data requests.

MicroOptimization Trap

Yesterday, a couple of my friends and I were discussing a CoffeeScript topic. In short, given the following code in CoffeeScript:

class Cat
  type: -> 'cat'
  meow: -> 'meow meow mee..ow'

The compiler produces this in JavaScript:

var Cat;

Cat = (function() {

  function Cat() {}

  Cat.prototype.type = function() {
    return 'cat';
  };

  Cat.prototype.meow = function() {
    return 'meow meow mee..ow';
  };

  return Cat;

})();

As you see here, the prototype method definitions repeat the prefix Cat.prototype. This could be compressed using a closure as follows:

var Cat;

Cat = (function() {

  function Cat() {}

  var p = Cat.prototype;

  p.type = function() {
    return 'cat';
  };

  p.meow = function() {
    return 'meow meow mee..ow';
  };

  return Cat;

})();

This closure form is for sure less bloated than the one produced by CoffeeScript. And on IE7, each Cat.prototype lookup takes about a microsecond, so the closure form quite literally yields a microsecond optimization! When one zooms into such micro-optimizations, other, bigger fish are easily missed.

For example, even though the output from the CoffeeScript compiler is a little bloated, the CS source code is ridiculously simpler and shorter, even for this tiny example. But that’s not the only reason one would use CS. If nothing else, CS produces JavaScript that passes lint, for free. I’m not saying one has to use CoffeeScript, but this serves as an example scenario. When making a decision, a tunnel vision into micro-optimizations may well be a trap.

Unconventional Conventions

JavaScript has been reborn, thanks to jQuery for beginning this epic and wonderful march. These days, we have a huge influx of micro to macro frameworks, all providing some well-curated set of features. Every now and then I try to get a glimpse of the new and hot stuff so that, when the time comes, I can make an informed decision on what to use and why.

This post is about my recent attempt at learning AngularJS and EmberJS. Client-side MVC and/or single page apps need some core features as follows:

  1. URL routing
  2. Rendering views
  3. Read/write data from/to the backend server

Both AngularJS and EmberJS offer these and some additional features, most notably two-way data binding, so a change in a JavaScript object reflects on the rendered view and vice versa. But, to be honest, I think they have overstretched the framework N miles to the north of where it should be. Let me explain with an example:

Conventions are good when they offer an easy mental model. For example, using the following router mapping:

App.Router.map(function() {
  this.route('favorites');
});

it would match favorites with an App.FavoritesRoute based on the names. It’s an easy enough pattern to recognize and remember. However, I find it too far-fetched in the following example:

App.FavoritesRoute = Ember.Route.extend({
  model: function() {
    // the model is an Array of all of the posts
    return App.Post.find();
  }
});

My mental model is challenged in a few ways here:

  1. Route has a model hook - which, by conventional usage, I can’t seem to recognize as a pattern.
  2. App.Post.find() entails an Ember.ArrayController, one provided by the framework. Here, a router is producing a Controller in disguise from the model hook.

I’m sure that with repeated usage I’d be able to use these conventions without much of a problem. But from a pure API design/architecture perspective, I think it’d make more sense to remove such unconventional conventions.

To be honest, I’ve attempted the EmberJS guides for the 3rd time now, in less than a month, and I feel lost in so many conventions. I have had a similar experience learning AngularJS as well.

With AngularJS, I was at first a bit skeptical about all the custom tags you’d introduce when using it. But after giving it a go, it sort of made sense as a trade-off between less code and manageable clutter. But I got really uncomfortable with the likes of the following:

<ul class="phones">
  <li ng-repeat="phone in phones | filter:query">

  </li>
</ul>

While it offers a snappy filtering experience and some declarative code, it feels too far stretched again. The convention around $scope, and a few other oddities as you’ll see in this EggHead.io $scope vs scope tutorial, can be quite a stress on the brain.

To conclude, I personally like both frameworks and think they offer some great features over their counterparts, but it’d make sense in a later version to pick conventions where they’re truly conventional, rather than imposing ones with lots of foreign concepts.

Call Me Sohan and Ask-Me-Not About S M

I have heard this question n times, where n tends to the number of days since June 2006, when I started working with people from North America:

Should I call you S M?

Well, I’m from Bangladesh, and even today we don’t really have a first and a last name; instead what we have is a full name and a nickname. My full name is S M Sohan, and my nickname is Sohan. So, call me Sohan.

I don’t have a first or last name, like many people in Bangladesh. But since almost everywhere, from online forms to call center agents, people ask me for this info, I tell them my first name is S M and my last name is Sohan. At times I get funny reactions to this, for example:

S M? Just S M?

Better yet, some offer me spelling suggestions:

You can spell it ESEM, because SM is hard to pronounce!

You people are funny! Well, S M is actually a short form of Sheikh Mohammed. But it’s so common in Bangladesh that almost nobody uses the full version in their official documents. As a bonus, you save all that ink and those bytes required to spell the extra characters. Honestly, in all these years, this is probably just the 4th time I’ve written the full form; the other 3 times were for US visa applications. Also, it’d be equally weird to me had I used Sheikh as my first name and people called me by that!

This first/last name convention also causes funny consequences for my friends. For example, two of my friends have names differing only in their middle names, and they were roommates at a shared residence in Calgary. Guess what, their credit reports got all confused about who’s who and mixed up the scores!

A lot of my Muslim friends go by “Md Xxxx Yyyy”, where Md is a short form of Mohammed. People who aren’t too familiar with it at times think of them as doctors, which can be quite funny, and dangerous!

If you aren’t convinced yet, take these names for example: AAMS Arefin Siddique, the current VC of the University of Dhaka; A. K. Fazlul Huq, a renowned leader in history; and A P J Abdul Kalam, an ex-president and nuclear scientist from India.

I hope this makes you comfortable in calling me by the name Sohan :)

Implementation Challenges With a Multi-Tenant/SaaS Database


This 2006 MSDN article points out some key aspects of designing a multi-tenant database for SaaS applications. As you can read in the article, SaaS databases need to pick one of the following three configurations:

  1. separate databases
  2. shared database, separate schema and
  3. shared database, shared schema.

A number of factors, including economics, security, skill set, etc., contribute to the selection of the most suitable configuration. In this post, from my experience, I’m sharing the following practical requirements that introduce additional implementation challenges:

  1. Each account needs to have a maximum allowed space on the database (economic).
  2. Data from one account should never be accessible to other accounts (security).
  3. However, for backend usage, we need the ability to run queries across all accounts.

Size limiting is quite hard. It almost forces the use of a separate database/schema per account. Even then, most databases today don’t have a clean mechanism to enforce such a hard limit.
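
The closest you can usually get is a soft limit enforced in the application. Here’s a rough sketch, assuming PostgreSQL with one database per account and ActiveRecord; the class, the limit and the method names are mine, purely for illustration:

class AccountQuota
  LIMIT_BYTES = 5 * 1024 * 1024 * 1024 # e.g. a 5 GB cap per account

  # Asks PostgreSQL for the total size of the account's database
  def self.over_limit?(database_name)
    quoted = ActiveRecord::Base.connection.quote(database_name)
    size = ActiveRecord::Base.connection.select_value("SELECT pg_database_size(#{quoted})").to_i
    size > LIMIT_BYTES
  end
end

# The app, not the database, then has to refuse further writes:
# raise 'Quota exceeded' if AccountQuota.over_limit?(current_account.database_name)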

Separate databases reduce the chance of cross-account data leaks, but backend tasks suffer for it. For example, your monthly billing processor needs to generate bills for all accounts. With one database per account, it can no longer do that with one simple query against a single database.

Also, most ORM libraries don’t support separate databases for a single type. For example, to fetch the orders from the database, the ORM library needs to connect to database A for account X, but to database B for account Y, and so on. At this point, if possible, you’ll need to tweak the ORM a lot or fall back to your own ORM, which, as I wrote in the past, is almost never a good idea.
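
To give a flavour of that tweaking, here’s a rough ActiveRecord sketch of pointing a model at a per-account database; the Account attributes and the helper are illustrative only, not a recommended design:

class Order < ActiveRecord::Base
  # Re-points the whole Order class at the given account's database
  def self.for_account(account)
    establish_connection(
      adapter:  'postgresql',
      host:     account.db_host,
      database: account.db_name,
      username: account.db_user,
      password: account.db_password
    )
    self
  end
end

# The monthly billing job then has to loop and switch connections per account:
# Account.find_each { |account| puts Order.for_account(account).sum(:amount) }

Note that establish_connection swaps the connection for the whole class, so concurrent requests for different accounts in the same process make this even messier.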

Connection pooling is another challenge. It’s generally a good practice to use connection pooling to save the overhead of establishing a connection before every query. With separate databases, and hundreds, if not thousands, of accounts being served from one app server, the connection pool would have either too many or too few connections in it to be useful.

I don’t know of a clean architecture that would address these requirements without introducing these dev challenges. Please comment if you have any suggestions.