I came across this interesting flow chart over at the High Scalability blog.

The chart is quite amusing in a mildly NSFW manner, but what struck us is the context in which it is being used: a blog post that discusses whether Twitter’s decision to stick with their current MySQL-based storage system for Tweets, instead of migrating to Cassandra as their data store, is the right one.
The author of the post, Ian Betteridge, believes it is the right decision–and he uses the above flowchart as one of two props supporting his belief. The other prop Mr Betteridge uses is an old Joel Spolsky blog post which suggests that no company should ever rewrite any of their software from scratch. Now, I am sure that Mr Spolsky is much smarter than I on matters of software, but our experience at Cleartrip is that it’s best to never say “Never.”
At Cleartrip, we have pretty much rewritten from scratch almost all of our core systems–search, transaction processing, transaction management. At some point or other, for some reason or other, we have had no choice but to rewrite these systems from scratch. The reasons for rewriting our systems have varied; sometimes it was for increased reliability and stability and other times it was because we needed a change in architecture to support our growing needs as a business.
We didn’t rewrite code from scratch because we wanted to, we did it because we had to; because it was the sensible thing to do. These weren’t easy decisions to make and there were all sorts of reactions within the team to these decisions–ranging from anguish, resignation, rage and tears to relief.
We knew that investing resources in major rewrites was going to hold us back, but every time we did it, we did it because it was the right thing to do for the business. So, while we don’t disagree with most of Mr Spolsky’s position, we will respectfully disagree with any prescription that holds itself as absolutely and perfectly correct regardless of circumstance. Such prescriptions are for political and religious zealots, not for businesses.
Aside
While Mr Spolsky has made many sensible statements on his blog, in this particular case we also disagree with his theory of what killed Netscape. Mr Spolsky believes that Netscape died because they made the “single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch.” Anyone that thinks Netscape died because they rewrote their code from scratch, either:
a) Lives underneath a rock and never heard of a juggernaut called Microsoft that gave away their browser for free, wiping Netscape off the map
b) Is so many sandwiches short of a picnic, that they should immediately cease to be taken seriously in any context
Microsoft gave Internet Explorer away for free, cutting off Netscape’s air supply. Netscape’s revenue stream disappeared–that’s why Netscape died. To suggest otherwise is ludicrous.
Joel does have a point.
Netscape 4.0 was painfully slow. It took more than a minute to start. Netscape didn’t address the issue immediately. They waited for the rewrite to finish, which took a long time. Meanwhile, Microsoft released IE 4, which was far superior to Netscape, and then an even better IE 5. While there might be truth in the statement that Microsoft cut off Netscape’s revenues by giving its browser away for free, there is no denying that IE 4/5 grew because it was a better product.
As someone, who was at Netscape when the whole drama unfolded, told me, "IE 4 was the beginning of end for Netscape."
When Joel advocates against a complete rewrite, he means a rewrite of the entire application (not just a sub-system) from scratch. The right comparison would be Cleartrip rewriting transaction processing, payment, search, UI and switching database software in one single release. That’s what Netscape tried to do.
Shashi–that’s a fair point.
I am glad that you said the following:
"We didn’t rewrite code from scratch because we wanted to, we did it because we had to; because it was the sensible thing to do."
I would like you to elaborate a little on that. What sort of homework did you guys put in? Did you write down all the pain points, bottlenecks, shortcomings and inflexibilities of the current system? Did you guys have to prepare a complete case for your rewrite? What were the steps that you followed towards making that rational decision? I am very sure that there were various stakeholders involved (you mention a lot of emotions) who needed to be convinced or consoled.
My personal philosophy is that if you are not aware of the problems in the current system, you will most likely not solve them in the new system.
Ram–great questions, we’ll try and do a follow-up post to address some of the things you’ve raised.
Why do you need to justify it?
You guys rewrote it because oldies in your engineering team were wary of Lisp and not ready or able to learn it. Truth is that Cleartrip used to run on 20 servers before and now it runs on 50 when the traffic has NOT grown three times.
Some rewrite!
@Chaitanya–just sharing our perspective, something we’ve always done on this blog.
@piyush–traffic and transactions have more than doubled, see for yourself:
http://trends.google.com/websites?q=cleartrip.com&sa=N
And, in case you haven’t come across it, Joel also recommended against building anything in Lisp:
"I know that typically on new projects… some crazy person actually wasting quite a lot of time evaluating Squeak and Lisp and OCaml and lots of other languages which are totally, truly brilliant programming languages worthy of great praise, but just don’t have the gigantic ecosystem you need around them if you want to develop web software. These debates are enormously fun and a total and utter waste of time, because the bottom line is that there are three and a half platforms (C#, Java, PHP, and a half Python) that are all equally likely to make you successful, an infinity of platforms where you’re pretty much guaranteed to fail spectacularly when it’s too late to change anything (Lisp, ISAPI DLLs written in C, Perl), and a handful of platforms where The Jury Is Not In, So Why Take The Risk When Your Job Is On The Line?
Oh and I know Paul told you that he made his app in Lisp and then he made millions of dollars because he made his app in Lisp, but honestly only two people ever believed him and, a complete rewrite later, they won’t make that mistake again."
http://www.joelonsoftware.com/items/2006/09/01.html
As I said above, Mr. Spolsky is much smarter than I in matters of software.
So, the Spolsky article that you have quoted, let’s look at when it was from: September 2006. If my memory serves me correctly, you started using a Lisp-based air search not much later than when this article was published.
How many transactions were you doing then? Fast forward to early 2010, when you shut down your Lisp based system for good, and how were you doing then?
At what time during this phase did this Lisp based air product "fail spectacularly", even after you had driven most of your Lisp developers away?
Making blanket statements to the effect of "won’t make that mistake again", without really explaining why, doesn’t suit this blog, where the rationale behind even minor design changes is discussed in detail. This will no doubt make the voices of the anti-Lisp crowd even louder, without adding anything of real value to the discussion. So, in the interest of the public, do you mind telling us the exact reasons behind your move away from Lisp?
And, by the way, did you read "Coders at Work"? The interview with Jamie Zawinski (a former Netscape employee) might shed some more light on what the rewrite did to the company.
“.. ranging from anguish, resignation, rage and tears to relief.”
W00t! Tears!! This is something new.
“ .. Oh and I know Paul told you that he made his app in Lisp and then he made millions of dollars because he made his app in Lisp, but honestly only two people ever believed him and, a complete rewrite later, they won’t make that mistake again.”
Hmm.. ITA was sold for some 700 million dollars a few weeks ago. Have you heard the talk by Dan Weinreb at Google?
http://xach.livejournal.com/225634.html
Just curious. What libraries did you miss in the Common Lisp ecosystem? Can you elaborate on the reasons for choosing Common Lisp in the early bad years of bad performance? And yes, what languages did you consider besides Java when you were planning for a rewrite? Seeing your openness in sharing things, I’m really excited about the architecture of your current system. Would you mind giving some insight? BTW, how many systems are you interacting with these days?
Good work on rewrite… Cheers!!!
Traffic doubled and transactions doubled? I doubt those figures.
Let’s put that graph in perspective: http://trends.google.com/websites?q=cleartrip.com,+yatra.com,+makemytrip.com&geo=IN&date=ytd&sort=0
All of them have grown equally, and that means the growth would be more or less in line with industry growth. DGCA data tells me that the market has grown about 15% last year. Anyway.
Moreover, I do not see any improvement in the air system. In fact, it continues to be slow for return trips to lesser-searched airports. The only reason I use CT is because of the UI, and it has been the same ever since and has not improved due to this rewrite.
And Mr Spolsky’s wisdom shines through yet again:
"These debates are enormously fun and a total and utter waste of time"
There is no successful technology company on earth that is where it is because it made business decisions based on its love of the one true programming language (whichever one that may be).
That, in a nutshell, is what all the "my language is the best ninja jedi language on earth" types just don’t get.
No answers.
Team matters more than anything else. Personally I would be happy to code in Common Lisp/Clojure/Ruby/Python/Perl/Haskell/Erlang. And yes, programming languages do matter. Those who think otherwise either have never coded or are just pretending (or maybe are just poor managers).
@Hrush..not an iota of understanding ‘Programming’ and again quoting blogs out of context.
Team matters, and your problem is you have dinosaurs in place to run this company, people with enterprise knowledge with all the experience to blame others when shit happens.
And your god, you Steve Jobs fanboi, uses an equally arcane Objective-C to build apps for the iPad/iPhone and other things you drool about.
Good luck in justifying your choices, and quoting blogs.
First of all, your comments are more interesting to read than your posts. Congrats, I am not sure about your website but your blog sure does attract a lot of traffic!!!
I won’t rant here about how good or bad technology is, because every technology that supports any business is undoubtedly good. Businesses have thrived as technology has evolved. Unfortunately, when it comes to bearing the brunt, it’s always the technology and the technologists that suffer, because business still thrives. This would explain the anguish, rage, yada yada yada yada, which I am sure still prevails with the same intensity among a few, as it floods your comments section.
May we know how the initial technology/platform/language was chosen? Was it a dire need to have something put in place quickly so that the revenue starts flowing, or was it a very intelligent technical decision?
Considering the number of technology-based awards won by Cleartrip, I would like you to reconsider whether "Dont Fuck with it" was a better option now that it seems "You have fucked with it".
No business decision and technical decisions can go hand in hand but I doubt if there are technologists here (@Cleartrip) who would understand business implications (you may find a few of these) and business people who understand technology (that’s a rarity).
When any software and apps run your business, it’s easy to put the blame on them for all the wrongs. The fact would be that your numbers haven’t changed because you rewrote anything, because even if you rewrote it you made it exactly like what it was before and nothing new. But you had to blame someone or something, so LISP and LISPERS took the fall for you.
You may deny all the same, but if you want to prove the point that your rewrite has worked, let’s see how. I would not accept your word for it.
I didn’t understand. Do you think Spolsky’s "wisdom shines through" and that he’s a "much smarter" software guy, or do you think he "lives under a rock" and that he "should immediately cease to be taken seriously in any context"?
ANY context!
Or, more likely, are you just trying to make a point at any cost whatsoever, even if it means going back on what you said moments ago? We’ve seen you do the latter anyway, many times.
Ever considered that going back on your own words (strong, promising ones sometimes) may be a reason for all the "anguish, resignation, rage and tears"? You mustn’t have because <insert Spolsky quote here>, right?
I think it is a natural progression for any company to go from aggressive tiger to timid sheep in its attitude (they show aggression only when their existence is threatened). Of course this happens because they become too fearful to fail/fall.
Let me prove this stated hypothesis
Java bashing Sep 2007
http://blog.cleartrip.com/journal/2007/8/25/sunw-sets-java-rises.html
and lisp praising august 2007
http://blog.cleartrip.com/journal/2007/7/7/lisp-is-sin-and-all-data-is-code.html
and java bashing again Dec 2007
http://blog.cleartrip.com/journal/2007/12/20/be-brief.html
FF to July 2010
and we have this blog post. Yes this one. Cleartrip being rewritten in Java
QED
@piyush ROFL!! Good one.
If you think these debates are a complete and utter waste of time, then you were correct in changing and rewriting your system. What will you do a few more years down the line, repeat the same thing? AND IF YOU DONT ALREADY KNOW, IT IS ONLY THE "my language is the best ninja jedi language on earth" types WHO EVEN BOTHER TO GET INTO THIS BLOG AND READ THIS SPACE, COS UNLIKE OTHERS THEY DO CARE ABOUT TECHNOLOGY AND ARE THE ONES WHO SHOULD REALLY BE JUDGES FOR ITS WORKABILITY AND SCALABILITY and not some half baked history grad and his fancy business associates. You cannot be a judge of how good or bad technology is based on your balance sheets. Languages like LISP/Scala/Erlang/Haskell require success stories of products (one of which was Cleartrip) that have worked so that they get popular and more and more people experiment with them. But your rework just made it obvious that old school technologies don’t work but old school ideas do: BANK ON JAVA when everything fails. Is this another ploy on the way to becoming the Indianised version of Steve Jobs?
Hrush: You Sir, are a mundanely omnipotent boil on the ass of humanity.
Can you prove a point without dissecting someone else’s theories or rewriting someone else’s code? So much rework trouble just for a blog post...
Excellent points, Piyush and Rakesh!
While most of the discussion here has been diverted to a Lisp versus Java debate, which I’d not like to get into, I’d like to point out three ‘facts’ (or most probable certainties, whatever):
1. a rewrite, or a major rehaul, would’ve been inevitable even if we had stuck to lisp – given the new needs of the business, which no one was in a position to anticipate in the first 2 years of launch. In fact, it would’ve been over-engineering if someone would’ve designed from that perspective back then.
2. we had moved from java to Rails while re-writing another core component of our system – the transaction manager – because Rails fulfilled all needs of the project – technical & non-technical.
3. Lisp is a *brilliant* language. I’d love to code all my future projects in it. The only thing it lacks is well tested libraries – especially for database access. It’s very frustrating when you spend 20% of your time extending a library or finding bugs in it, when you could’ve spent that time solving actual business problems!
Again, not saying that the core decision being debated in the comments is right or wrong, or whether it was rolled out as smoothly as it should have.
@Saurabh, now I want to know more about point #1. I don’t know if you have the liberty to discuss this in public, but I sure am curious.
I’m really amused by the justification and reasonings. Let’s be honest.
Here is the reason for the rewrite. Top brass never liked us for some obvious reasons. The kind of subculture existing among programmers was hated. Some programmers didn’t like to be code monkeys. The “Programming is for code monkeys” kind of attitude finally prevailed. Code monkeys were hired and given power. Performers were made to watch the comedy circus. Secret plans were made for the rewrite. We were made to look like non-performers. We were forced into a situation where there was only one option left.
Not a drunken rant. Further discussions are fruitless, useless, and an utter waste of time.
PS: Mutable data structures are evil. Concurrency is almost impossible without immutability. Let’s rewrite everything in Clojure/Haskell/Erlang. All problems solved.
Cheers!
@chaitanya: do you think the old system would’ve been able to support the slew of launches without a re-write/major re-haul (in lisp itself)? Eg. Mobile, Cleartrip for business, Agent box, multi-city, new supplier integration, rate rule/discount/cashback manager, etc. I’ll accept whatever you say – your knowledge about that system was far greater than mine.
@Nikhil: I’m not sure it was as black & white as you make it seem in your comment. However, the change management could’ve been much better.
Haha! It all sounds very funny. Completely made up excuses, but funny nonetheless. While I agree with Saurabh in spirit, I disagree with what Hrush had to say. Maybe the earlier system was bad, but it was not because of Common Lisp (let’s not forget that the current rewrite in Java is the rewrite of the same thing in Common Lisp, which was in turn the rewrite/re-implementation of the same thing in Java too!). Languages don’t matter, people do. It’s sad that Cleartrip’s management was so myopic & naive that it picked the wrong battle and lost one of the best tech. teams ever assembled in an Indian company.
Anyway, I would let this matter rest. We all have moved on… let’s not dwell in the past anymore.
All the best, Hrush… now that your vesting schedule is over, may you get a great exit.
@nanda I too have little knowledge of the actual working of the air system. However, I’d think that those new things are more like API consumers of the base system... aren’t they? Discounts etc. are also auxiliary services to the base system.
@Nanda I don’t know if you want to discuss the details in public but essentially Piyush is right. Most of these would not require any changes to the core system (except for multi-city, work on which was already underway, no?). PM me if you want to know more.
@BG You hit the nail on the head. "It’s sad that Cleartrip’s management was so myopic & naive that it picked the wrong battle and lost one of the best tech. teams ever assembled in an Indian company."
What is perhaps saddest is the incredible amount of time and resources that has been wasted in re-writing systems which worked perfectly well. Rather than growing and nurturing what you already had (which includes both software and people), you chose to let them go and start from scratch again.
When I think about all the new stuff that you could have done with this time and these people, the waste is criminal.
Anyways, I heard the company is doing well these days. So good luck with the future.
Chaitanya: I think the primary reason why they needed to start afresh was that the initial tech. team was not a good fit with the company culturally. There we were, young guns, passionate about technology and building great products; and the kind of people that they were looking for (given the background of the top tech. management) was completely different. It was simply a cultural misfit. Tony Hsieh (Zappos) sold off his earlier startup LinkExchange to Microsoft because he realised (one fine day) that the company that he had built was full of people without the right culture; in our case, we were just a vocal minority which was expendable — it was much easier to just get rid of us than to start another company afresh. All the uneducated Common Lisp bashing, etc. is to justify their act where it’s completely unnecessary.
@nanda yes, I think we would have had to rewrite some parts of it .. the whole system had evolved over time anyway. Without detailed specs that was the only way it had to be done. But it would have been incremental development rather than a complete rewrite. The core systems were _very_ stable and performant. The db access issue is not such a deal as you make it out to be except for db intensive application with ORM requirements which all applications need not be. For such applications Rails was a better candidate and we did choose it at CT for such systems, did we not?
@BG In how many ways would you call the earlier system ‘bad’? It was a system designed with the most minimal of specs in the shortest of times by a small team. As far as I remember, two of the most important parameters given to me were performance and scalability. I think it was successful in those terms till the very end. CT was the fastest search engine by a huge margin and this was in no small way responsible for its success. The user experience was the best there was. I think it had an admirable design considering the constantly changing requirements. Over the years the system grew more than 20 times without any downtime or performance degradation. But I totally agree about the ‘best teams … ‘ bit.
@HB why bash a technology when what you really want to do is bash the people who used it for you? Because whatever succeeded or failed was because of the people and not the technology. Why bad mouth stuff which was a critical part of your history (and success)? Even though it ended in a rather bad way. Maybe blame the ending, or bad mouth the reasons. But history?
There were 2 reasons why it all ended up as it did IMHO.
i. Expectation management, on both sides. The team was very good at their work but very bad at selling it. Unfortunately there was no one to sell it for them.
ii. Hiring culturally opposite people to head the team. Note that I am just calling them culturally opposite, not bad or anything.
Because of those two things we went from having a bunch of people working with missionary zeal on making CT the best product in the world to a point where most of them quit. In the Mahalaxmi office John Doerr had said "hire missionaries". Missionaries need a vision to work for and cannot see the vision diluted. At some point the management went back on its vision and made compromises and handed out vague justifications (like the ones in this blog post). They went and hired a lot of people who were more mercenaries. Maybe in the world of business it is not only justified but required to do all this. I am no one to judge whether these were good or bad choices. But in all this you have to manage your missionaries and your mercenaries separately and with care. A task not easy, as things turned out.
So why blame anyone here? Shit happened. Most of all, why blame technologies?
to add one more thing:
It was great fun working at CT. For all of us. Not only because of the tech team itself but also because of the rest of the people there. At least in the earlier days. The environment and mood were great. The founders were very involved and caring. It was a great place to work in. It was extremely painful to see it all break down for us – to see our hard work become irrelevant. But such is life I guess. I learned a lot. Of what to do and hopefully of what not to do. And the most important lesson I think was to find reasons (if at all) rather than to assign blame.
But even without us CT is doing very well (better, maybe) and my best wishes to them to keep it that way.
LOL! Not a single comment from the tech team. [Obviously I mean the current one!]
Saurabh,
How can you compare the new system with the old system? You need to keep in mind issues such as cost, time and return on investment when comparing.
The old system was built by a few programmers (4-5) who were paid less than 50K P.M. as salary, but the new system was built by a dozen programmers (maybe at 50K P.M.) + 4 V.P.s who are drawing more than 2 Lacs P.M. as salary + a dozen-strong QA team whose salary will be on average 50K P.M.
Let us compute the total cost,
Suppose old system took 3 years to build.
2,50,000 * 36 = 90,00,000 (90 Lacs) which is max
And new system took 2 years to build
(6,00,000 (Prog)+ 10,00,000 (V.P.)+ 5,00,000 (Q.A.)) * 24 = 5,04,00,000 (5 crores , 4 lakhs)
Company spent less than 1 Crore for old system and 5 Crores for the new system. Still the new system is just a java version of old LISP system. Same design, and architecture.
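For anyone who wants to check the arithmetic above, here is a minimal Python sketch. The monthly figures are this comment's own rough estimates in rupees, not confirmed numbers, and the names `total_cost`, `old_monthly` and `new_monthly` are made up purely for illustration.

```python
# Rough check of the salary-cost comparison in the comment above.
# All monthly figures are the commenter's estimates (rupees), not verified data.
LAKH = 100_000
CRORE = 100 * LAKH


def total_cost(monthly_costs, months):
    """Sum the monthly cost of every group and multiply by the duration."""
    return sum(monthly_costs.values()) * months


# Old system: 4-5 programmers, taken as 2,50,000 per month at most, for 3 years.
old_monthly = {"programmers": 250_000}

# New system: figures exactly as given in the comment, for 2 years.
new_monthly = {"programmers": 600_000, "vps": 1_000_000, "qa": 500_000}

old_total = total_cost(old_monthly, 36)
new_total = total_cost(new_monthly, 24)
ratio = new_total / old_total

print(f"old: {old_total / LAKH:.0f} lakh")
print(f"new: {new_total / CRORE:.2f} crore")
print(f"new / old: {ratio:.1f}x")
```

With these inputs the totals come to 90 lakh and 5.04 crore, a ratio of 5.6. Note that the QA line uses the 5,00,000 figure as given, although a dozen people at 50K each would be 6,00,000, so the result is only as good as the estimates fed into it.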
The old system evolved over a period of time by delivering many new features from time to time. But you have halted the whole product development for one year for re-writing the system.
Do you think the new system will run for a decade? I bet it will not run for more than 3 years, and you will end up with scrap and useless people maintaining that scrap.
I don’t know how Mr. Hrush is missing this fact. It is really a shame to see these nasty blog posts hiding the actual facts.
If you guys are really honest and professional, you should accept this truth.
@quasi +1 I can’t agree more.
Please also count the fact that instead of 16-20 servers before this one runs on 50!
Rewrite kills
Do you want to know the real truth?
JAVA killed Netscape.