Lean Publication

Let’s assume that you want to publish a database on the Internet. It can be a bunch of interesting documents, a specific research subject, a list of job offers, documentation ... anything. So you need some kind of database (an RDBMS, or a set of files and an index; let’s call it the store), a publication engine (offering search, navigation, print format...) and a website. And you want it stable and scalable.
So you double your infrastructure. Instead of one computer running the database and the engine, you set up two identical configurations with a load balancer on top, and you probably separate the store and the website from the engine.
But you haven't finished yet. Redirecting users between servers when the load is too high is not enough: an error can occur and the application may collapse. In that case it's not enough to redirect the user to the other computer; you also have to take over the session and other user information. So you build a synchronization mechanism. And maybe you should keep your servers at different locations...
So now you have a complex infrastructure that naturally has significantly more errors and is more expensive to develop, implement and operate. You spend your time hiding these problems instead of investing in better functionality. And, probably more important, the application stack gets deeper and your customers have to wait longer...

In the Lean Six Sigma way of thinking, complexity and overcapacity are waste and should be avoided. Does that mean we have to risk system instability or overload? Or can we learn something from Toyota, the manufacturer that pioneered Lean production?

One of my favorite books is "The Toyota Way", about the management principles of Toyota. On pages 128-129 there is an excellent story which, for me, was a mind changer. In the traditional way of thinking we do our best to avoid errors (I gather that in a car plant this means keeping the line running at all times). The Japanese manager's suggestion is shocking: problems are there whether we like it or not. Let them surface! If they do, we can correct them and be better than before.

Back to the website: does this mean we have to let the site break? Naturally not. Toyota also tries hard to keep its cars from breaking down (but they don't build double petrol engines into a sedan either). What I say is that we should monitor our system and run stress tests to find and remove problems and bottlenecks. The result will be a simple, fast, high-performance service. Will it be reliable? A double system may seem less error prone, but all the safeguards that are supposed to protect us may fail in a critical situation. And if one of several identical servers fails under certain circumstances, the other servers may fail too under the very same circumstances.
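To make "run stress tests to find bottlenecks" a bit more concrete, here is a minimal in-process sketch. It assumes a page or request can be called as a plain function (the `handler` argument and the `fake_page` stand-in are hypothetical). Against a real site you would point an HTTP load generator at it instead, but the idea is the same: fire many concurrent requests and look at the median and the tail latency, not only the average.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def stress_test(handler: Callable[[], None], requests: int, workers: int) -> Dict[str, float]:
    """Call `handler` `requests` times from `workers` threads; return latency stats in ms."""
    if requests < 2:
        raise ValueError("need at least 2 requests to compute percentiles")

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        handler()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(pool.map(timed_call, range(requests)))

    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(latencies, n=100)
    return {
        "median_ms": statistics.median(latencies),
        "p95_ms": cuts[94],
        "max_ms": latencies[-1],
    }


def fake_page() -> None:
    """Hypothetical stand-in for rendering one page: about 5 ms of waiting."""
    time.sleep(0.005)


if __name__ == "__main__":
    report = stress_test(fake_page, requests=200, workers=20)
    print({name: round(value, 1) for name, value in report.items()})
```

If the 95th percentile is far above the median, some requests are usually queuing somewhere. That queue is the bottleneck worth hunting down before reaching for a second server.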
Scalability? Our Lean system is much more powerful. Depending on the kind of audience, we design for typical load or for peak time, but not for exceptional situations. A flexible system can detect and tackle overload (e.g. by simplifying time-consuming tasks or turning away some "late" users) and bounce back once the peak has passed. And yes, we may also use simple load balancing if needed. What matters is the attitude to problems and solutions.
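Detecting and tackling overload this way is usually called load shedding or graceful degradation. Here is a minimal sketch, assuming a single process where we can count in-flight requests: below a soft limit everyone gets full service, between the soft and hard limits requests get a simplified answer, and above the hard limit the "late" users are politely turned away. `OverloadGuard`, `search` and the limit values are hypothetical names chosen for illustration.

```python
import threading


class OverloadGuard:
    """Hypothetical admission control: full service, degraded service, or rejection."""

    def __init__(self, soft_limit: int, hard_limit: int) -> None:
        if not 0 <= soft_limit <= hard_limit or hard_limit < 1:
            raise ValueError("need 0 <= soft_limit <= hard_limit and hard_limit >= 1")
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.active = 0  # requests currently being served
        self._lock = threading.Lock()

    def admit(self) -> str:
        with self._lock:
            if self.active >= self.hard_limit:
                return "reject"  # a "late" user: ask them to come back later
            self.active += 1
            return "degraded" if self.active > self.soft_limit else "full"

    def release(self) -> None:
        with self._lock:
            self.active -= 1


def search(query: str, guard: OverloadGuard) -> str:
    """Hypothetical search page that simplifies its work under load."""
    mode = guard.admit()
    if mode == "reject":
        return "503 Service busy, please retry"
    try:
        if mode == "degraded":
            # Skip the time-consuming part (e.g. facets) while the peak lasts.
            return f"top 10 hits for {query!r} (no facets)"
        return f"full results for {query!r} with facets"
    finally:
        guard.release()  # only admitted requests are counted, so only they release


if __name__ == "__main__":
    guard = OverloadGuard(soft_limit=2, hard_limit=3)
    print([guard.admit() for _ in range(4)])  # ['full', 'full', 'degraded', 'reject']
```

Because the counter drops as requests finish, the service returns to full mode by itself when the load goes down; that is the "bounce back", with no second server involved.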

And what if we are as lean as possible but performance still can't cover demand? We need to scale, don't we? Another Lean concept springs to mind: cell production. In cell production, workers are grouped into teams (cells), each team working on a different part of the production. The same applies to our website: we can split it into subsystems (e.g. search, portal, presentation, printing, store and retrieve...). The subsystems communicate through simple interfaces and can be installed on the same computer or on different computers, according to the needs of the application. This gives greater flexibility and higher performance, and identifying weak points is easy.
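A minimal sketch of the cell idea, assuming each subsystem is hidden behind a small interface (here Python `typing.Protocol` classes). The class names, the in-memory store and the substring search are hypothetical; the point is only that the portal depends on the interfaces, not on where or how the other cells run.

```python
from typing import Dict, List, Protocol


class StoreService(Protocol):
    """Interface of the store cell."""
    def ids(self) -> List[str]: ...
    def fetch(self, doc_id: str) -> str: ...


class SearchService(Protocol):
    """Interface of the search cell."""
    def search(self, query: str, limit: int) -> List[str]: ...


class InMemoryStore:
    """Hypothetical store cell: documents kept in a dict."""
    def __init__(self, docs: Dict[str, str]) -> None:
        self._docs = dict(docs)

    def ids(self) -> List[str]:
        return list(self._docs)

    def fetch(self, doc_id: str) -> str:
        return self._docs[doc_id]


class NaiveSearch:
    """Hypothetical search cell: case-insensitive substring match."""
    def __init__(self, store: StoreService) -> None:
        self._store = store

    def search(self, query: str, limit: int) -> List[str]:
        q = query.lower()
        hits = [d for d in self._store.ids() if q in self._store.fetch(d).lower()]
        return hits[:limit]


class Portal:
    """Presentation cell: knows only the interfaces, not where the other cells run."""
    def __init__(self, search: SearchService, store: StoreService) -> None:
        self._search = search
        self._store = store

    def render_results(self, query: str) -> str:
        ids = self._search.search(query, limit=10)
        return "\n".join(f"{i}: {self._store.fetch(i)[:40]}" for i in ids)


if __name__ == "__main__":
    store = InMemoryStore({
        "job-1": "Python developer, Budapest",
        "job-2": "Java developer, Vienna",
        "doc-7": "Lean production notes",
    })
    portal = Portal(NaiveSearch(store), store)
    print(portal.render_results("developer"))
    # job-1: Python developer, Budapest
    # job-2: Java developer, Vienna
```

To move search onto its own computer, one would write another class with the same `search` method that forwards the call over the network (HTTP, a message queue...), and `Portal` would not change. That remote adapter is left out here because it depends entirely on the transport you choose.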



