Posts Tagged ‘Software Testing’

New Version Every Other Week for Three Years?

February 16, 2011

I get up every morning determined to both change the world and have one hell of a good time. Sometimes this makes planning my day difficult.

E. B. White
US author & humorist (1899 – 1985)

Releasing a working version to customers every two weeks is fun.

  • It is fun for customers, who use the features instead of watching fictitious “product road maps”.
  • It is fun for developers, who see that their work is actually used.
  • It is fun for executives, who can change business priorities quickly.
  • It is fun for product managers, who can measure actual usage.
  • It is fun for the R&D manager, as problems cannot be hidden for long.

In my company, we delivered 72 versions to customers in three years.

Here is one way to do it:

  • Hire top talent for development, QA, IT and operations.
  • Deliver the product as a service (SaaS). Upgrading one instance is much easier than upgrading 10,000.
  • Hold synchronization meetings twice a week, on Monday and Thursday. Monday is just team leaders; Thursday is all of R&D.
  • Invest early in QA automation. We invested $20,000 in automation infrastructure at a very early stage.
  • Invest in unit testing as much as possible.
  • Avoid branching. Branches are evil. Merges are yikes. One branch is good, two is the max.
  • Invest in the “ugly stuff”: deployment scripts, upgrade scripts, database consistency.
  • Constructive dictatorship. Every code change has a ticket. Every. No exceptions. Really.
  • The first week is for coding. Then comes feature freeze: three days for QA and bug fixes. Then code freeze: two days for final QA and critical fixes only. Release on Sunday.
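The ticket rule is easiest to keep when a machine enforces it. The post does not say which tools were used, so the following is only a sketch, assuming Git and a hypothetical ticket key format such as `PROJ-123`: a `commit-msg` hook that rejects any commit whose message does not mention a ticket.

```python
#!/usr/bin/env python3
"""Hypothetical Git commit-msg hook: reject commits that do not reference a ticket.

Assumption: tickets look like PROJ-123 (uppercase project key, dash, number).
Git calls the hook with one argument: the path of the proposed commit message.
"""
import re
import sys

# Hypothetical ticket pattern; adjust to your tracker's key format.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def has_ticket(message: str) -> bool:
    """Return True if any non-comment line of the message mentions a ticket."""
    for line in message.splitlines():
        if line.startswith("#"):  # Git strips comment lines, so ignore them here too
            continue
        if TICKET_RE.search(line):
            return True
    return False


def main(argv: list) -> int:
    with open(argv[1], encoding="utf-8") as f:
        message = f.read()
    if has_ticket(message):
        return 0
    sys.stderr.write("Commit rejected: reference a ticket (e.g. PROJ-123). Every. No exceptions.\n")
    return 1  # a non-zero exit status makes Git abort the commit


# Only act as a hook when Git actually passes the message file.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Saved as `.git/hooks/commit-msg` and made executable, it runs on every local commit. Local hooks can be skipped with `git commit --no-verify`, so a team that really means “no exceptions” would also check on the server side, for example in a `pre-receive` hook.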

In the next post I’ll try to answer the tricky questions: What about longer features? How do we avoid scaring customers? And more.


A Case Study in Restore Nightmares

March 15, 2009

I’m learning to fly, but I ain’t got wings
Coming down is the hardest thing

Well the good old days may not return
And the rocks might melt, and the sea may burn

Tom Petty

Read the first part of the story first.

A few hours had passed, and Sasha felt more confident about starting the recovery of the ClearCase source files. The initial restore was successful (300 GB and 50 million files) and everyone was happy. Everyone but yours truly.

Me: How do you know the recovery was successful?

Sasha: The recovery software says so.

Me: Well, we know what that’s worth. How do I know all the files are OK?

Sasha: We will run FSCK on all file systems and see what happens.

Me: That’s a good start, but it still does not tell me which files we lost, what data is corrupted, or whether anything changed. And by the way, how long will it take to run FSCK?

Sasha: These are excellent questions. I have no idea. Let’s try and see what happens.

Backup and Restore Case Study

Running FSCK took a few more hours, but the results seemed OK. We tried to bring ClearCase up, but it refused to start. Surprisingly, the restore process had not restored all the file permissions and soft links. These had to be fixed manually.

I was still not sure we hadn’t lost any information. With 300 modules, 20 million files and thousands of branches, it is really hard to know that nothing was lost. It could take months before someone tries to build an old package and finds out it is missing. Databases have multiple integrity checks in place; with a version control system such as ClearCase, it is not so simple.

When we called the ClearCase people (also IBM), it turned out there is a secret script that validates the internal consistency of the setup. We decided to run the script and, at the same time, build all the main products on the compile servers. If all the compilation results came out identical to what we already had, we could be quite confident that at least the latest source was available.

We soon discovered that we couldn’t do both at the same time: the hidden script was locking the files, so we had to run the build after the script. Of course, nobody at IBM could estimate how long the script would run. At this stage I sent all the developers home.

A couple of hours later everything seemed to be in order. I went home and prepared for the holy day. Yom Kippur is the day of the year when Jews are supposed to ask forgiveness for the evil they did to their fellow men. It is also a day for reflection, and I felt it was a very appropriate opportunity.

Lessons Learned:

  • A backup without frequent restore exercises is like a pizza with no cheese
    • Just like high availability never works in passive-active mode
  • Everything starts from the requirements. IT is no different from R&D
    • Restore time is one example 🙂
  • If you can’t prove that the restore works, you have a serious problem
    • This is a hard one. Think: what will let you sleep well at night?
  • When a crisis happens, it is very nice to have a process in place
    • Start with the phone list of key people
  • Trying to save money might cost you lots of money
    • There is usually a reason for high-end products
  • Trust no one
    • If they say they have a backup, ask them for the DRP (disaster recovery plan)
    • If they claim to have a backup, ask them to restore something
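“If you can’t prove that the restore works” can be made concrete with a manifest: record every path’s type, permission bits, link target and content hash when the backup is taken, then diff that against the restored tree. This would have flagged the missing permissions and soft links described above. A minimal standard-library Python sketch; the function names and manifest layout are hypothetical, and it checks files only, not ClearCase’s internal consistency (that is what IBM’s script was for).

```python
import hashlib
import os
import stat


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a regular file, read in chunks so huge files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict:
    """Map each path (relative to root) to (kind, permission bits, detail).

    detail is the content hash for regular files and the target for soft
    links, so lost permissions and lost links show up as differences too.
    """
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow links
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.lstat(full)  # lstat describes the link itself, not its target
            if stat.S_ISLNK(st.st_mode):
                # Link permission bits are platform-dependent, so record only the target.
                manifest[rel] = ("link", None, os.readlink(full))
            elif stat.S_ISDIR(st.st_mode):
                manifest[rel] = ("dir", stat.S_IMODE(st.st_mode), None)
            elif stat.S_ISREG(st.st_mode):
                manifest[rel] = ("file", stat.S_IMODE(st.st_mode), file_digest(full))
            else:  # sockets, FIFOs, device files: record type and mode only
                manifest[rel] = ("other", stat.S_IMODE(st.st_mode), None)
    return manifest


def compare(before: dict, after: dict) -> dict:
    """Classify differences between a pre-backup and a post-restore manifest."""
    return {
        "missing": sorted(before.keys() - after.keys()),
        "unexpected": sorted(after.keys() - before.keys()),
        "changed": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }
```

The idea is to run `build_manifest` when the backup is taken, store the result with the backup, and after a restore check that `compare(saved, build_manifest(restored_root))` returns three empty lists. If the manifest is stored as JSON, the tuples come back as lists and need converting before comparing. Hashing hundreds of gigabytes takes hours, which is itself a restore-time requirement worth knowing before the crisis.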

Strange Google Problem – False “This site may harm your computer” messages

January 31, 2009

It seems the main Google search engine has a critical problem.

It flags every site as malicious, even when you just search for CNN.

See the attached image. I verified this on three different laptops, three different browsers and multiple IP locations, so I assume it is not a virus on my system. I wonder if the problem is global. Without Google, it is hard to know what is happening in the world.

Google “This site may harm your computer” problem, 2009

The Proof Is in the Pudding – Stating the Obvious III

January 31, 2009

Contrary to what many programmers think, QA’s role is not to do the dirty work for them. QA’s role is to independently validate that the code actually works.

The reason I put the responsibility on the coder is simple. The coder is the one who writes the code, the one who understands it and the one who can change it. Why should anyone else be the owner?

QA has far fewer options than the developer for proving that the code works and reducing the risk: they can only test the functionality from a black-box perspective.

Smart Software Developer using Virtual Lab Automation


The developer, on the other hand, has multiple options beyond the ones already listed in Part II.

  • Rewrite the code in a more modular fashion so it is easier to unit test
  • Move from C# to Python to make it easier to write mocks and do subsystem testing
  • Add logs, alerts and assertions so he knows that edge conditions are safely handled
  • Refactor the code so user interface validations and server validations use the same mechanism
  • Add new code behind a separate flag/object/screen so it is less likely to cause regressions in other functionality
  • Shout at the product manager that the requirements are too complex and there is no way to implement them in SQL with proper testing
  • Move from simple ASP.NET to the MVC model so more parts of the UI can be tested separately
  • Ask QA to help with extensive PRE-COMMIT manual testing as part of the development stage
  • Ask QA to help with running the automated tests on development branches
  • Help the automated QA team make sure new features are tested during the development stage and not after deployment
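As an illustration of the mocking point, Python’s standard `unittest.mock` lets a developer test logic that depends on external systems without those systems being present. A minimal sketch; `pick_failover_node`, the node names and the `is_healthy` client interface are all hypothetical:

```python
from typing import Optional
from unittest import mock


def pick_failover_node(cluster_client, nodes: list) -> Optional[str]:
    """Return the first node the (hypothetical) cluster client reports as healthy.

    cluster_client is any object with an is_healthy(node) -> bool method;
    in production it would talk to real servers, in tests it is a mock.
    """
    for node in nodes:
        if cluster_client.is_healthy(node):
            return node
    return None


def test_skips_unhealthy_nodes() -> None:
    client = mock.Mock()
    # Simulate node-a being down and node-b healthy, with no real cluster.
    client.is_healthy.side_effect = lambda node: node == "node-b"
    assert pick_failover_node(client, ["node-a", "node-b", "node-c"]) == "node-b"
    # The mock records its calls, so we can also check node-c was never probed.
    assert client.is_healthy.call_args_list == [mock.call("node-a"), mock.call("node-b")]


def test_no_healthy_node() -> None:
    client = mock.Mock()
    client.is_healthy.return_value = False
    assert pick_failover_node(client, ["node-a"]) is None
```

The tests can be run with pytest or by calling the two functions directly. A mock only proves the logic under the assumed client behaviour; it does not replace running the code on a real cluster.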

The manager’s role is to:

  • Repeat, over and over, the concepts of ownership, proof and responsibility
  • Back the theory with resources: buy machines for testing, software for code checking, etc.
    • For example, buy two servers for the clustering team so they can test that their code actually runs on a cluster
  • Help manage trade-offs and real-world considerations
    • For example, which functionality is used a lot and which is hardly used
  • Pay the “price” for higher-quality code
    • For example, pay $50,000 for a new automated testing project
  • Avoid being dogmatic about any specific methodology
    • For example, unit testing might not be effective in certain places, and forcing everyone to write unit tests will just create resentment
  • Introduce and promote new technologies such as virtualization and lab automation
  • Help apply the right methods in the right context
    • The JavaScript testing framework is great, but should we implement it right now?

To summarize, like any other professional, the developer is the one responsible for the quality of his or her work. Allowing developers to push unproven code to customers is what gave us a bad reputation as an industry. The best ones, however, are able not just to code, but also to analyze the risk, check for validity, rewrite and design to create bulletproof products.

And if you have read this far, here is a reminder of a lovely ’80s song.

Evidence-Based Coding – Stating the Obvious II

January 31, 2009

Continuing from the previous post, let us examine why anyone would pay a developer who can’t prove his code is working.

When I was at Check Point, I used to hold weekly pseudo-random interviews with employees. It is a great habit that I learned from Dorit Dor, my manager at the time.

When you manage 200+ employees, it is one of the few ways to get direct feedback and stay in touch with what’s really going on, but it turns out to be a good idea even when there is only a single employee reporting to you.

One of my favorite questions to developers was:

How do you KNOW the code you produce really works ?

Amazingly, this pretty simple question took all of them by surprise.

The university graduate, the PhD, the autodidact, the hacker, the PC kid and even the group manager. None were prepared for this question.

The Surprised Software Developer


The common answers were:

  • I don’t know it works, but I have a good feeling about it
  • It works most of the time
  • QA will test it and then I’ll know it works
  • I tried it a bit and it looks fine
  • I did a code review with my team leader and he approved it
  • It is a small change and I’m confident in it
  • There is no way to do it in the time I was given

As you can imagine, I was not very happy with most of these answers. Here are some of the best developers in the world, earning five times the salary of a social worker with 30 years of experience, and they can’t explain why their work is actually, hmm, working.

My belief is that the developer has to PROVE, to a reasonable degree, that the code he commits works as planned and does not break other code.

If he can’t do it, he should not commit the code to the general working branch.

How can the poor programmer achieve this goal?

  • Writing unit tests for his code and running them
  • Writing subsystem tests for his code and running them
  • Using code checking tools, looking for warnings, errors and suggestions
  • Asking peers for a code review
  • Going through the design and requirements and validating that the actual code implements them
  • Manually working with the system and going through all scenarios he claims to support
  • Spending a couple of hours trying to come up with all the extreme cases and special problems
  • Going over the QA test design and making sure his code will pass the tests
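For the first item on that list, here is what “proving” a small piece of code can look like. A minimal sketch using Python’s standard `unittest`; `parse_quantity` and its 1 to 999 range are hypothetical, chosen because the interesting cases (empty input, zero, negatives, boundaries) are exactly the extreme cases the list asks the developer to hunt for.

```python
import unittest


def parse_quantity(text: str, max_qty: int = 999) -> int:
    """Parse an order quantity typed by a user (hypothetical example).

    Accepts surrounding whitespace; rejects empty input, anything that is not
    a plain whole number, zero, and values above max_qty by raising ValueError.
    """
    cleaned = text.strip()
    if not cleaned.isdecimal():  # rejects "", "-3", "4.5", "abc"
        raise ValueError(f"not a positive whole number: {text!r}")
    value = int(cleaned)
    if not 1 <= value <= max_qty:
        raise ValueError(f"quantity out of range 1..{max_qty}: {value}")
    return value


class ParseQuantityTest(unittest.TestCase):
    def test_normal_value(self):
        self.assertEqual(parse_quantity("42"), 42)

    def test_surrounding_whitespace(self):
        self.assertEqual(parse_quantity("  7 \n"), 7)

    def test_boundaries(self):
        self.assertEqual(parse_quantity("1"), 1)
        self.assertEqual(parse_quantity("999"), 999)

    def test_rejects_edge_cases(self):
        for bad in ["", "   ", "0", "-3", "1000", "4.5", "abc"]:
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError):
                    parse_quantity(bad)
```

Run it with `python -m unittest` (or pytest). The test list doubles as a statement of what the developer claims to support; anything not covered is what should be declared explicitly before committing.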

And by the way, if these methods are not available, reliable or feasible, it is also OK to commit the code, provided the developer EXPLICITLY tells everyone the status beforehand and gets the manager’s approval. For example:

I want to commit the new screen, but I never tested it on Firefox, and therefore I assume it does not work on Firefox. Is it OK to commit? I also didn’t test the sorting or the client-side validation, but I think they might work because I didn’t touch this code and it is very solid.

Obviously, developers are notoriously over-optimistic, so this should be kept as a last resort, but making them say it out loud is key to maintaining a high level of professionalism and ownership. More on this in the next post.

Product Management in the Real World – “The Divider” Case Study, Part II

February 22, 2008

There are scratches all around the coin slot
Like a heartbeat, baby trying to wake up,
But this machine can only swallow money.
You can’t lay a patch by computer design.
It’s just a lot of stupid, stupid signs.

Two weeks later, Ron sent an initial version to QA. The testers vigorously began opening bugs with hilarious titles: “Nothing Works!”, “GUI Crashes Every 46 Seconds”, “Spelling Mistakes in Non-Existent Help Screens”. The coders raged about QA’s inability to overcome transitory hiccups, and silently ran to fix the problems.

An improved version with most of the new features was deployed to QA after a two-month delay. Surprisingly, it turned out that the old, “existing” features the Divider was supposed to expose hardly worked. Since they had no user interface, the testers “forgot” to check them. It seemed customers were not using the protections either.

The innovative protections were off by default, as they raised too many false alarms. Since there was no visible way to turn them on, only the most advanced, innovative and paranoid customers enabled them.

To make things worse, no support tickets had been opened either, so the CMO was convinced the product was top-notch.

Sigourney shouted at Ron: “How can you deliver a product that’s not working? Did you ever test it yourself before deploying to QA?” Ron, who wasn’t the quiet type, responded: “I own the ‘SQL Guardian’, and that works smoothly. The ‘Data Crusher’ was written by the company founder five years ago, and you can talk to him about it. I did not join this company to be a code monkey. You are throwing undefined tasks at me, stealing Mark for other projects and then wondering why things break. I will not stand for this hypocrisy.”

Rumors of the problems reached Oberon, the director. He moved three additional developers to the project. Although Ron felt the project was running out of control, bugs were fixed at a much higher rate. The director ran a daily status meeting to monitor development and reprioritize trivial bugs. He kept the team confident: “Microsoft ships with many bugs and they still rule the world”, “In a 1.0 version, customers forgive minor problems”.

The marketing department published a passionate release note about the innovative new concept TASP Security would be presenting. The stock rose and the sales team was energized. All of R&D helped, and people worked around the clock. After three months of intense work, a Go/No-Go discussion was held with QA, R&D and product management.

QA felt the product was not mature enough, but the rest of the team ignored them: QA had never approved a single product, not even the successful “Knowledge Keeper”. The exhausted Sigourney felt the product was ready, and people were tired of the repeated delays. Five months behind the original plan, the pressure was mounting to go ahead and release. Ron was the only opposition and refused to be responsible for the results. Oberon considered all the options and decided to ship. To comfort Ron, all the limitations would be listed in a ten-page release notes document.

Stay tuned for Part III – The Customers.