The Hardest Bugs in The World – Part Two

It seems our mysterious IronPython memory leak is actually a “handle leak”, which is even harder to detect than a memory leak. It seems some of our “Futures” are leaking. I guess we’ll have to patch a hole in the space-time continuum to fix it.

In other words, every time we create a “Future” we use one event and one mutex (or “Mutant” in Windows lingo), and they just keep piling up. It all sounds like a sci-fi B movie. Read all about it, if you have trouble falling asleep.
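To make the pattern concrete, here is a minimal Python sketch. It is not our actual Future implementation: `HandleTable`, `LeakyFuture` and `close()` are hypothetical names, and the “handles” are simulated with a counter rather than real Windows kernel objects. It only illustrates how two handles per Future pile up when nothing releases them.

```python
import threading


class HandleTable:
    """Simulated per-process handle table (hypothetical, for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.open_handles = 0

    def acquire(self, count):
        with self._lock:
            self.open_handles += count

    def release(self, count):
        with self._lock:
            self.open_handles -= count


class LeakyFuture:
    """Toy Future that owns one event and one mutex, i.e. two handles."""

    HANDLES_PER_FUTURE = 2

    def __init__(self, table):
        self._table = table
        self._event = threading.Event()  # stands in for the OS event
        self._mutex = threading.Lock()   # stands in for the OS mutex
        self._result = None
        self._closed = False
        table.acquire(self.HANDLES_PER_FUTURE)

    def set_result(self, value):
        with self._mutex:
            self._result = value
        self._event.set()

    def result(self):
        self._event.wait()
        with self._mutex:
            return self._result

    def close(self):
        # Releasing is explicit; forgetting this call is the leak.
        if not self._closed:
            self._closed = True
            self._table.release(self.HANDLES_PER_FUTURE)


def run_futures(table, n, close_them):
    """Create n futures and return how many simulated handles stay open."""
    for i in range(n):
        future = LeakyFuture(table)
        future.set_result(i)
        future.result()
        if close_them:
            future.close()
    return table.open_handles
```

Calling `run_futures(HandleTable(), 1000, close_them=False)` leaves 2,000 simulated handles open, while the same call with `close_them=True` brings the count back to zero.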

On the database side, things are just as challenging. It seems next to impossible to get deterministic behavior out of our SQL Server 2008. We are trying to understand why the results in our production database, our QA database and our performance setup are sometimes completely different.

It is pretty annoying when the same query takes two seconds in one environment and thirty seconds in another, while the database tables are identical.

I would have expected this type of problem to have gone away by 2010. Here are a few insights (credit goes to Roy R).

  • Increasing the number of CPUs can decrease performance when running on VMware, because a four-CPU virtual machine requires all four CPUs to be free at the same time, which happens less often than having just two CPUs free.
  • Some of the behavior below sounds like voodoo, but the answers may lie in the SQL plan cache.
  • Q: Why would a query run slowly inside the web app, but quickly when run inside SQL Server Management Studio on the same DB?
    A: The two queries may be using different query plans because of differences in query text, parameterization or connection settings. The plan cached for the web app may also be an old one that has become obsolete.
  • Q: Why would a query that was running slowly suddenly start running quickly in the same web app?
    A: The query plan may have been refreshed. Changes to the table (updates, deletes or inserts) can trigger an automatic statistics update, which in turn causes the plan to be recompiled. A plan can also be evicted from the cache after a while to free memory.
  • Different queries produce different plans. Text matters and parametrization matters.
  • It is quite hard to “freeze” query plans, since that requires a lot of memory and there are too many query variations.
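The plan cache behavior described above can be sketched with a toy model in Python. This is not SQL Server’s real logic: `ToyPlanCache`, its methods and the `Orders` query are hypothetical, and `ARITHABORT` is used only as one example of a connection setting that commonly differs between Management Studio and application connections. The point is simply that the cache key includes both the exact query text and the connection settings.

```python
class ToyPlanCache:
    """Toy model of a plan cache (hypothetical; not SQL Server's real logic)."""

    def __init__(self):
        self._plans = {}
        self.compilations = 0

    def get_plan(self, sql_text, connection_settings):
        # Both the exact text and the connection settings are part of the key.
        key = (sql_text, frozenset(connection_settings.items()))
        if key not in self._plans:
            self.compilations += 1
            self._plans[key] = "plan #%d" % self.compilations
        return self._plans[key]

    def invalidate(self):
        # Stands in for a statistics update or memory pressure evicting plans.
        self._plans.clear()


cache = ToyPlanCache()
web_app = {"ARITHABORT": "OFF"}  # assumed application connection setting
studio = {"ARITHABORT": "ON"}    # assumed Management Studio setting
query = "SELECT * FROM Orders WHERE CustomerId = @id"

# Same parameterized text and same settings: the cached plan is reused.
assert cache.get_plan(query, web_app) == cache.get_plan(query, web_app)

# Same text, different connection settings: a separate plan is compiled.
assert cache.get_plan(query, studio) != cache.get_plan(query, web_app)

# Literal values change the text, so every variation gets its own plan.
cache.get_plan("SELECT * FROM Orders WHERE CustomerId = 42", web_app)
cache.get_plan("SELECT * FROM Orders WHERE CustomerId = 43", web_app)

# After invalidation the web app gets a freshly compiled plan.
cache.invalidate()
fresh_plan = cache.get_plan(query, web_app)
```

In this model the web app and Management Studio never share a plan, and a slow query can suddenly become fast simply because `invalidate()` forced a fresh compilation.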

This is an interesting variation on the Heisenberg uncertainty principle: trying to measure the performance changes the statistics, and therefore changes the measurement. This kind of bug is also known as a Heisenbug. We are open to creative ideas. Until then, I’m considering trying out London’s first pensioners’ playground.

[Image: London’s first pensioners’ playground]


One Response to “The Hardest Bugs in The World – Part Two”

  1. Yaniv Says:

    “Why would a query run slowly inside the web app, but quickly when ran inside the SQL Management Studio on the same database?” – because it doesn’t go through MS DTC?
