Posts Tagged ‘OCR’

Economic Value of Reading – Porn, Spam, OCR and Mechanical Turks

August 16, 2008

Gigaom has a nice write-up on using Captcha for improving OCR accuracy.

In essence, the service adds a second Captcha word so that humans end up fixing OCR errors.

A few observations regarding the idea:

1. On Amazon’s Mechanical Turk one would get $0.04 for doing human optical character recognition. Why should one do it for free? Just to log in to Craigslist? How long will it take before people realize they can skip the second word?

2. It is known that spammers used to put webmail Captchas in front of porn sites to automate account creation. We can draw the following unproven equation:

Webmail account for Spamming = Porn Site Login Credentials = Reading a Distorted Word = 4 Cents

3. The world record for speed reading is 4251 words per minute. At 4 cents a word, this means the record holder could make $170 per minute, if only she could read them that fast (25,000 words per minute), but it seems voice recognition would fail here. Back to square one.

4. OCR has been stuck for many years waiting for NLP to evolve. Did spammers really improve it that much? I know I have a big problem reading many of the Captchas these days.
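The arithmetic behind point 3 can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the $0.04 rate from point 1 and the 4251 words-per-minute record from point 3, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of point 3, using only figures quoted in the post.
PAY_PER_WORD_USD = 0.04          # Mechanical Turk rate quoted in point 1
RECORD_WORDS_PER_MINUTE = 4251   # speed-reading record quoted in point 3


def earnings_per_minute(words_per_minute, pay_per_word=PAY_PER_WORD_USD):
    """Dollars per minute if every word read were paid at pay_per_word."""
    return words_per_minute * pay_per_word


if __name__ == "__main__":
    # Prints: $170.04 per minute
    print(f"${earnings_per_minute(RECORD_WORDS_PER_MINUTE):.2f} per minute")
```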

It seems that greed and passion can spark innovation.

Can you make money writing algorithms? Part I

December 21, 2007

Just a few years ago it didn’t really pay off to be an algorithm-focused company or expert. Life was much better for a company that made pretty straightforward database, office-automation or process-improvement software. Finally, this is about to change, and the really smart people in software might start seeing money flowing their way.

Historically, writing sophisticated algorithms didn’t result in a great product, system or reward. One of my good friends, who is also one of the brightest guys I know, once tried to sell a great machine learning algorithm. He devoted 5 years of his life to writing it. The best offer he ever heard was $10,000. That’s more or less 200 hours’ worth of an expert HTML developer in San Francisco, except that this is one of the smartest people in computer science and he spent some 12,000 hours on it.
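To make the gap concrete, here is a rough calculation using only the numbers quoted above ($10,000, 200 hours, 12,000 hours). The helper name is invented for this sketch, and the HTML developer's rate is simply what those figures imply, not a quoted price.

```python
# Effective hourly rates implied by the figures in the paragraph above.
OFFER_USD = 10_000        # best offer for the algorithm
HTML_DEV_HOURS = 200      # hours of expert HTML work the offer would buy
HOURS_SPENT = 12_000      # hours the author's friend spent on the algorithm


def hourly_rate(total_usd, hours):
    """Plain division: dollars earned per hour worked."""
    return total_usd / hours


if __name__ == "__main__":
    # Prints: HTML developer: $50.00/hour
    print(f"HTML developer: ${hourly_rate(OFFER_USD, HTML_DEV_HOURS):.2f}/hour")
    # Prints: Algorithm author: $0.83/hour
    print(f"Algorithm author: ${hourly_rate(OFFER_USD, HOURS_SPENT):.2f}/hour")
```

In other words, the offer valued the algorithm work at well under a dollar an hour.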

Things were not much better for others. Let’s try a social game. Name 5 famous Belgians! Sorry, wrong game. Name 5 famous algorithms that made their inventors rich. Don’t peek.

1. RSA – There is some evidence that considerable financial reward did reach some of the inventors.

2. LZW – Rumor has it that CompuServe made some money out of it.

3. Can someone help? One-click shopping? The Windows keyboard button? T9?

To make things worse, even the smart companies didn’t do very well. Take a look at the OCR market. There used to be quite a few companies that made great algorithms for image processing and pattern recognition: Caere, Calera, Xerox, ScanSoft. They are so forgotten you can’t even find most of them on the web.

Today, almost none of them exist, and you have probably never heard of them if you are outside the field. Voice recognition companies didn’t do much better: Dragon, ART, Phonetic Systems. No one was able to reach really huge revenues, and they all folded into one company – Nuance.

There are quite a few good reasons for that. First of all, people buy products, services or full solutions. They are not really interested in algorithms. Usually the people who create great algorithms are not so great at product management or system engineering. Google without the text-ad concept and the great engineering of scalable data centers would be left with just another great information retrieval algorithm.

Second, there are many times when a better algorithm is less important than implementation and context. In computer science it can mean a lot whether something takes N operations to compute or 2N operations. In real life this is often overshadowed by other considerations such as memory, IO, or just waiting for the user’s input. Moreover, algorithms need context. Data mining for shopping is different from basketball analytics. Understanding the unique features of each one involves a lot of domain expertise.
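A toy model shows how quickly the N versus 2N difference disappears. Everything here is hypothetical: the function names, the one nanosecond per operation, and the 200 ms of fixed waiting (disk, network, or the user) are assumptions picked only to illustrate the point.

```python
# Toy model: total time = compute time + fixed waiting time (IO, user input).
# All figures are made-up assumptions for illustration.

def total_time_ms(n_items, ops_per_item, ns_per_op, fixed_overhead_ms):
    """Compute time in milliseconds plus a fixed overhead in milliseconds."""
    compute_ms = n_items * ops_per_item * ns_per_op / 1_000_000
    return compute_ms + fixed_overhead_ms


def speedup(n_items, ns_per_op, fixed_overhead_ms):
    """How many times faster the N-operation version is than the 2N one."""
    slow = total_time_ms(n_items, 2, ns_per_op, fixed_overhead_ms)
    fast = total_time_ms(n_items, 1, ns_per_op, fixed_overhead_ms)
    return slow / fast


if __name__ == "__main__":
    n = 1_000_000  # hypothetical input size
    print(f"no waiting:     {speedup(n, 1, 0):.3f}x")
    print(f"200 ms waiting: {speedup(n, 1, 200):.3f}x")
```

With no waiting at all the better algorithm is twice as fast. With 200 ms of waiting the totals are 201 ms against 202 ms, a difference no user would notice.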

To make matters worse for our algorithm genius, even in places where algorithms can save lots of money, such as compression, communication and encryption, the trend in recent years has been standardization. With innovation in cellular (GSM) and security (AES) being set by standards committees, the potential to make a difference lies more in the implementation of the algorithm than in its invention.

In short, if you wanted to spend your life writing cool algorithms, you would probably need to focus on academic life or settle for a nice day job in the medical, communications or defense industry. In the next part: why is it all changing?

