AI’s image recognition success feeds sound recognition improvements

I must do reCAPTCHA at least a dozen times a week for various websites I use. It’s become a real pain. And the fact that I know that what I am doing is helping some AI image recognition program do a better job of identifying street signs, mountains, or shop fronts doesn’t reduce my angst.

But that’s the thing with deep learning, machine learning, reinforcement learning, etc.: they all need massive amounts of annotated data, correctly interpreting each scene, in order to train properly.

Computers to the rescue

So, when I read a recent article in MIT News that Computers learn to recognize sounds by watching video, I was intrigued. What the researchers at MIT have done is use advanced image recognition to annotate film clips with the names of things that are making sounds on the film. They then fed this automatically annotated data into a sound identifying algorithm to improve its recognition capability.

They used this approach to train their sound recognition system to be able to identify natural and artificial sounds like bird song, people speaking in crowds, traffic sounds, etc.
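A toy sketch of the idea, with hypothetical stand-in classifiers (the function names and data shapes here are my own inventions, not MIT's actual models): an image recognizer labels what's visible in each video frame, and those labels become the training targets for a sound classifier, with no human annotation in the loop.

```python
def image_recognizer(frame):
    """Stub image model: pretend it identifies the sound source in a frame."""
    return frame["visible_object"]  # e.g. "bird", "traffic", "crowd"

def auto_annotate(video_clips):
    """Pair each clip's audio track with the label the image model produces."""
    return [(clip["audio"], image_recognizer(clip["frame"]))
            for clip in video_clips]

clips = [
    {"frame": {"visible_object": "bird"},    "audio": "clip1.wav"},
    {"frame": {"visible_object": "traffic"}, "audio": "clip2.wav"},
]

training_set = auto_annotate(clips)
# training_set is now [("clip1.wav", "bird"), ("clip2.wav", "traffic")],
# ready to feed to a sound recognition model as labeled training data.
```

The point of the trick is in `auto_annotate`: the "annotation cost" drops to whatever it costs to run the image model over the video.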

They tested their automatically trained sound recognition system against standard labeled sound sets and it was able to categorize sounds with 92% accuracy on a 10-category data set and 74% accuracy on a 50-category data set. Humans are able to categorize these sounds with 96% and 81% accuracy, respectively.

AI’s need for annotation

The problem with machine learning is that it needs a massive, properly annotated data set in order to learn properly. But getting annotated data takes too long or is too expensive to do for many things that we want AI for.

Using one AI tool to annotate data to train another AI tool is a sort of bootstrapping of AI technology. It’s a cute trick but may have only limited application. I could think of only a few more applications of similar technology:

  • Use chest strap or EKG technology to annotate audio clips of heartbeat sounds at the wrist or other appendages to train a system to accurately determine pulse rates through sound alone.
  • Use wave monitoring technology to annotate pictures and audio clips of sea waves to train a system to accurately determine wave levels for better tsunami detection.
  • Use image recognition to annotate pictures of food and then use this to train a system to recognize food smells (if they ever find a way to record smells).

But there may be many others. Just further refinement of what they have already done could lead to finer-grained people detection. For example, as (facial) image recognition gets better, it’s possible to annotate film clips of people speaking to train a sound recognition system to identify individuals just from hearing their speech. Intelligence applications for such technology are significant.

Nonetheless, I for one am happy that the next reCAPTCHA won’t be having me identify river sounds in a matrix of 9 sound clips.

But I fear there are enough GreyBeards on Storage podcast recordings and Storage Field Day video clips already available to train a system to identify Ray’s and, for sure, Howard’s voices anywhere on the planet…

Comments?

Photo Credit(s): Wave by Matthew Potter; Waves crashing on Puget Sound by mikeskatie; Day 16: Podcasting by Laura Blankenship

Crowdsourcing made better

Read an article the other day in MIT News (Better wisdom from crowds) about a new approach to drawing out better information from crowdsourced surveys. It’s based on something the researchers have named the “surprising popularity” algorithm.

Normally, when someone performs a crowdsourced survey, the result is some statistically based (simple or confidence-weighted) average of all the responses. But this may not be correct because, if the majority are ill-informed, any average of their responses will most likely be incorrect.

Surprisingly popular?

What surprising popularity does is ask respondents what they believe will be the most popular answer to a question, and then ask what the respondent believes is the correct answer to the question. It’s these two answers that are then used to choose the most surprisingly popular answer.

For example, let’s say the question the surveyors ask is whether Philadelphia is the capital of Pennsylvania (PA, a state in the eastern USA). They ask everyone what the most popular answer will be. In this case it’s yes, because Philadelphia is large, well known, and historically important. But they then ask for a yes or no on whether Philadelphia actually is the capital of PA. The answer they get back from the crowd here is also, mostly, yes.

But a sizable contingent would answer that Philadelphia being the capital of PA is wrong (it is actually Harrisburg). And because there’s a (knowledgeable) group that all answers the same way (no), this becomes the “surprisingly popular” answer, and it is the answer the surprisingly popular algorithm would choose.

What it means

The MIT researchers indicated that their approach reduced errors by 21.3% over a simple majority and 24.2% over a confidence weighted average.

What the researchers have found is that the surprisingly popular algorithm can be used to identify a knowledgeable subset of respondents who know the correct answer. By knowing the most popular answer, the algorithm can discount it and then identify the surprisingly popular (next most frequent) answer and use this as the result of the survey.
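A minimal sketch of one common formulation of the idea (my own simplification, not necessarily the researchers' exact scoring): pick the answer whose actual share of votes most exceeds the share of respondents who predicted it would be the popular answer.

```python
from collections import Counter

def surprisingly_popular(responses):
    """Each response is a pair (own_answer, predicted_majority_answer).
    Return the answer whose actual vote share most exceeds its
    predicted popularity -- the 'surprisingly popular' answer."""
    n = len(responses)
    actual = Counter(own for own, _ in responses)
    predicted = Counter(pred for _, pred in responses)
    # score = actual share minus predicted share; highest score wins
    return max(actual, key=lambda a: actual[a] / n - predicted[a] / n)

# Philadelphia example: the ill-informed majority answers "yes" and
# predicts "yes" will win; the knowledgeable minority answers "no"
# but, knowing the common misconception, also predicts "yes".
votes = [("yes", "yes")] * 6 + [("no", "yes")] * 4
print(surprisingly_popular(votes))  # -> no
```

Here "yes" gets 60% of votes but was predicted by 100% of respondents, so it scores -0.4; "no" gets 40% of votes against a 0% prediction, scoring +0.4, and wins despite being the minority answer.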

Where might this be useful?

In our (USA) last election there were quite a few false news stories sent out via social media (Facebook and Twitter). If there were a mechanism to survey the readers of these stories that asked both whether the story was false/made up or not and what the most popular answer would be, perhaps a news story’s truthfulness could be more reliably established by the crowd.

In the past, a number of crowdsourced markets were used to predict stock movements, commodity production, and other securities market values. Crowdsourcing using surprisingly popular methods might better identify the correct answer from the crowd.

Problems with surprisingly popular methods

The one issue is that this approach could be gamed. If a group wanted some answer (let’s say, that a news story was true), they could easily indicate that the most popular answer would be false, and then the method would fail. But it would fail in any case if the group could command a majority of responses, so it’s no worse than any other crowdsourced approach.

Comments?

Photo Credit(s): Crowd shot by Andrew West; Lost in the crowd by Eric Sonstroem

 

Domesticating data

Read an article the other day from MIT News (Taming Data) about a new system that scans all your tabular data and provides an easy way to query all this data from one system. The researchers call the system the Data Civilizer.

What does it do

Tabular data seems to be the one constant in corporate data (that, and for me, PowerPoint and Word docs). Most databases are tables of one form or another (some row and some column based). Lots of operational data is in spreadsheets (tables by another name) of some type. And when I look over most IT/networking/storage management GUIs, tables (rows and columns) of data are the norm.

The Data Civilizer takes all this tabular data and analyzes it all, column by column, and calculates descriptive characterization statistics for each column.

Numerical data could be characterized by range, standard deviation, median/average, cardinality, etc. For textual data, a list of words in the column by frequency might suffice. It also indexes every word in the tables it analyzes.

Armed with its statistical characterization of each column, the Data Civilizer can then generate a similarity index between any two columns of data across the tables it has analyzed. In that way it can connect data in one table with data in another.

Once it has a similarity matrix and has indexed all the words in every table column it has analyzed, it can map the tabular data, showing which columns look similar to other columns. Then any arbitrary query can be executed against any table containing similar data, returning results drawn from across the multiple tables it has analyzed.
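A rough sketch of the column characterization and similarity idea. The statistics and the Jaccard word-overlap measure below are my own choices to illustrate the concept; the article doesn't describe the Data Civilizer's actual internals.

```python
import statistics
from collections import Counter

def characterize(column):
    """Build a simple per-column profile: numeric summary stats for
    numeric columns, a word-frequency table for text columns."""
    if all(isinstance(v, (int, float)) for v in column):
        return {"kind": "numeric",
                "min": min(column), "max": max(column),
                "mean": statistics.mean(column),
                "cardinality": len(set(column))}
    words = Counter(w for v in column for w in str(v).lower().split())
    return {"kind": "text", "word_freq": words}

def text_similarity(profile_a, profile_b):
    """Jaccard overlap of two text columns' word sets -- one plausible
    column-to-column similarity measure."""
    a, b = set(profile_a["word_freq"]), set(profile_b["word_freq"])
    return len(a & b) / len(a | b)

# Two columns from different tables that describe the same concept:
col1 = ["New York", "Boston", "Chicago"]
col2 = ["boston", "chicago", "seattle"]
sim = text_similarity(characterize(col1), characterize(col2))
print(round(sim, 2))  # 2 shared words out of 5 distinct -> 0.4
```

With profiles like these computed for every column in every table, building the similarity matrix is just pairwise comparison, and a query can then be routed to any column scoring above some threshold.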

Potential improvements

The researchers indicated that they currently don’t support every table data format. This may be a sizable task on its own.

In addition, statistical characterization or classification seems old school nowadays. Most new AI is moving off statistical analysis to more neural-net types of classification. It’s unclear whether you could just feed all the tabular data to a deep learning neural net, but if the end game is to find similarities across disparate data sets, then neural nets are probably a better way to go. How you would combine this with brute-force indexing of all tabular data words is another question.

~~~~

In the end, as I look at my company’s information, even most of my Word docs are organized in some sort of table, so cross-table queries could help me a lot. Let me know when it can handle Excel and Word docs and I’ll take another look.

Photo Credit(s): Linear system table representation 2 by Ronald O’Daniel; Glenda Sims by Glendathegood