Marketing meet Big Data, call records, credit card purchases & demographics

Read an article in Science Daily (Understanding urban issues through credit cards) that talked about a study published in Nature (Sequences of purchases in credit card data reveals lifestyles of urban populations) that applies big data to B2C marketing.

The researchers examined call data records (CDRs), credit card transactions records (CCRs) and demographic (age, sex, residential zip code, wage level, etc.) data and did a cross table between them to identify sequences of purchases. They then used these sequences to identify different lifestyle groups in the urban area.

Marketing 2.0

The analyzed data from Mexico City, Mexico. The CCR data was collected for 10 weeks across 150K users. The had CDR data for 1/10th of the users for 6 months surrounding the 10 weeks duration. Credit card adoption is still low in Mexico (18%), so the analysis may be biased.  When thy matched CCR expenditures against median wages in a district and they found their participants came from higher wage populations. Their data also spanned all districts within the city.

The analysis identified sequences of purchase categories as well as expenditures.  They characterized purchase sequences as “words”.

 

 

 

Using the word data and further statistical analysis they were able to split the population up into 5 distinct lifestyle groups. 

The loops of icons above represent major purchase categories derived from the CCR data merchant category codes (MCC).  Each of the rings in “a” above show the same 12 major MCC purchase categories. If you look at each ring, one can identify a central or core node that seems to have the most incoming or outgoing arks. These seem to be the central purchases made by that lifestyle group after which they branch out to other purchase categories.

There are five different lifestyle categories (they also show the city average) delineated in the data:

  • Commuter – generally they have to pay tolls, have longer travel between home and work and have a diverse sequence of purchase that occurs after purchases from the toll category.
  • Household – purchases seem to center on grocery stores/supermarkets and then branch off from there.
  • Young – purchases seem to center on the taxicab category and then go to computer-networking, restaurants, grocery stores/supermarkets.
  • Hi-Tech – purchases seem to center on computer-networking,  then go to gas stations, grocery stores/supermarkets, restaurants, and telecomm.
  • Average – seems to have two focuses grocery stores/supermarkets and restaurants and then goes out from there to gas stations, specialty food stores and department stores.
  • Dinner-out – purchases seem to center on restaurants and then branch out fro there to computer-networking, gas stations, supermarkets, fast food, etc.

In “b”  breakout above, you can see the socio-demographic characteristics of each lifestyle group as compared with the median user. And in “c” one can see some population histograms of the demographic data.

They were then able to use the CDR data to construct a map of which lifestyle called which other life style to identify call correlation data. Most calls were contacts between the same groups but the second most active call was calls to householders.

They took this same analysis to another city in Mexico and came up with six  lifestyle categories, five of the same and a different one.

~~~~

When I went to Uni (a long long time ago), I attended an urban geography class that was much more scientific and mathematical than any other geography class I had ever attended. I remember asking the professor when did geography become an exact science. As best as I can recall, he laughed and said over the last decade.

Analysis like the above could make B2C marketing, almost an exact science.

Big Data meet Marketing – Buyer beware.

Comments?

Photo Credit(s):  All charts/photos are from the Nature article Sequences of purchase in credit card data reveal lifestyles in urban populations

AI’s Image recognition success feeds sound recognition improvements

I must do reCAPTCHA at least a dozen times a week for various websites I use. It’s become a real pain. And the fact that I know that what I am doing is helping some AI image recognition program do a better job of identifying street signs, mountains, or shop fronts doesn’t reduce my angst.

But that’s the thing with deep learning, machine learning, re-inforcement learning, etc. they all need massive amounts of annotated data that’s a correct interpretation of a scene in order to train properly.

Computers to the rescue

So, when I read a recent article in MIT News that Computers learn to recognize sounds by watching video, I was intrigued. What the researchers at MIT have done is use advanced image recognition to annotate film clips with the names of things that are making sounds on the film. They then fed this automatically annotated data into a sound identifying algorithm to improve its recognition capability.

They used this approach to train their sound recognition system to be  able to identify natural and artificial sounds like bird song, speaking in crowds, traffic sounds, etc.

They tested their newly automatically trained sound recognition against standard labeled sound sets and was able to categorize sound with a 92% accuracy for a 10 category data set and with a 74% accuracy with a 50 category dataset. Humans are able categorize these sounds with a 96% and 81% accuracy, respectively.

AI’s need for annotation

The problem with machine learning is that it needs a massive, properly annotated data set in order to learn properly. But getting annotated data takes too long or is too expensive to do for many things that we want AI for.

Using one AI tool to annotate data to train another AI tool is sort of bootstrapping AI technology. It’s acute trick but may have only limited application. I could only think of only a few more applications of similar technology:

  • Use chest strap or EKG technology to annotate audio clips of heart beat sounds at a wrist or other appendage to train a system to accurately determine pulse rates through sound alone.
  • Use wave monitoring technology to annotate pictures and audio clips of sea waves to train a system to accurately determine wave levels for better tsunami detection.
  • Use image recognition to annotate pictures of food and then use this train a system to recognize food smells (if they ever find a way to record smells).

But there may be many others. Just further refinement of what they have used could lead to finer grained people detection. For example, as (facial) image recognition gets better, it’s possible to annotate speaking film clips to train a sound recognition system to identify people from just hearing their speech. Intelligence applications for such technology are significant.

Nonetheless, I for one am happy that the next reCAPTCHA won’t be having me identify river sounds in a matrix of 9 sound clips.

But I fear there’s enough GreyBeards on Storage podcast recordings and Storage Field Day video clips already available to train a system to identify Ray’s and for sure, Howard’s voice anywhere on the planet…

Comments?

Photo Credit(s): Wave by Matthew Potter; Waves crashing on Puget Sound by mikeskatieDay 16: Podcasting by Laura Blankenship