Learning machine learning – part 1

Saw an article this past week from AWS Re:Invent that they just released their Machine Learning curriculum and materials  free to the public. Google (Cloud Platform and elsewhere) TensorFlow,  (Facebook’s) PyTorch, and Microsoft Azure CNTK frameworks  education is also available and has been for awhile now.

My money is on PyTorch and Tensorflow as being the two frameworks most likely to succeed. However all the above use many open source facilities and there seems to be a lot of cross breeding across them. Both AWS ML solutions and Microsoft CNTK offer PyTorch and TensorFlow frameworks/APIs as one option among many others.  

AWS Machine Learning

I spent about an hour plus looking over the AWS SageMaker tutorial videos in the developer section of AWS machine learning curriculum. Signing up was fairly easy but I already had an AWS login. You also had to enroll/register for the course on your AWS login  but once that was through, you could access courses.

In the comments on the AWS blog post there were a number of entries indicating broken links and other problems but I didn’t have any issues. Then again, I didn’t start at the beginning, only looked at over one series of courses, and was using the websites one week after they were announced at Re:Invent.

Amazon SageMaker is an overarching framework that can be used to perform machine learning on AWS, all the way from gathering, analyzing and modifying the dataset(s), to training the model, to creating a inference engine available as an endpoint that can be used to perform the inferencing.

Amazon also has special purpose API based tools that allow customers to embed intelligence (inferencing) directly into their application, without needing to perform the ML training. These include:

  • Amazon Recognition which provides image (facial and other tagging) recognition services
  • Amazon Polly which provides text to speech services in multilple languages, and
  • Amazon Lex which provides speech recognition technology (used by Alexa) and together with Polly helps embed conversational interfaces into customer applications.

TensorFlow Machine Learning

In the past I looked over the TensorFlow tutorials and recently rechecked them out. I found them much easier to follow this time.

 

The Google IO 2018 video on TensorFlowGetting Started With TensorFlow High Level APIs, takes you through a brief introduction to the Colab(oratory),  a GCP solution that uses TensorFlow and how to use Tensorflow Keras, tf.data and TensorFlow Eager Execution to create machine learning models and perform machine learning.

 Keras on TensorFlow seems to be the easiest approach to  use machine learning technologies. The video spends most of the time discussing a Colab Keras code element,  ~9 lines, that loads a image classification dataset, defines a 1 level (one standard layer and one output layer), trains it, validates it and uses it to perform  inferencing.

The video also touches a bit on tf.data and TensorFlow Eager Execution but the main portion discusses the 9 line TensorFlow Keras machine learning example.

Both Colab and AWS Sagemaker use and discuss Jupiter Notebooks. These appear to be an open source approach to documenting and creating a workflow and executing Python code automatically.

GCP Colab is essentially a GCP-Google Drive based Jupiter notebook execution engine. With Colab you create a Jupiter notebook on google drive and interactively execute it under Colab. You can download your Jupiter notebook files and essentially execute them anywhere else that supports TensorFlow (that supports TensorFlow v1.7 or above, with Keras API support).

In the video, the Google IO   instructors (Josh Gordon and Lawrence Moroney) walk you through building a model to recognize handwritten digits and outputs a classification (0..9) of what the handwritten digit represents.

It uses a standard labeled handwriting to digits labeled data set, called the MNIST database of handwritten digits that’s already been broken up into a training set and a validation set. Josh calls this the “Hello World” of machine learning.

The instructor in the video walks you through the (Jupiter Notebook – Eager Execution-Keras) code that inputs the data set (line 2), builds a 1 level (really two layer, one neural net layer and one output layer) neural network model (lines 3-6), trains the model (line 7), tests/validates the model (line 8) and then uses it to perform an inference (line 9).

Josh spends a little time discussing neural networks and model optimizations and some of the other parameters used in the code above. He has a few visualizations of what this all means but for the most part, the code uses a simple way to build a neural net model and some standard optimization techniques for the network.

He then goes on to discuss tf.data which is an API that can be used to create machine learning datasets and provide this data to the neural net for training or inferencing.  Apparently tf.data has a number of nifty features that allow you to take raw data and transform it into something that can be used to feed neural nets. For example, separating the data into batches, shuffling (randomizing) the batches of data, pre-fetching it so as to not starve the GPU matrix multipliers, etc.

Then it goes into how machine learning is different than regular coding. And show how TensorFlow Eager Execution is really just like Python execution. They go through another example (larger) of machine learning, this one distinguishes between cats and dogs. While they use an open source Python IDE ,  PyCharm, to test and walk through their TF Eager Execution code, setting breakpoints and examining data along the way.

At the end of the video they show a link to a Google crash course on TensorFlow machine learning and they refer to a book Deep Learning with Python by Francois Chollet. They also mention a browser version of TensorFlow which uses Java Script and  your browser to develop, train and perform inferences using TensorFlow Keras machine learning.

~~~~

Never got around to Microsoft’s Azure training other than previewing some websites but plan to look over that soon.

I would have to say that the Google IO session on using TensorFlow high level APIs was a lot more enjoyable (~40 minutes) than the AWS multiple tutorial videos (>>40 minutes) that I watched to learn about SageMaker.

Not a fair comparison as one was a Google IO intro session on TensorFlow high level APIs and the other was a series of actual training videos on Amazon SageMaker and the AWS services you can use to take advantage of it.

But the GCP session left me thinking I can handle learning more and using machine learning (via TensorFlow, Keras, Eager Execution, & tf.data) to actually do something while the SageMaker sessions left me thinking, how much AWS facilities and AWS infrastructure services,  I would need to understand and use to ever get to actually developing a machine learning model.

I suppose one was more of an (AWS SageMaker) infrastructure tutorial  and the other was more of an intro into machine learning using TensorFlow wherever you wanted to execute it.

I think I’m almost ready to start creating and feeding a TensorFlow model with my handwriting and seeing if it can properly interpret it into searchable text. If it can do that, I would be a happy camper

Comments…

Photo credits: 

Screenshos from AWS Sagemaker series of tutorial video 1, 2, 3, 4 & 5, you may need a signin to view them

Screenshots from the Getting Started with TensorFlow High Level APIs YouTube video 

UK Biobank & the data economy – part 2

A couple of weeks back I wrote a post about repositories for all the data that users generate these days and what to do with it.  (See our post on Data banks, deposits, … data economy – part 1).

This past week I read an article (see ScienceDaily Genetics of brain structure … article) which partially exemplifies what that post talked about. The research used publicly available genetic information to tease out brain structure hereditary characteristics.  The Science Daily article was a summary of research done at the University of Oxford using information provided from the UK Biobank.

Biobank as a data bank

The Biobank has recruited 500K participants from the UK,  aged 40-69,  between 2006-2010, to share their anonymized health data with researchers and scientist around the world. The Biobank is set up as a Scottish charity, funded by various health organizations in UK both gov’t and private. 

In addition to information collected during the baseline assessment: 

  • 100K participants have worn a 24 hour health monitoring device for a week and 20K have signed up to repeat this activity.
  • 500K participants are providing have been genotyped (DNA sequencing to determine hereditary genes)
  • 100K participants will be medically scanned (brain, heart, abdomen, bones, carotid artery) with images stored in the Biobank
  • 100K participants have signed up to receive questionnaires asking  about diet, exercise, work history, digestive health and other medical indicators..

There’s more. Biobank is linking to electronic health records (EHR) of participants to track their health over time. The Biobank is also starting to provide blood analysis and other detailed medical measures of subjects in the study.

UK Biobank (data bank) information uses

“UK Biobank is an open access resource. The Resource is open to bona fide scientists, undertaking health-related research that is in the public good. Approved scientists from the UK and overseas and from academia, government, charity and commercial companies can use the Resource. ….” (from UK Biobank scientists page).

Somewhat like open source code, the Biobank resource is made available to anyone (academia as well as industry), that can make valid use of its data BUT any research derived from its data must be published and made freely available to the Biobank and the world.

Biobank’s papers page documents some of the research that has already been published using their data. It lists the paper on genetics of brain study mentioned above and dozens more.

Differences from Data Banks

In the original data bank post:

  1. We thought data was only needed by  AI/deep learning. That seems naive now. The Biobank shows that AI/deep learning is not the only application/research that needs data.
  2. We thought data would be collected by only by hyper-scalars and other big web firms during normal user web activity. But their data is not the only data that matters.
  3. We thought data would be gathered for free. Good data can take many forms, and some may cost money.
  4. We thought profits from selling data would be split between the bank and users and could fund data bank operations. But in the Biobank, funding came from charitable contributions and data is available for free (to valid researchers).

Data banks can be an invaluable resource and may take many forms. Data that’s difficult to find can be gathered by charities and others that use funding to create, operate and gather the specific information needed for targeted research.

Comments?

Photo Credit(s): Bank on it by Alan Levine

Latest MRI – two screws in the kneecap by Becky Stern

Other graphics from the Genetics of brain structure… paper

 

 

New website monetization approaches

Historically, websites have made money by selling wares, services or advertising. In the last two weeks it seems like two new business models are starting to emerge. One more publicly supported and the other less publicly supported.

Europe’s new copyright law

According to an article I read recently (This newly approved European copyright law might break the Internet), Article 11 of Europe’s new Copyright Directive (not quite law yet) will require search engines, news aggregators and other users of Internet content to pay a “link tax” to copyright holders of anything they link to. As a long time blogger, podcaster and content provider, I find this new copyright policy very intriguing.

The article proposes that this will bankrupt small publishers as larger ones will charge less for the traffic. But presently, I get nothing for links to my content. And, I’d be delighted to get any amount – in fact I’d match any large publishers link tax amount that the market demands.

But my main concern is the impact this might have on site traffic. If aggregators pay a link tax, why would they want to use content that charges any tax. Yes at some point aggregators need content. But there are many websites full of content, certainly there would be some willing to forego tax fees for more traffic.

I also happen to be a copyright user. Most of my blog posts are from articles I read on the web. I usually link to an article in the 1st one or two paragraphs (see above and below) of a post and may refer (and link) to more that go deeper into a subject. Will I have to pay a link tax to the content owner?

How much of a link tax is anyone’s guess. I’m not sure it would amount to much. But a link tax, if done judiciously might even raise the quality of the content on the web.

Browser’s of the world, lay down your blockchains

The second article was a recent research paper (Digging into browser based crypto mining). Researchers at RWTH Aachen University had developed a new method to associate mined blocks to mining pools as a way to unearth browser-based mined crypto coins. With this technique they estimated that 1.8% of all Monero coins were mined by CoinHive using participant browsers to mine the coin or ~$250K/month from browser mining.

I see this as steeling compute power. But with that much coin being generated, it might be a reasonable way for an honest website to make some cash from people browsing their web pages. The browsing party would need to be informed of the mining operation in the page’s information, sort of like “we use cookies” today.

Just think, someone creates a WP plugin to do ETH mining and when activated, a WP website pops up a message that says “We mine coins while you browse – OK?”.

In another twist perhaps the websites could share the ETH mined on their browser with the person doing the browsing, similar to airline/hotel travel awards. Today most travel is done on corporate dime, but awards go to the person doing the traveling. Similarly, employees could browse using corporate computers but they would keep a portion of the ETH that’s mined while they browse away… Sounds like a deal.

Other monetization approaches

We’ve tried Google AdSense and other advertising but it only generated pennies a month. So, it wasn’t worth it.

We also sell research and occasionally someone buys some (see SCI Research Shop). And I do sell services but not through my website.

~~~

Not sure a link tax will fly. It would be a race to the bottom and anyone that charged a tax would suffer from less links until they decided to charge a $0 link tax.

Maybe if every link had a tax associated with it, whether the site owner wanted it or not there could be a level playing field. Recording, paying/receiving and accounting for all these link tax micro payments would be another nightmare altogether.

But a WP plugin, that announces and mines crypto coins with a user’s approval and splits the profit with them might work. Corporate wouldn’t like it but employees would just be browsing websites, where’s the harm in that.

Browse a website and share the mined crypto coin with site owner. Sounds fine to me.

Photo Credit(s): Strasburg – European Parliament|Giorgio Barlocco

Crypto News Daily – Telegram cancels ICO…

Photo of Bitcoin, Etherium and Litecoin|QuoteInspector

Photonic or Optical FPGAs on the horizon

Read an article this past week (Toward an optical FPGA – programable silicon photonics circuits) on a new technology that could underpin optical  FPGAs. The technology is based on implantable wave guides and uses silicon on insulator technology which is compatible with current chip fabrication.

How does the Optical FPGA work

Their Optical FPGA is based on an eraseable direct coupler (DC) built using GE (Germanium) ion implantation. A DC is used when two optical waveguides are placed close enough together such that optical energy (photons) on one wave guide is switched over to the other, nearby wave guide.

As can be seen in the figure, the red (eraseable, implantable) and blue (conventional) wave guides are fabricated on the FPGA. The red wave guide performs the function of DC between the two conventional wave guides. The diagram shows both a single stage and a dual stage DC.

By using imlantable (eraseable) DCs, one can change the path of a photonic circuit by just erasing the implantable wave guide(s).

The GE ion implantable wave guides are erased by passing a laser over it and thus annealing (melting) it.

Once erased, the implantable wave guide DC no longer works. The chart on the left of the figure above shows how long the implantable wave guide needs to be to work. As shown above once erased to be shorter than 4-5µm, it no longer acts as a DC.

It’s not clear how one directs the laser to the proper place on the Optical FPGA to anneal the implantable wave guide but that’s a question of servos and mirrors.

Previous attempts at optical FPGAs, required applying continuing voltage to maintain the switched photonics circuits. Once voltage was withdrawn the photonics reverted back to original configuration.

But once an implantable wave guide is erased (annealed) in their approach, the changes to the Optical FPGA are permanent.

FPGAs today

Electronic FPGAs have never gone out of favor with customers doing hardware innovation. By supplying Optical FPGAs, the techniques in the paper would allow for much more photonics innovation as well.

Optics are primarily used in communications and storage (CD-DVDs) today. But quantum computing could potentially use photonics and there’s been talk of a 100% optical computer for a long time. As more and more photonics circuitry comes online, the need for an optical FPGA grows. The fact that it’s able to be grown on today’s fab lines makes it even more appealing.

But an FPGA is more than just directional control over (electronic or photonic) energy. One needs to have other circuitry in place on the FPGA for it to do work.

For example, if this were an electronic FPGA, gates, adders, muxes, etc. would all be somewhere on the FPGA

However, once having placed additional optical componentry on the FPGA, photonic directional control would be the glue that makes the Optical FPGA programmable.

Comments?

Photo Credit(s): All photos from Toward an optical FPGA – programable silicon photonics circuits paper

 

Stanford Data Lab students hit the ground running…

Read an article (Students confront the messiness of data) today about Stanford’s Data Lab  and how their students are trained to cleanup and analyze real world data.

The Data Lab teaches two courses the Data Challenge Lab course and the Data Impact Lab course. The Challenge Lab is an introductory course in data gathering, cleanup and analysis. The Impact Lab is where advanced students tackle real world, high impact problems through data analysis.

Data Challenge Lab

Their Data Challenge Lab course is a 10 week course with no pre-requisites that teaches students how to analyze real world data to solve problems.

Their are no lectures. You’re given project datasets and the tools to manipulate, visualize and analyze the data. Your goal is to master the tools, cleanup the data and gather insights from the data. Professors are there to provide one on one help so you can step through the data provided and understand how to use the tools.

In the information provided on their website there were no references and no information about the specific tools used in the Data Challenge Lab to manipulate, visualize and analyze the data. From an outsiders’ viewpoint it would be great to have a list of references or even websites describing the tools being used and maybe the datasets that are accessed.

Data Impact Lab

The Data Impact lab course is an independent study course, whose only pre-req is the Data Challenge Lab.

Here students are joined into interdisplinary teams with practitioner partners to tackle ongoing, real world problems with their new data analysis capabilities.

There is no set time frame for the course and it is a non-credit activity. But here students help to solve real world problems.

Current projects in the Impact lab include:

  • The California Poverty Project  to create an interactive map of poverty in California to supply geographic guidance to aid agencies helping the poor
  • The Zambia Malaria Project to create an interactive map of malarial infestation to help NGOs and other agencies target remediation activity.

Previous Impact Lab projects include: the Poverty Alleviation Project to provide a multi-dimensional index of poverty status for areas in Kenya so that NGOs can use these maps to target randomized experiments in poverty eradication and the Data Journalism Project to bring data analysis tools to breaking stories and other journalistic endeavors.

~~~~

Courses like these should be much more widely available. It’s almost the analog to the scientific method, only for the 21st century.

Science has gotten to a point, these days, where data analysis is a core discipline that everyone should know how to do. Maybe it doesn’t have to involve Hadoop but rudimentary data analysis, manipulation, and visualization needs to be in everyone’s tool box.

Data 101 anyone?

Photo Credit(s): Big_Data_Prob | KamiPhuc;

Southbound traffic speeds on Masonic avenue on different dates | Eric Fisher;

Unlucky Haiti (1981-2010) | Jer Thorp;

Bristol Cycling Level by Wards 2011 | Sam Saunders

A new way to compute

I read an article the other day on using using random pulses rather than digital numbers to compute with, see Computing with random pulses promises to simplify circuitry and save power, in IEEE Spectrum. Essentially they encode a number as a probability in a random string of bits and then use simple logic to compute with. This approach was invented in the early days of digital logic and was called stochastic computing.

Stochastic numbers?

It’s pretty easy to understand how such logic can work for fractions. For example to represent 1/4, you would construct a bit stream that had one out of every four bits, on average, as a 1 and the rest 0’s. This could easily be a random string of bits which have an average of 1 out of every 4 bits as a one.

A nice result of such a numerical representation is that it easily results in more precision as you increase the length of the bit stream. The paper calls this progressive precision.

Progressive precision helps stochastic computing be more fault tolerant than standard digital logic. That is, if the string has one bit changed it’s not going to make that much of a difference from the original string and computing with an erroneous number like this will probably result in similar results to the correct number.  To have anything like this in digital computation requires parity bits, ECC, CRC and other error correction mechanisms and the logic required to implement these is extensive.

Stochastic computing

2 bit multiplier

Another advantage of stochastic computation and using a probability  rather than binary (or decimal) digital representation, is that most arithmetic functions are much simpler to implement.

 

They discuss two examples in the original paper:

  • AND gate

    Multiplication – Multiplying two probabilistic bit streams together is as simple as ANDing the two strings.

  • 2 input stream multiplexer

    Addition – Adding two probabilistic bit strings together just requires a multiplexer, but you end up with a bit string that is the sum of the two divided by two.

What about other numbers?

I see a couple of problems with stochastic computing:,

  • How do you represent  an irrational number, such as the square root of 2;
  • How do you represent integers or for that matter any value greater than 1.0 in a probabilistic bit stream; and
  • How do you represent negative values in a bit stream.

I suppose irrational numbers could be represented by taking a near-by, close approximation of the irrational number. For instance, using 1.4 for the square root of two, or 1.41, or 1.414, …. And this way you could get whatever (progressive) precision that was needed.

As for integers greater than 1.0, perhaps they could use a floating point representation, with two defined bit strings, one representing the mantissa (fractional part) and the other an exponent. We would assume that the exponent rather than being a probability from 0..1.0, would be inverted and represent 1.0…∞.

Negative numbers are a different problem. One way to supply negative numbers is to use something akin to complemetary representation. For example, rather than the probabilistic bit stream representing 0.0 to 1.0 have it represent -0.5 to 0.5. Then progressive precision would work for negative numbers as well a positive numbers.

One major downside to stochastic numbers and computation is that high precision arithmetic is very difficult to achieve.  To perform 32 bit precision arithmetic would require a bit streams that were  2³² bits long. 64 bit precision would require streams that were  2**64th bits long.

Good uses for stochastic computing

One advantage of simplified logic used in stochastic computing is it needs a lot less power to compute. One example in the paper they use for stochastic computers is as a retinal sensor for in the body visual augmentation. They developed a neural net that did edge detection that used a stochastic front end to simplify the logic and cut down on power requirements.

Other areas where stochastic computing might help is for IoT applications. There’s been a lot of interest in IoT sensors being embedded in streets, parking lots, buildings, bridges, trucks, cars etc. Most have a need to perform a modest amount of edge computing and then send information up to the cloud or some edge consolidator intermediate

Many of these embedded devices lack access to power, so they will need to make do with whatever they can find.  One approach is to siphon power from ambient radio (see this  Electricity harvesting… article), temperature differences (see this MIT … power from daily temperature swings article), footsteps (see Pavegen) or other mechanisms.

The other use for stochastic computing is to mimic the brain. It appears that the brain encodes information in pulses of electric potential. Computation in the brain happens across exhibitory and inhibitory circuits that all seem to interact together.  Stochastic computing might be an effective way, low power way to simulate the brain at a much finer granularity than what’s available today using standard digital computation.

~~~~

Not sure it’s all there yet, but there’s definitely some advantages to stochastic computing. I could see it being especially useful for in body sensors and many IoT devices.

Comments?

Photo Credit(s):  The logic of random pulses

2 bit by 2 bit multiplier, By Sodaboy1138 (talk) (Uploads) – Own work, CC BY-SA 3.0, wikimedia

AND ANSI Labelled, By Inductiveload – Own work, Public Domain, wikimedia

2 Input multiplexor

A battery free implantable neural sensor, MIT Technology Review article

Integrating neural signal and embedded system for controlling a small motor, an IntechOpen article

Blockchain, open source and trusted data lead to better SDG impacts

Read an article today in Bitcoin magazine IXO Foundation: A blockchain based response to UN call for [better] data which discusses how the UN can use blockchains to improve their development projects.

The UN introduced the 17 Global Goals for Sustainable Development (SDG) to be achieved in the world by 2030. The previous 8 Millennial Development Goals (MDG) expire this year.

Although significant progress has been made on the MDGs, one ongoing determent to  MDG attainment has been that progress has been very uneven, “with the poorest and economically disadvantaged often bypassed”.  (See WEF, What are Sustainable Development Goals).

Throughout the UN 17 SDG, the underlying objective is to end global poverty  in a sustainable way.

Impact claims

In the past organizations performing services for the UN under the MDG mandate, indicated they were performing work toward the goals by stating, for example, that they planted 1K acres of trees, taught 2K underage children or distributed 20 tons of food aid.

The problem with such organizational claims is they were left mostly unverified. So the UN, NGOs and other charities funding these projects were dependent on trusting the delivering organization to tell the truth about what they were doing on the ground.

However, impact claims such as these can be independently validated and by doing so the UN and other funding agencies can determine if their money is being spent properly.

Proving impact

Proofs of Impact Claims can be done by an automated bot, an independent evaluator or some combination of the two . For instance, a bot could be used to analyze periodic satellite imagery to determine whether 1K acres of trees were actually planted or not; an independent evaluator can determine if 2K students are attending class or not, and both bots and evaluators can determine if 20 tons of food aid has been distributed or not.

Such Proofs of Impact Claims then become a important check on what organizations performing services are actually doing.  With over $1T spent every year on UN’s SDG activities, understanding which organizations actually perform the work and which don’t is a major step towards optimizing the SDG process. But for Impact Claims and Proofs of Impact Claims to provide such feedback but they must be adequately traced back to identified parties, certified as trustworthy and be widely available.

The ixo Foundation

The ixo Foundation is using open source, smart contract blockchains, personalized data privacy, and other technologies in the ixo Protocol for UN and other organizations to use to manage and provide trustworthy data on SDG projects from start to completion.

Trustworthy data seems a great application for blockchain technology. Blockchains have a number of features used to create trusted data:

  1. Any impact claim and proofs of impacts become inherently immutable, once entered into a blockchain.
  2. All parties to a project, funders, services and evaluators can be clearly identified and traced using the blockchain public key infrastructure.
  3. Any data can be stored in a blockchain. So, any satellite imagery used, the automated analysis bot/program used, as well as any derived analysis result could all be stored in an intelligent blockchain.
  4. Blockchain data is inherently widely available and distributed, in fact, blockchain data needs to be widely distributed in order to work properly.

 

The ixo Protocol

The ixo Protocol is a method to manage (SDG) Impact projects. It starts with 3 main participants: funding agencies, service agents and evaluation agents.

  • Funding agencies create and digitally sign new Impact Projects with pre-defined criteria to identify appropriate service  agencies which can do the work of the project and evaluation agencies which can evaluate the work being performed. Funding agencies also identify Impact Claim Template(s) for the project which identify standard ways to assess whether the project is being performed properly used by service agencies doing the work. Funding agencies also specify the evaluation criteria used by evaluation agencies to validate claims.
  • Service agencies select among the open Impact Projects whichever ones they want to perform.  As the service agencies perform the work, impact claims are created according to templates defined by funders, digitally signed, recorded and collected into an Impact Claim Set underthe IXO protocol.  For example Impact Claims could be barcode scans off of food being distributed which are digitally signed by the servicing agent and agency. Impact claims can be constructed to not hold personal identification data but still cryptographically identify the appropriate parties performing the work.
  • Evaluation agencies then take the impact claim set and perform the  evaluation process as specified by funding agencies. The evaluation insures that the Impact Claims reflect that the work is being done correctly and that the Impact Project is being executed properly. Impact claim evaluations are also digitally signed by the evaluation agency and agent(s), recorded and widely distributed.

The Impact Project definition, Impact Claim Templates, Impact Claim sets, Impact Claim Evaluations are all available worldwide, in an Global Impact Ledger and accessible to any and all funding agencies, service agencies and evaluation agencies.  At project completion, funding agencies should now have a granular record of all claims made by service agency’s agents for the project and what the evaluation agency says was actually done or not.

Such information can then be used to guide the next round of Impact Project awards to further advance the UN SDGs.

Ambly project

The Ambly Project is using the ixo Protocol to supply childhood education to underprivileged children in South Africa.

It combines mobile apps with blockchain smart contracts to replace an existing paper based school attendance system.

The mobile app is used to record attendance each day which creates an impact claim which can then be validated by evaluators to insure children are being educated and properly attending class.

~~~

Blockchains have the potential to revolutionize financial services, provide supply chain provenance (e.g., diamonds with Blockchains at IBM), validate company to company contracts (Ethereum enters the enterprise) and now improve UN SDG attainment.

Welcome to the new blockchain world.

Photo Credit(s): What are Sustainable Development Goals, World Economic Forum;

IXO Foundation website

Ambly Project webpage

Crowdresearch, crowdsourced academic research

Read an article in Stanford Research, Crowdsourced research gives experience to global participants that discussed an activity in Stanford and other top tier research institutions to try to get global participation in academic research. The process is discussed more fully in a scientific paper (PDF here) by researchers from Stanford, MIT Media Lab, Cornell Tech and UC Santa Cruz.

They chose three projects:

  • A HCI (human computer interaction) project to design, engineer and build a new paid crowd sourcing marketplace (like Amazon’s Mechanical Turk).
  • A visual image recognition project to improve on current visual classification techniques/algorithms.
  • A data science project to design and build the world’s largest wisdom of the crowds experiment.

Why crowdsource academic research?

The intent of crowdsourced research is to provide top tier academic research experience to persons which have no access to top research organizations.

Participating universities obtain more technically diverse researchers, larger research teams, larger research projects, and a geographically dispersed research community.

Collaborators win valuable academic research experience, research community contacts, and potential authorship of research papers as well as potential recommendation letters (for future work or academic placement),

How does crowdresearch work?

It’s almost an open source and agile development applied to academic research. The work week starts with the principal investigator (PI) and research assistants (RAs) going over last week’s milestone deliveries to see which to pursue further next week. The crowdresearch uses a REDDIT like posting and up/down voting to determine which milestone deliverables are most important. The PI and RAs review this prioritized list to select a few to continue to investigate over the next week.

The PI holds an hour long video conference (using Google Hangouts On Air Youtube live stream service). On the conference call all collaborators can view the stream but only a select few are on camera. The PI and the researchers responsible for the important milestone research of the past week discuss their findings and the rest of the collaborators on the team can participate over Slack. The video conference is archived and available  to be watched offline.

At the end of the meeting, the PI identifies next weeks milestones and potentially directly responsible investigators (DRIs) to work on them.

The DRIs and other collaborators choose how to apportion the work for the next week and work commences. Collaboration can be fostered and monitored via Slack and if necessary, more Google live stream meetings.

If collaborators need help understanding some technology, technique, or too, the PI, RAs or DRIs can provide a mini video course on the topic or can point to other information used to get the researchers up to speed. Collaborators can ask questions and receive answers through Slack.

When it’s time to write the paper, they used Google Docs with change tracking to manage the writing process.

The team also maintained a Wiki on the overall project to help new and current members get up to speed on what’s going on. The Wiki would also list the week’s milestones, video archives, project history/information, milestone deliverables, etc.

At the end of the week, researchers and DRIs would supply a mini post to describe their work and link to their milestone deliverables so that everyone could review their results.

Who gets credit for crowdresearch?

Each week, everyone on the project is allocated 100 credits and apportions these credits to other participants the weeks activities. The credits are  used to drive a page-rank credit assignment algorithm to determine an aggregate credit score for each researcher on the project.

Check out the paper linked above for more information on the credit algorithm. They tried to defeat (credit) link rings and other obvious approaches to stealing credit.

At the end of the project, the PI, DRIs and RAs determine a credit clip level for paper authorship. Paper authors are listed in credit order and the remaining, non-author collaborators are listed in an acknowledgements section of the paper.

The PIs can also use the credit level to determine how much of a recommendation letter to provide for researchers

Tools for crowdresearch

The tools needed to collaborate on crowdresearch are cheap and readily available to anyone.

  • Google Docs, Hangouts, Gmail are all freely available, although you may need to purchase more Drive space to host the work on the project.
  • Wiki software is freely available as well from multiple sources including Wikipedia (MediaWiki).
  • Slack is readily available for a low cost, but other open source alternatives exist, if that’s a problem.
  • Github code repository is also readily available for a reasonable cost but  there may be alternatives that use Google Drive storage for the repo.
  • Web hosting is needed to host the online Wiki, media and other assets.

Initial projects were chosen in computer science, so outside of the above tools, they could depend on open source. Other projects will need to consider how much experimental apparatus, how to fund these apparatus purchases, and how a global researchers can best make use of these.

My crowdresearch projects

Some potential commercial crowdresearch projects where we could use aggregate credit score and perhaps other measures of participation to apportion revenue, if any.

  • NVMe storage system using a light weight storage server supporting NVMe over fabric access to hybrid NVMe SSD – capacity disk storage.
  • Proof of Stake (PoS) Ethereum pooling software using Linux servers to create a pool for PoS ETH mining.
  • Bipedal, dual armed, dual handed, five-fingered assisted care robot to supply assistance and care to elders and disabled people throughout the world.

Non-commercial projects, where we would use aggregate credit score to apportion attribution and any potential remuneration.

  • A fully (100%?) mechanical rover able to survive, rove around, perform  scientific analysis, receive/transmit data and possibly, effect repairs from within extreme environments such as the surface of Venus, Jupiter and Chernoble/Fukishima Daiichi reactor cores.
  • Zero propellent interplanetary tug able to rapidly transport rovers, satellites, probes, etc. to any place within the solar system and deploy theme properly.
  • A Venusian manned base habitat including the design, build process and ongoing support for the initial habitat and any expansion over time, such that the habitat can last 25 years.

Any collaborators across the world, interested in collaborating on any of these projects, do let me know, here via comments. Please supply some way to contact you and any skills you’re interested in developing or already have that can help the project(s).

I would be glad to take on PI role for the most popular project(s), if I get sufficient response (no idea what this would be). And  I’d be happy to purchase the Drive, GitHub, Slack and web hosting accounts needed to startup and continue to fruition the most popular project(s). And if there’s any, more domain experienced PIs interested in taking any of these projects do let me know.  

Comments?

Picture Credit(s): Crowd by Espen Sundve;

Videoblogger Video Conference by Markus Sandy;

Researchers Night 2014 by Department of Computer Science, NTNU;