Saw an article from MIT News (Using machine learning to predict high-impact research) on how researchers there were able to train an AI model to predict which scientific research will prove the most impactful (foundational) over time. The news article was reporting on research written up in a Nature article (Learning on knowledge graph dynamics provides an early warning of impactful research, behind paywall). The researchers propose that institutions and VCs use their new DELPHI (Dynamic Early-warning by Learning to Predict High Impact [research]) tool to find foundational research to invest in.
Attempts to identify good research have been active for years. For example, CiteSeerX and others like it use an article's citation index to rank research. Citation indices are somewhat like Google's PageRank and use a count of how many citations a research paper has garnered since publication as their metric of importance.
Although citation indices are a single, easy-to-calculate metric, they don't seem to be a foolproof way to identify foundational research, and it takes a number of years for an article's importance to become evident. The researchers at MIT decided to see if using an AI model to identify high impact research would work better.
How DELPHI works
Apparently, DELPHI uses article metadata, like that available for the Nature article behind this research (linked to above), to create a knowledge graph. They then use the knowledge graph and an AI model to predict whether the research will become high impact or not. The threshold they used for their publication was any research DELPHI predicts would land in the top 5% of all research in a domain.
Not having access to their paper (or code, see below), we can’t determine if they used a DNN or some other AI/data analytics approach to come up with their prediction.
The input data (article metadata) came from Lens.org, a website which provides metadata for ~230M research articles and ~130M patent filings. The researchers focused on the life sciences as the domain to analyze, but presumably their approach would work in any scientific domain.
The researchers analyzed all scientific articles from 42 life sciences journals (listed in the article's supplementary information). They used articles written prior to 2017 as their training set and then used their model to predict the impact of articles published since 2018.
In the Nature article’s supplementary information they provide a table (Table 2) which lists some of life-sciences articles since 2018 that DELPHI predicts will have high ((top 5%) impact . There’s ~50 articles listed in the table and they supply the (knowledge) Full-graph (citation) count as well as citation counts for the articles.
2nd of 3 pages for table 2 in Nature article’s supplementary information
The Nature article’s home page also list links to the researchers code and data on one of the researchers GitHub repos. When I attempted to download the trained model and sample dataset, it generated a “links had expired” error message from Dropbox . The repo readme file suggested reaching out to the researcher if this happened. We did that, but had not received any response prior to this post’s publication. .
In any case, the GitHub repository contains a sample Jupyter notebook and a dockerfile used to create a container to run the notebook in. The data they supplied is supposedly a sample of 206 articles (metadata), and the notebook uses their model to predict the impact level for those sample articles.
I would have liked to see more information on their model layer structure, hyper-parameters and other model information as well as prediction reliability statistics. But perhaps this is outlined in the Nature article or provided in the model download.
But the approach seems sound enough, and even if the researchers didn't use a DNN, it would easily lend itself to a DNN prediction, assuming you could (a minimal sketch follows the list):
1. Algorithmically create the knowledge graph from article metadata,
2. Digitize and quantify the metadata knowledge graph for all the articles, and
3. Have an independent assessment of impact levels for all research in the training set.
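To make that concrete, here's a minimal sketch (in Python, using networkx and scikit-learn) of what such a pipeline could look like. It is not DELPHI itself; the graph features, the 5-year citation field and the top-5% labeling rule are assumptions for illustration only.

import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_citation_graph(articles):
    # articles: list of dicts with 'id', 'references' (cited ids) and 'citations_5yr'
    g = nx.DiGraph()
    for a in articles:
        g.add_node(a["id"])
        for ref in a.get("references", []):
            g.add_edge(a["id"], ref)              # edge: article -> cited article
    return g

def feature_matrix(g, articles):
    pr = nx.pagerank(g)                           # PageRank-style centrality, computed once
    return np.array([[g.in_degree(a["id"]),       # citations received inside the graph
                      g.out_degree(a["id"]),      # references made
                      pr[a["id"]]] for a in articles])

def train_impact_model(training_articles):        # e.g., articles published before 2017
    g = build_citation_graph(training_articles)
    X = feature_matrix(g, training_articles)
    cites = np.array([a["citations_5yr"] for a in training_articles])
    y = cites >= np.percentile(cites, 95)         # "high impact" = top 5% by later citations
    return RandomForestClassifier(n_estimators=200).fit(X, y)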
~~~~
Now if we could just do this for blog posts and podcasts it might be even more useful (for us).
Read an article this past week in Nature about the need for Cooperative AI (Cooperative AI: machines must learn to find common ground), which supplies the best view I've seen of the direction research needs to go to develop a more beneficial and benign AI/AGI.
The Nature article puts into perspective what we all want from future AI (or AGI). That is,
AI-AI cooperation: AI systems that cooperate with one another while understanding that not all activities are zero-sum competitions (like chess, Go or Atari games). Rather, most activities within the human sphere are cooperative, where one agent has a set of goals and a different agent has another set of goals, some of which overlap while others are in conflict. Sports like soccer and lacrosse come to mind. But there are other card and board games (Risk & Diplomacy) that use cooperating parties with diverse goals to achieve common ends.
AI-Human cooperation: AI systems that cooperate with humans to achieve common goals. Here too, most humans have their own sets of goals, some of which may be in conflict with the AI system's goals. However, all humans have a shared set of goals; preservation of life comes to mind. It's in this arena where the challenges are most acute for AI systems. Divining humans' and their own underlying goals and motivations is not simple. And of course giving priority to the "right" goals when they compete or are in conflict will be an increasingly difficult task, given today's human diversity.
Human-Human cooperation: Here it gets pretty interesting, but the paper seems to say that any future AI system should be designed to enhance human-human interaction, not deter or interfere with it. One can see the challenge of disinformation today and how wonderful it would be to have some AI agent that could filter all this and present a proper picture of our world. But, humans have different goals and trying to figure out what they are and which are common and thereby something to be enhanced will be an ongoing challenge.
The problem with today’s AI research is that its all about improving specific activities (image recognition, language understanding, recommendation engines, etc) but all are point solutions and none (if any) are focused on cooperation.
Tit for tat wins the award
To that end, the authors of the paper call for a new direction, one that attempts to imbue AI systems with social intelligence and cooperative intelligence so they work well in the broader, human-dominated world that lies ahead.
In the Nature article they mentioned a 1984 book by Robert Axelrod, The Evolution of Cooperation, perhaps the last great research on cooperation ever produced.
The book describes a simulated world full of prisoner's dilemma actors that interacted with one another at random.
The experimenters programmed some agents to always do the right thing for their current partner, some to always do the wrong thing to their partner, others to do right once then wrong from that point forward, etc. The experimenters tried every sort of cooperation policy they could think of.
Each agent in an interaction would get some number of points. For example, if both did the right thing they would each get 3 points; if one did wrong, the sucker would get 0 and the bad actor would get 5; if both did wrong, each got 1 point.
The agents that had the best score during a run (of 1000s of random pairings/interactions) would multiply for the next run, and the agents that did worse would disappear over time from the population of agents in these simulated worlds.
The optimal strategy that emerged from these experiments was:
Do the right thing once with every new partner, and
From that point forward, tit for tat (if the other party did right the last time, then you do the right thing the next time you interact with them; if they did wrong the last time, then you do wrong the next time you interact with them).
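To see how simple this really is, here's a small Python sketch of an iterated prisoner's dilemma match using the standard Axelrod payoffs; the strategies and round count are illustrative, not a reconstruction of Axelrod's actual tournament code.

import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),   # standard Axelrod payoffs
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(history):          # history = list of the partner's past moves
    return "C" if not history or history[-1] == "C" else "D"

def always_defect(history):
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    score_a = score_b = 0
    hist_a, hist_b = [], []        # what each side has seen the other do
    for _ in range(rounds):
        move_a, move_b = strategy_a(hist_a), strategy_b(hist_b)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(move_b)
        hist_b.append(move_a)
    return score_a, score_b

# Tit for tat cooperates on the first meeting, then mirrors its partner.
print(play(tit_for_tat, always_defect))
print(play(tit_for_tat, tit_for_tat))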
It was mind boggling at the time to realize that such a simple strategy could be so effective/sustainable in simulation and perhaps in the real world. It turns out that in a (simulated) world of bad agents, there would be this group of Tit for Tat agents that would build up, defend itself and expand over time to succeed.
That was the state of the art in cooperation research back then (1984). I've not seen anything similar since, nor anything that discusses how to implement algorithms in support of social intelligence.
~~~~
The authors of the Nature article believe it's once again time to start researching cooperation techniques and social intelligence, so we can instill proper cooperation and social intelligence technology into future AI (AGI) systems.
Perhaps if we can do this, we may create a better AI (or AGI) so that both it and we can live better in our world, galaxy and universe.
Read an article the other day about a new book (The Myth of Artificial Intelligence, by Erik J. Larson) that explains why AI-ML-DL, on its current path, is very unlikely to achieve artificial general intelligence (AGI). Amazon and others offer a short preview of the book, which is where most of this discussion comes from.
Types of (human) reasoning
Near as I can tell (I don't have the book), it discusses the three types of reasoning that exist in human intellect, i.e., deduction, induction and abduction.
Deduction uses formal logic (or its equivalents) to derive facts or theorems from basic principles.
Induction uses a multitude of samples and constructs general principles from the analysis of them.
Abduction uses a set of probabilistic assertions and formal logic to come up with a probabilistic principle (the most likely explanation).
Deduction is most famously observed in geometry and arithmetic proofs and was most evident in the early years of AI through its use of expert systems. The challenge with expert systems is that the real world is vastly more complex than any geometrical or arithmetical artifice that humankind can produce.
Expert systems became champions of checkers, chess and some other games but, in the end, were not easily generalizable beyond a few restricted (gaming and medical) domains.
Induction is presently all the rage and represents what machine learning and deep neural networks (DNN) are doing with all that training data and resultant classification inferencing.
Today we have DNNs that can classify the objects in an image, can learn to play any game on the planet better than humans, and can even safely drive a car down the road.
The current AI world view is that this form of reasoning, DNN induction, if taken to its extreme, will ultimately result in some level of AGI, or human-equivalent levels of intelligence in a system. The author of the book begs to differ.
Abduction is less well known or discussed in rational circles. It's essentially what any human does when presented with real world examples/experiences to derive an understanding (or principle) of what happened.
For example, a plate full of cookies last night becomes an almost empty plate of crumbs and two cookies. So what happened? Most likely your son woke up early, consumed most if not all of them, and left for work. This is a probabilistic (most likely) inference, but it has a high probability of being true.
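For what it's worth, that kind of "most likely explanation" reasoning can be caricatured in a few lines of Python; the hypotheses, priors and likelihoods below are made-up numbers purely for illustration.

# Toy abduction: pick the explanation with the highest posterior probability.
priors = {"son ate them": 0.6, "guests ate them": 0.3, "dog got on the counter": 0.1}
likelihood_of_evidence = {          # P(nearly empty plate | hypothesis)
    "son ate them": 0.9,
    "guests ate them": 0.4,
    "dog got on the counter": 0.2,
}

posterior = {h: priors[h] * likelihood_of_evidence[h] for h in priors}
total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}

best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 2))   # most likely explanation, e.g. "son ate them" ~0.79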
Any AGI will need all forms of reasoning
The challenge is that AI has already been through the deduction phase, with the rise of expert systems, which crashed and burned because of the cost and time required to produce an exhaustive and correct expert system. AI is currently in the induction phase, via DNN training, which seems to be far more generalizable and successfully usable in many different domains. But no one is talking seriously about doing abduction in AI (anymore).
The author claims (again, have not read the book) that any AGI will require as much abduction as induction (as well as perhaps deduction), and therefore, AGI is not inevitable based on our current AI DNN (or induction) intensive path.
Previous and current attempts at abduction reasoning
Some may recall fuzzy logic as one of the avenues taken after expert systems seemed to fail at successful and realistic inferencing around the end of the last century. Fuzzy logic was a way of bringing probabilities into deduction, not unlike abduction as defined above. With fuzzy logic, each assertion or base assumption is given a probabilistic value (of being true) and the final derivation is assigned some level of probability of being true.
The Wikipedia article has definitions for fuzzy logic AND, OR and NOT operators, which of course would allow any system to make these sorts of assertions. But fuzzy logic (like expert systems above) suffered from the inability to exhaustively cover all examples in a real world situation.
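For reference, here's a minimal Python sketch of the standard (Zadeh) fuzzy AND, OR and NOT operators those definitions describe, where a truth value is a degree between 0 and 1 rather than strictly true or false:

def fuzzy_and(a, b):
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

def fuzzy_not(a):
    return 1.0 - a

# "The room is warm" = 0.7 true, "the fan is loud" = 0.4 true
print(fuzzy_and(0.7, 0.4))   # 0.4
print(fuzzy_or(0.7, 0.4))    # 0.7
print(fuzzy_not(0.7))        # ~0.3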
Furthermore, the (funny) thing about DNNs is that they are much more probabilistic than they appear. If one examines the classification outputs of any DNN, it is extremely rare to see boolean (true or false), yes or no answers. Mostly one sees a series of probabilities that are assigned to each classification bucket.
DNN systems hide these probabilities by just selecting the maximum (or minimum) probability generated as their final classification. This is entirely an artifact of needing to have some discrete output (classification selection). But DNN (internal) results are always probabilistic values.
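A quick Python sketch shows what that selection step looks like: the softmax layer produces a probability for every class, and the reported answer is just the argmax over those probabilities (the logits below are arbitrary example values).

import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))      # numerically stabilized softmax
    return e / e.sum()

logits = np.array([2.1, 0.3, 4.7, 1.0])      # raw scores from the final layer
probs = softmax(logits)                      # a probability for every class
label = int(np.argmax(probs))                # the single discrete answer reported

print(probs)   # roughly [0.067, 0.011, 0.900, 0.022], the probabilities usually hidden
print(label)   # 2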
So although pure induction doesn't include probabilities, DNN induction as practiced today in AI systems uses probabilistic reasoning in every layer of a DNN and in its final results.
What else may be missing from AI to allow AGI to be developed
Personally, AGI seems to require not just the reasoning approaches above, but also a more workable and general purpose planning solution. I've tried to identify whether any researchers are using DNNs to provide general purpose planning solutions but have yet to find any (in publicly available research). This is probably the one place where expert (or fuzzy control) systems still shine. But again, they are hard to generalize and almost impossible to make completely exhaustive.
Nonetheless, in the end, I think all the above just proves that there are a number of distinct reasoning and other (planning) techniques that may need to come together to provide AGI. As any of us can attest, all of these different approaches are available within any human intellect.
And if we assume that any AGI will need to follow the human approach to intelligence (not a given), these techniques will all need to be stitched together, combined and brought to bear to realize AGI.
But, at present, with all the focus on DNN/induction, we, as AI researchers, are not making any progress on using these other techniques or in combining them into a single system.
And for that I am happy. I would be very pleased to have any AGI be farther out than nearer term. Because for the life of me, AGI scares the s&#t out of me.
Mostly because I don’t see any real way to control AGI, once unleashed. That and given the diversity of motives around this world, I don’t see any realistic mechanism to instill a universal and firm (unalterable) belief in the sanctity of human and other life, the dependance this life has on our environment/biosphere and the rule of law needed to maintain peace across humankind (and I’m probably missing a half dozen more things that we would want any AGI to adhere to).
Maybe, if I saw more effort on how we as a species can come up with universal views on these and other topics, and some way of instilling, essentially, a system of programs with these unalterable beliefs and AGI controls based on them, I'd be less fearful of AGI emerging.
Lacking that, any way of delaying its emergence, is fine by me.
Researchers at UCLA have taken a trained DL neural network and implemented it as a series of passive, optical-only, 3D printed diffraction gratings to perform Fashion MNIST object classification. They did the same with MNIST handwritten digit and ImageNet DL neural network classifiers.
Experimental testing of 3D-printed D2NNs. (A and B) After the training phase, the final designs of five different layers (L1, L2, …, L5) of the handwritten digit classifier, fashion product classifier, and the imager D2NNs are shown. To the right of the network layers, an illustration of the corresponding 3D-printed D2NN is shown. (C and D) Schematic (C) and photo (D) of the experimental terahertz setup. An amplifier-multiplier chain was used to generate continuous-wave radiation at 0.4 THz, and a mixer-amplifier-multiplier chain was used for the detection at the output plane of the network. RF, radio frequency; f, frequency.
Apparently the researchers trained a normal (electronic) deep learning neural network on MNIST, Fashion MNIST and ImageNet and then converted the resultant trained NNs into sets of multiple diffraction gratings. They did some computer simulation of the D2NN and, once satisfied it worked and achieved decent accuracy, 3D printed the diffraction plates.
All-optical D2NN-based classifiers. These D2NN designs were based on spatially and temporally coherent illumination and linear optical materials/layers. (a) D2NN setup for the task of classification of handwritten digits (MNIST), where the input information is encoded in the amplitude channel of the input plane. (b) Final design of a 5-layer, phase-only classifier for handwritten digits. (c) Amplitude distribution at the input plane for a test sample (digit '0'). (d-e) Intensity patterns at the output plane for the input in (c); (d) is for MSE-based, and (e) is softmax-cross-entropy (SCE)-based designs. (f) D2NN setup for the task of classification of fashion products (Fashion-MNIST), where the input information is encoded in the phase channel of the input plane. (g) Same as (b), except for the fashion product dataset. (h) Phase distribution at the input plane for a test sample. (i-j) Same as (d) and (e) for the input in (h). λ refers to the illumination source wavelength. The input plane represents the plane of the input object or its data, which can also be generated by another optical imaging system or a lens, projecting an image of the object data onto this plane.
In their D2NN, they start with coherent (laser) light in the THz spectrum, use it to illuminate the input plane (I assume an image of the object/digit/fashion accessory), and pass the result through multiple plates of diffraction gratings onto a THz detector, which detects the illuminated spot that indicates the classification.
The article in Science has a supplementary materials download that shows how the researchers converted NN weights into a diffraction grating. Essentially, each pixel on the diffraction grating either transmits, refracts, or reflects a light path, and this represents the connections between layers. It's unclear whether the 5 or 6 plates used in the D2NN correspond to the NN layers, but it's certainly possible.
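As a rough illustration (not the researchers' actual code), here's a simplified Python/numpy sketch of how a stack of phase-only diffractive layers acts on a coherent input field: each layer's pixels apply trained phase delays, and free space between layers provides the fully connected mixing. The wavelength and pixel size come from the article (0.4 THz, 400 µm pixels); the layer spacing is an assumed value.

import numpy as np

def propagate(field, wavelength, dz, dx):
    # Free-space propagation of a coherent field (angular spectrum method)
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    H = np.exp(1j * 2 * np.pi / wavelength * dz * np.sqrt(np.maximum(arg, 0.0)))
    return np.fft.ifft2(np.fft.fft2(field) * H)

def d2nn_forward(input_amplitude, phase_layers,
                 wavelength=0.75e-3,   # 0.4 THz illumination -> ~0.75 mm wavelength
                 dx=0.4e-3,            # 400 um pixels, per the article
                 dz=3e-2):             # layer spacing: an assumed value for illustration
    # Pass the field through a stack of phase-only diffractive layers
    field = input_amplitude.astype(complex)
    for phase in phase_layers:                    # each layer's pixels hold trained phase delays
        field = propagate(field, wavelength, dz, dx)
        field = field * np.exp(1j * phase)
    field = propagate(field, wavelength, dz, dx)  # final hop to the detector plane
    return np.abs(field) ** 2                     # detector measures intensity

# The class is read out as whichever detector region on the output plane is brightest.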
And for the life of me I can't understand what they mean by "Residual D2NN", other than perhaps it means taking a trained residual NN and converting it to a D2NN.
Some advantages of D2NN
3D printing diffraction gratings means almost anyone (or any lab) could do this. The 3D printers they used had a spatial accuracy of 600 dpi, with 0.1mm accuracy, almost consumer grade 3D printers. In any case, being able to print these in a matter of hours, while not as easy as changing an all-digital NN, seems like an easy way to try out the approach.
For example, for the MNIST digit classifier they used a pixel size of 400um, and each diffraction layer they created was equivalent to 200×200 neural weights. That means the 5 layer D2NN could handle about 0.2M neural weights, which were completely connected to one another. This meant they could have (200×200)²×5 = 8B connections in the MNIST D2NN. In the image classifier, each diffraction layer had 300×300 neural weights. So D2NNs seem to scale very well.
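The connection count quoted above works out as follows (a quick back-of-the-envelope check in Python):

weights_per_layer = 200 * 200                    # 200x200 phase pixels per MNIST layer
layers = 5
total_weights = weights_per_layer * layers       # 200,000 (~0.2M) trainable pixels
connections = (weights_per_layer ** 2) * layers  # every pixel can couple to every pixel of the next layer
print(total_weights, connections)                # 200000  8000000000 (~8B)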
Being an all-passive optical device, the system operates entirely in parallel. That is, the researchers indicated that the D2NN devices operate at the speed of light and would perform the inferencing activity in the time it takes a camera to capture the image.
Also, the device uses very little energy (I assume just the energy for the THz generator, the input plane detector and the THz detector at the end).
And the researchers also claimed the device was cheap to manufacture; it could be created for less than $50. (Unclear if this included all the electronics or just the D2NN diffraction gratings and holder.) And once you have locked in a D2NN that you want to use, it could be manufactured in volume, very cheaply (sort of like stamping out CD platters). Finally, the number of neural network nodes and layers can be scaled up to a large number of layers and nodes per layer while still fitting on the diffraction gratings. In contrast, all-electronic NNs require more compute power as you scale up network layers and nodes per layer.
The other article (on arXiv) talked about potentially using a hybrid optical-electronic DNN approach, with some layers being D2NN and others being purely digital (electronic). Such a system could potentially be used where some portion of the NN was more stable/more compute intensive than others and where the final output classification layer(s) was more changeable and much smaller/less compute intensive. Such a hybrid system could make use of the best of the all-optical D2NN to efficiently and quickly compress the input space and then have the electronic final classification layer provide the final classification step.
The Oracle
Combining a handful of D2NNs into a device that accepts speech input and provides speech output, with the addition of, say, an offline copy of Wikipedia, Google Books, etc. and a search engine that could be used to retrieve responses to questions asked, would create an oracle device. You would ask a question and the device would respond with the best answer it could find (in its databases).
If this could be made out of all-passive optical components and use natural sunlight/electronic illumination to perform its functionality, such an all-optical, question-to-answer oracle would be very useful to the populations of the world. It could be manufactured in volume very cheaply and would cost almost nothing to operate.
A couple of other tweaks: if we could collapse the multiple-grating D2NNs into a single multi-layer plate/platter and make these replaceable in the device, that would allow the oracle's information base to be updated periodically.
Then if we could embed such a device into a Long Now Clock that would reflect sunlight onto the disk every solstice or equinox, we could have a quarterly oracle device that could last for thousands of years. It would provide answers to queries one day every quarter. And that would be quite the oracle…
Essentially, all research funded by these organizations must be immediately published in an open access forum or open access journal, or be freely available in an open access section of a publisher's website, which means it can be read for free by anyone worldwide with access to the web. Authors and institutions will retain copyright for the work, and the work will be published under an open access license such as the CC BY (Creative Commons Attribution) license.
Why open access is important
At this blog, we frequently find ourselves writing about research which is only available via a paid subscription or on a pay-per-article basis. However, sometimes, if we search long enough, we find a duplicate of the article published in pre-print form on some preprint server or in an open access journal.
We have written about open access journals before (see our New Science combats Coronavirus post). Much of what we do on this blog would not be possible without open access resources like PLoS, bioRxiv, and PubMed.
Open access mandates are trending
Open access mandates have been around for a while now. Even the US Gov't got into the act, mandating that all research funded by the NIH be open access by 2008, with the Departments of Agriculture and Energy following later (see wikipedia Open access mandates).
In addition, given the pandemic emergency, many research publishers like Nature and Elsevier made any and all information about the Coronavirus free access on their websites.
Impacts on the R&D research publishing business model
Although research is funded by public organizations such as charities and government agencies, prior to open access mandates most research was published in peer-reviewed journals which charged a fee for access. For many research organizations, those fees were simply a cost of doing research. But if you were an independent researcher or in an institution that couldn't afford these fees, attempting to do cutting edge research was nearly impossible without this access.
Yes, in some cases journal publishers waived these fees for deserving institutions and organizations, but this wasn't the case for individual researchers. Or, if you were truly diligent, you could request a copy of a paper from an author and wait.
Of course, journal publishers have real expenses they need to cover, as well as a reasonable profit to make. But due to business consolidation, there were fewer independent journals around, and as a result they charged bundled license fees for vast swathes of research articles. Such a wide bundle may or may not be of interest to an individual or an institution. That, plus with consolidation, profits were becoming a more significant consideration.
So open access mandates often included funding to cover the fees publishers charge to supply open access. Such fees varied widely. As a result, open access mandates also began to require that fees be published along with a description of how prices were calculated. By doing so, their hope was to make such costs more transparent.
Impacts on authors of research articles
Somewhere there’s an aphorism for researchers that says “publish or perish“, which means you must publish research in order to become a recognized expert in your field. Recognition often the main driver behind better academic employment and more research funding.
However, it’s not just about volume of published papers, the quality of research also matters. And the more highly regarded publishing outlets have an advantage here, in that they are de facto gatekeepers to whats published in their journals. As such, where you publish can often lend credibility to any research.
Another thing has changed over the last few decades: judging the quality of research has become more quantitative. Nowadays, research quality is also dependent on the number of citations it receives. The more popular a publisher is, the more readers it has, which increases the possibility for citations.
Thus, most researchers try to publish their best work in highly regarded journals. And of course, these journals have a high cost to provide open access.
Successful research institutions can afford to pay these prices but those further down the totem pole cannot.
Most mandates come with additional funding to support paying the cost of supplying open access. But they also require these costs to be published and justified, in the belief that doing so will lend some transparency to them.
So the researcher is caught in the middle. Funding organizations want open access to research they fund. And publishers want to be paid a profit for that access.
History of research publication
Nature magazine first started publishing research in 1869, Science magazine first published in 1880, and the Royal Society first published research in 1665. So publishing research has been going on for over 350 years, and, at least as a for-profit business model, since the mid-1800s.
Prior to being published in journals, research was only available in books. And more than likely, the author of the research had to pay to have a book published, and the publisher made money only when those books were sold. Prior to that, scientific research was mostly only available through a course of study, also mostly paid for by the student.
So science has always had a cost to access. What open access mandates are doing is moving this cost to something added to the funding of research.
Now if only open access could also solve the reproducibility crisis in science, we could have a real scientific revolution.
Read an article today about new research done to apply big data analytics to multiple cancer strains to identify key control mechanisms that allow cancer to survive in the body and multiply. The article, Big data analysis find cancer's key vulnerabilities, discusses the discovery of 24 "master regulators" that are present in a number of different cancers. The original research article is in Cell (behind paywall), but I managed to find a preprint on bioRxiv.
From a (software) coding perspective, it’s almost like a majority of cancers are re-using the same modules to perform functions that are needed by the cancer cells. Not all cancers exhibit all master regulator blocks but all the cancers that they have examined have some of them.
The researchers examined the regulatory/signaling networks of proteins in 112 cancer cell lines. They identified 407 master regulatory proteins, and further analysis showed that these proteins were associated with 24 master regulatory architectures (oncotectures). A decent layman's description of a cancer oncotecture can be found in an old (2016) Economist article, Cancer's master criminals…
Master regulatory proteins
According to the Economist article, master regulatory proteins are proteins that regulate processes in a cancer cell which cause other proteins to be made, which cause other proteins to be made, etc., all of which affect the way a cancer cell lives and propagates inside a body.
Biologists call these sorts of proteins transcription factors: they control the copying of DNA information into mRNA, which is then taken to the cell's protein factories (ribosomes) to create proteins from that blueprint.
The research team believes the 24 master regulatory (MR) blocks, if they could be disabled somehow, would disrupt the cancer cell and ultimately eliminate that cancer from a body.
It’s almost like a DevOps script that automates the deployment of software inside the cloud. The fact that they have identified 24 master regulatory (MR blocks) architectures (sequences of proteins that are occur) that apply to a wide set of cancer tumor sub-types implies that these could be needed to regulate the functionality of these cancers. If drugs could be devised to interrupt, change or deactivate these master regulatory blocks it’s quite possible that these cancers would be eliminated.
Identifying MR Blocks using (Bio/Life Sciences) Big Data
It all starts with VIPER analysis (GitHub repo), which measures a specific protein's transcriptional activity level. In this fashion they were able to analyze the proteome (the total complement of all proteins active in a cell) of the 112 tumor subtypes, and whittle these down, using cluster analysis, to the proteins that were especially relevant for cancer cell transcription activity.
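VIPER itself is an R/Bioconductor package, but the idea of scoring a regulator's activity from its targets can be caricatured in Python; the scoring rule, the regulon and the expression values below are simplified, hypothetical stand-ins, not the actual VIPER algorithm.

import numpy as np

def activity_score(expression_z, regulon):
    # Crude stand-in for a VIPER-style activity score: average the z-scored
    # expression of a regulator's targets, signed by mode of regulation.
    # expression_z: dict gene -> z-scored expression in one tumor sample
    # regulon: dict target gene -> +1 (activated) or -1 (repressed)
    scores = [sign * expression_z[gene] for gene, sign in regulon.items()
              if gene in expression_z]
    return float(np.mean(scores)) if scores else 0.0

# Hypothetical regulon and one sample's z-scored expression
regulon_tp53 = {"CDKN1A": +1, "MDM2": +1, "BCL2": -1}
sample = {"CDKN1A": 1.8, "MDM2": 1.2, "BCL2": -0.9}
print(activity_score(sample, regulon_tp53))   # high score suggests the regulator is active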
They then used DIGGIT analysis (GitHub repo of an R implementation) to identify the MR proteins and the cellular mutations that led to them. The types of mutations can be copy number, single point or gene fusion. DIGGIT analysis can help identify which of the mutations are responsible for the protein being analyzed. The DIGGIT process is a multi-step, analytical approach to identifying candidate MR proteins.
Then, using the tumor checkpoint hypothesis and Bayesian analysis/integration, they further ranked the candidate MR proteins. Tumor checkpoints are state transitions in the life of a cancer cell where the cell assesses its environment and then determines what actions to take next.
The tumor checkpoint hypothesis says that during the life cycle of a cancer cell it goes through various state transitions. The researchers have shown that these state transitions are managed by the MR blocks they have identified.
In the final step of their analysis, they used the tumor checkpoint hypothesis with saturation and modularity analysis to identify the top MR proteins and the MR blocks active in the 112 tumor subtypes.
At the end of their analysis, they had identified 24 MR blocks which, solely or in some combination, are present in each of the 112 tumor subtypes. If these MR blocks could be attacked by specific drugs, then each of these 112 tumor subtypes could essentially be eliminated from a body; or rather, that cancer could be cured.
I’ve come by and purchased a number of digital assistants over the last couple of years from both Google and Amazon but not Apple. At first their novelty drove me to take advantage of them to do a number of things. But over time I started to only use them for music playing or jokes. But then I started to hear about some other concerns with the technology.
The problems with today’s vendor based, digital assistants
My (and others') main concern was their ability to listen in on conversations in the home and workplace without being queried. Yes, there are controls on some of them to turn off the mic and thus any recordings. But these are not hardwired switches, and as software they may or may not work depending on the implementation. As such, there is no guarantee that they won't still be recording audio feeds even with their mic (supposedly) turned off.
At one point I saw a news article where police had subpoenaed recordings from a digital assistant to use in a criminal case. Now I'm ok with use of this for specific, court-approved criminal cases, but what's to limit its use to such? And not all courts, or governments for that matter, are as protective of personal privacy as some.
Open source digital assistant on the way
But an open source digital assistant, one where the user has complete programmatic control over its recording and use of audio data, is another matter. I suppose this doesn't necessarily help the technically challenged among us who can't program our way out of a paper bag, but even for those individuals, the fact that an open source version exists to protect privacy could be construed as something much more secure than a company's or vendor's product.
The main problem facing an open source digital assistant is the need for massive amounts of annotated training request data. This is one of the main reasons that commercial digital assistants often record conversations when not specifically requested.
But Stanford University, which is responsible for creating the open source digital assistant discussed here (Almond), has managed to design and create a "rules based" system to help generate all the training data needed for a virtual assistant.
With all this automatically generated training data they can use it to train a digital assistant’s natural language processing neural network to understand what’s being asked and drive whatever action is being requested.
At the moment the digital assistant (and its conversation generator) has somewhat limited skills, or rather only works in a restricted set of domains such as restaurants, people, movies, books and music. For example, "identify a restaurant near me that has deep dish pizza and is rated greater than 4 on a 5 point scale", "find me a mystery novel that is about magic", or "who was the 22nd president of the USA".
But as the digital assistant and its annotated, rules based conversation generator are both open source, anyone can contribute more skills code or add more conversational capabilities. Over time, if there's enough participation, it could perhaps someday perform all of the skills or capabilities of commercial digital assistants.
Almond’s verbal generator is called Genie and uses compositional technology to generate conversations that are used to train their linguistic user interface (LUInet). Almond also uses ThingTalk a new declaritive program language to process responses to queries and requests. Finally, Almond makes use of Thingpedia, a repository of information about internet services and IoT devices to tell it how to interact with these systems.
Stanford Genie technology
The technology behind Genie is based on using source text statements to create templates that can generate sentences for any domain you wish to have Almond work in. If one is interested in expanding the Almond domains, they can create their own templates using the Genie toolkit.
One essentially provides a small set of input sentences that are converted into templates and used by Genie to understand how to parse all similar sentences. This enables Almond to "understand" what's being requested of it.
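As a toy illustration of that compositional idea (not Genie's actual template format), a single parameterized sentence template can be expanded over domain values to mass-produce paired (utterance, program) training examples; the @restaurant.search call below is a hypothetical placeholder.

import itertools

template = ("find me a {cuisine} restaurant rated above {stars} stars",
            "@restaurant.search(cuisine='{cuisine}', min_rating={stars})")

cuisines = ["pizza", "sushi", "thai"]
ratings = ["3", "4", "4.5"]

training_pairs = []
for cuisine, stars in itertools.product(cuisines, ratings):
    sentence = template[0].format(cuisine=cuisine, stars=stars)
    program = template[1].format(cuisine=cuisine, stars=stars)
    training_pairs.append((sentence, program))    # (utterance, target program) pair

print(len(training_pairs))        # 9 synthetic examples generated from one template
print(training_pairs[0])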
ThingTalk is the programming language used to control what Almond can do for requests and queries. Essentially it’s a multi-part statement about what to do when a request comes along. The main parts in a ThingTalk statement include:
When a particular action is supposed to be triggered.
What service does the request need in order to perform its action.
What action is requested.
The "what service does a request need" part is based on open API calls (see Thingpedia below). The "what action is requested" part can either be a standard Almond action or invoke other Thingpedia open source API calls, such as create a tweet, post on FB, send email, etc.
For example, a ThingTalk statement looks like:
monitor @com.foxnews.get() => @com.slack.send();
Which monitors Fox news for any new news articles and sends them (the link) to your Slack channel.
Stanford Thingpedia
Thingpedia is an open source repository of structured information available on the Web and of API services available on the web. Structured information or data is the information behind calendars, contact databases, article repositories, etc. Any of which can be queried for information and some of which can be updated or have actions performed on them. API services are the way that those queries and actions are performed.
One page of the Thingpedia multi-page summary of services that are offered
The Thingpedia web page shows a number of services that already have Open source APIs defined and registered. For example, things like twitter, facebook, bing search, BBC news, gmail and a host of other services. More are being added all the time and these represent the domains that Almond can be used to act upon.
Some of these domains are more fully defined than others. But in any case, any service that takes the form of a web-based API can be added to Thingpedia.
Thingpedia as a standalone open source repository is valuable in and of itself regardless of its use by Almond. But Almond would be impossible without Thingpedia. Thingpedia wants to be the wikipedia of APIs.
Almond, putting it all together
Almond consists mainly of the Almond Agent, Engine and Thingpedia. The Agent is used by the various Almond implementations to parse and understand a request and access the ThingTalk program statement. The Almond Agent uses its LUInet natural language interpreter to interpret the request and select the ThingTalk program for it. Once the ThingTalk program is identified, it uses the various Thingpedia APIs requested by the ThingTalk statement to generate the proper API calls to the service being requested and generate any output that is requested.
Where can you run Almond
Almond is available currently as a web app, an Android App, a Gnome (Linux) desktop/laptop App, a CLI application or can be run on your Mac or Windows computers. You could of course create your own smart speaker to run Almond or perhaps hack a current smart speaker to do so.
One important consideration is that with the Android app, all your data and credentials are only stored on the phone. And will not go out into the cloud or elsewhere. I didn’t see similar statements about privacy protections for the web app or any of the other deployments. But as Almond is open source, you potentially have much greater control over where your data resides.
~~~~
What I would really like is a smart speaker app running on an RPi with a microphone and a decent speaker attached, all in the package of a cube or cylinder.
I thought their videos on Almond were pretty cheesy, but the technology is very interesting and could potentially make for an interesting competitor to today's smart speakers.
Photo Credit(s):
All photos and graphics from Stanford Almond and OVAL Lab websites.
I read an article a couple of weeks back about an Open Source Bionic Leg, which was reporting on research that began as an NSF-funded project at the University of Michigan (UofM), with collaboration from Northwestern, the University of Texas at Dallas and CMU. UofM has a website that provides everything you need to build your own open source leg (OSL) at OpenSourceLeg.com.
The challenge in human prosthetics these days is that all research is done in silos. Much of it is proprietary and only available within corporations but even university research has been hampered by the lack of a standard platform that could be used to develop new components and ideas on.
The real difficulty is defining the control logic (code). The OSL project is intended to resolve this lack of a platform by providing everything a researcher (hobbyist, or amputee) needs to build their own, at home or in the lab.
The website includes parts lists and STEP files as well as an estimated cost ($28.5K) to build your own powered prosthetic leg. They also have an Excel spreadsheet with all the parts listed, including part numbers and links to where they can be ordered (McMaster-Carr, SolidWorks, & Dephy).
They also show how to build a leg, with a short youtube video of how to assemble the whole leg as well as details for each subassembly with separate how-to videos for each.
The open source leg makes use of code from FlexSEA (Flexible Scaleable Electronics Architecture) and Dephy. FlexSEA was originally developed by Jean-Francois (JF) Duval while he was at MIT for his doctoral thesis. He has since joined Dephy, a robotics design firm. The open source leg project uses FlexSEA/Dephy code for its servo control mechanisms.
There is a GitHub repo with Python, MatLab and C control libraries and all the code. The open source leg website also includes instructions, scripts and an image file which can be used to build your own Raspberry Pi (4) controller for the leg.
The two (ankle and knee) servos are USB connected to the RPi. There are also other sensors such as the joint (servo-motor) encoders and a six axis load sensor I2C connected to the RPi. Each servo has its own 950mAh battery.
On the OSL website’s control page one can see these servos in action (with short youtube segments). They also provide instructions on how to use the open source control library to take the servo mechanisms through their paces.
Although on the OSL website’s control page I didn’t see anything which put the whole leg together to make use of it in a real world application. They did show on the Data page a youtube video with the OSL attached to a person and being used to walk up and down stairs, inclines and walking across a floor.
~~~~
Seeing as how the OSL website included STEP and PDF files for all the (machined) parts which represent $15.6K of the $28.5K, if one really wanted to do this on the cheap, one could just 3D print these parts in plastic. It would obviously not suffice mechanically for real use, but it could provide a platform for testing and developing control logic. At some point one could upgrade some or all of the plastic 3D printed parts to something more durable for use in human trials.
Another option is to purchase multiple sets of parts. The OSL website also showed price estimates for purchasing two sets of ankle and knee parts. But I’d imagine if one was so inclined, a number of researchers (hobbyists or amputees) could get together and order multiple sets of parts for reduced prices.
It’s also possible, with a lot of work, that the open source leg could be redesigned to support an open source arm-hand mechanism. This is where having 3D printed plastic parts could be extremely useful in helping to redesign the leg into an arm-hand.