What the researchers have created is an artificial protein that builds a cage around a bioactive (protein-based) mechanism, which can be unlocked by another protein while residing in a cell. This new protein is called LOCKR (Latching Orthogonal Cage-Key pRotein). LOCKR proteins act as switch-activated therapeutics within a cell.
In the picture above, the blue coils are the cage proteins and the yellow coil is the bioactive device (protein). Bioactive devices can be designed to degrade other proteins, modify biological processes within the cell, initiate the cell's self-destruct mechanism, and so on: just about anything a protein can do within a cell.
In the second Nature paper, they discuss one example of a LOCKR protein, called degron-LOCKR, which, once inside a cell, is used to degrade (destroy) a specific protein. The degron-LOCKR protein only activates when that other protein is active (present) within the cell, and it operates only as long as that protein is at sufficient concentration.
The nice thing about the degron-LOCKR protein is that it is completely self-regulating. It only operates when the protein to be degraded exists in the cell; that protein acts as the switch in this LOCKR. Until then, it remains benign, waiting for the targeted protein to appear in the cell.
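To make the self-regulating idea concrete, here is a toy feedback model of the behavior described above. It is not the paper's design or kinetics, just an illustration under my own assumptions: the target protein accumulates at a fixed rate, the switch turns on once the target crosses a threshold, and while on it degrades a fixed fraction of the target. All names and numbers (`production`, `threshold`, `degrade_fraction`) are hypothetical.

```python
def simulate_degron_switch(steps, production=5.0, threshold=50.0,
                           degrade_fraction=0.3, start=0.0):
    """Toy model: returns (target_level, switch_on) after each time step.

    Hypothetical parameters, not taken from the LOCKR papers:
      production:       amount of target protein made per step
      threshold:        target level at which it acts as the key and unlocks the cage
      degrade_fraction: share of the target destroyed per step while unlocked
    """
    level = start
    history = []
    for _ in range(steps):
        level += production                    # the cell keeps making the target
        switch_on = level >= threshold         # enough target present to act as the key
        if switch_on:
            level -= degrade_fraction * level  # unlocked degron destroys some target
        history.append((level, switch_on))
    return history


if __name__ == "__main__":
    for step, (level, on) in enumerate(simulate_degron_switch(15), start=1):
        print(f"step {step:2d}: target={level:5.1f} switch={'ON' if on else 'off'}")
```

In this toy, the target level climbs to the threshold, gets knocked back down, and never runs away; once production stops, the switch simply stays off.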
How LOCKR works
In the picture above, the cage is shown by the grey structure, the bioactive therapy by the yellow structure, and the protein key by the black structure. When the key is introduced into the LOCKR protein, the yellow structure is unfolded (enabled) and can then act on whatever intracellular process or protein it's been designed to impact.
One key attribute of LOCKR is that the bioactive device within the cage can be just about anything that works inside the cell. It could be used to create more proteins, fewer proteins, disable proteins, and perhaps enhance the activity of other proteins.
And both the LOCKR cage and the bioactive device can be designed from scratch and fabricated outside or inside the cell. Of course, the protein key is the other aspect of the LOCKR mechanism that is fully determined by the designer of the LOCKR protein.
It sort of reminds me of the transistor. Both are essentially switches. In a transistor, as long as a control voltage is applied, current is allowed to flow across the switch. LOCKR does something very similar: it pairs a key protein with a caged bioactive protein, and the bioactive protein is only activated when the key protein is present.
We’ve talked extensively in the past about using DNA/cells as rudimentary computers and storage, but this takes that technology to a whole other level (please see our DNA computing series here & here, as well as our posts on DNA as storage here & here). And all that work was done without LOCKR. With LOCKR, many of these systems would be even easier to design and construct.
The articles go on to say that LOCKR heralds a new age of intracellular therapeutics, with fine-grained control over when and where a particular bioactive therapy is activated within the cell.
Some of the questions below may be answered in the Nature papers (behind a paywall), so apologies in advance if you have access to them.
How the LOCKR protein(s) are introduced into cells was not discussed in the freely available articles. We presume that DNA encoding the LOCKR protein could be delivered into cells via a virus or added to the cell's DNA via CRISPR, or that the LOCKR protein itself could just be injected into the cell.
Moreover, how LOCKR proteins could be scaled within a cell to be more or less active, and scaled up throughout an organ to “fix” many cells, is yet another open question.
Adding artificial DNA or LOCKR proteins to cells may be easy in the lab, but putting such therapy into medical practice will take much time and effort. Any side effects of introducing artificial DNA or LOCKR proteins (not found in nature) into cells will need to be investigated. And finally, how such protein technology impacts germ lines would need to be fully understood.
But the fact that the therapeutic process is only active when unlocked by a key protein makes for an intriguing possibility: you would need both the LOCKR protein and the key (unlocker) protein present in a cell for the therapy to be active.
But they present one example, the degron-LOCKR, where the key seems to be a naturally active protein in the cell that needs to be degraded, not a separate, artificial protein introduced into the cell. So the key doesn't have to be an artificial protein, and probably would not be for most LOCKR designs.
Not a bad start for a new therapy. It has much potential, especially if it can be scaled easily and targeted specifically, both of which seem doable (given our limited understanding of biological processes).
Recall that in part 1, we discussed most of the threats posed by clouds to both hardware and software IT vendors. In that post we talked about some of the more common ways that vendors are trying to head off this threat (for now).
In this post we want to talk about some uncommon ways to deal with the coming cloud apocalypse.
But first, just to put the cloud threat in perspective: the IT TAM is estimated by one major consulting firm to be ~$3.8T in 2019, with a growth rate of 3.7% Y/Y. The same number for public cloud spending is ~$214B in 2019, growing by 17.5% Y/Y. If both growth rates continue (a BIG if), public cloud services spend will constitute all (~98.7%) of the IT TAM ~24 years from now. No one would predict those growth rates will continue, but it's pretty evident the growth trends are going the wrong way for (non-public-cloud) IT vendors.
There are probably an infinite number of ways to deal with the cloud. But outside of the common ones we discussed in part 1, only a dozen or so seem feasible to me, and even fewer are viable for present IT vendors.
Move to the edge and IoT
Make the data center as easy and cheap to use as the cloud
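A quick way to sanity-check the ~24-year figure is to compound both growth rates from the 2019 numbers, assuming (as the text does) that both rates stay constant. Under that assumption, the cloud's share of the TAM gets just under 100% after 23 years and passes it in year 24; since cloud spend is itself part of the IT TAM, crossing 100% only shows that both trends can't continue indefinitely.

```python
def cloud_share_by_year(it_tam=3.8e12, it_growth=0.037,
                        cloud=214e9, cloud_growth=0.175, years=30):
    """Public cloud spend as a fraction of total IT TAM, with year 0 = 2019.

    Assumes both growth rates stay constant (the BIG if in the text).
    """
    shares = []
    for _ in range(years + 1):
        shares.append(cloud / it_tam)
        it_tam *= 1 + it_growth
        cloud *= 1 + cloud_growth
    return shares


def crossover_year(**kwargs):
    """First year in which cloud spend reaches the whole IT TAM, or None."""
    for year, share in enumerate(cloud_share_by_year(**kwargs)):
        if share >= 1.0:
            return year
    return None


if __name__ == "__main__":
    shares = cloud_share_by_year()
    for year in (0, 10, 20, 22, 23, 24):
        print(2019 + year, f"{shares[year]:.1%}")
    print("crossover after", crossover_year(), "years")
```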
Focus on low-latency, high data throughput, and high performing work and applications
Move 100% into services
Move into robotics
The edge has legs
Probably the first one we should point out would be to start selling hardware and software to support the edge. In financial terms, the IoT/edge market is estimated at $754B in 2019, growing at over a 15.4% CAGR.
So we are talking about serious money. At the moment, the edge is a very diverse environment of cameras, sensors and moveable devices. And everybody seems to be in the act: big industrial firms, small startups and everyone in between. Given this diversity, it's hard to see how IT vendors could make a decent return here. But given that same diversity, one could say it's ripe for consolidation.
And the edge could use some reference architectures, with devices at the extreme edge, concentrators at the edge, higher-level concentrators at nodes, and more at the core, etc. So there's a look and feel to it that resembles RO/BO (remote office/branch office) hub-and-spoke architectures, only on steroids, with leaf proliferation that can't be stopped. And all that data coming in has to be classified, acted upon and understood.
There are a few vendors going into the edge/IoT in a small way, but no one vendor personifies this approach more than Hitachi Vantara. The Hitachi family of companies has a long and varied history in OT (operational technology), or industrial technology. And over the last several years, HDS and now Hitachi Vantara have been pivoting their organization to focus more on IoT and edge solutions, and seem to have made IoT, OT and the edge a central part of their overall strategy.
Hitachi Vantara has the advantage of being into industrial technology in a big way, so the products they create already operate in factories, rail yards, shipyards and other industrial sites around the world. So adding IoT and edge capabilities to their portfolio is a natural extension of this expertise.
However, Hitachi Vantara seems to be focusing on the software side of the edge. This may be an artifact of Hitachi family-of-companies dynamics, but it seems to leave some potential sales on the table.
There are plenty of other big industrial suppliers in the IoT/edge field, but none seem to have the IT end of the market that Hitachi Vantara can claim. Some sort of combination of a large IT vendor and a large industrial firm could potentially do the same.
So there's plenty of money to be made with IoT/edge hardware and software; one just has to go after it in a big way, and there's lots of competition. But all the competition seems to be on the same playing field (unlike the public cloud playing field).
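As a rough illustration of the tiered, hub-and-spoke idea, the sketch below has hypothetical concentrators classify raw device readings locally and pass only a summary plus any anomalies up to the core. The class and function names, the site names, and the simple "above a limit means anomalous" rule are all assumptions for illustration, not any vendor's reference architecture.

```python
from dataclasses import dataclass


@dataclass
class Concentrator:
    """Hypothetical edge concentrator that pre-processes device readings."""
    site: str
    alert_limit: float  # assumed threshold separating normal from anomalous readings

    def summarize(self, readings):
        # Classify locally and forward only a count plus the anomalies,
        # instead of shipping every raw reading up to the core.
        anomalies = [r for r in readings if r > self.alert_limit]
        return {"site": self.site, "count": len(readings), "anomalies": anomalies}


def core_rollup(summaries):
    """Core tier: aggregate concentrator summaries into a single view."""
    total = sum(s["count"] for s in summaries)
    alerts = {s["site"]: s["anomalies"] for s in summaries if s["anomalies"]}
    return total, alerts


if __name__ == "__main__":
    plant = Concentrator("plant-1", alert_limit=90.0)
    yard = Concentrator("railyard-7", alert_limit=70.0)
    summaries = [plant.summarize([72.5, 95.1, 88.0]), yard.summarize([40.2, 55.3])]
    print(core_rollup(summaries))
```

Adding more tiers is just a matter of having a mid-level node merge summaries before handing them to the core.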
Getting to “data center as a cloud”
There are a number of reasons why customers migrate work to the cloud: ease of use, ease of storage, ease of scale, access to myriad applications, access to multi-regional data centers, and an OpEx (pay-as-you-go) financial model, to name just a few.
There's nothing that says much of this couldn't be provided in the data center. It's mostly just a lot of open source software and a lot of common hardware. IT vendors can do this sort of work if they put their vast resources toward it.
On the pure software side, a couple of companies are trying to do this, namely VMware and Nutanix, but (IBM) Red Hat, (Dell) Pivotal, HPE SimpliVity and others are also going after this approach.
Hardware-wise, CI and HCI seem to be rudimentary steps towards common hardware that's easy to deploy, operate and support. But these baby steps aren't enough, and delivery-to-deployment times measured in weeks are never going to get them there. If Amazon can deliver books, mattresses, bicycles, etc. in a couple of days, IT vendors should be able to do the same with some select set of common hardware, and have it automatically deployable in seconds to minutes once powered on.
And operating these systems has to be drastically simplified. On any public cloud there's really no tuning required and only minimal configuration; then it's just load your data and go. Yes, there's a marketplace to select (virtual) server hardware, (virtual) storage hardware, (virtual) networking hardware, (virtual server) O/S and (virtual?) open source applications.
Yes, there's a lot of software behind all that virtualization, and it's fundamentally different from today's virtualized systems: it's made to operate only on commodity hardware and only with open source software.
The financial model is less of a problem today. I find many vendors are offering their hardware (and some software) on an OpEx, pay-as-you-go model. More of this needs to be made available, but the IT vendors see this and are already aggressively moving in this direction.
The clouds are not standing still, what with Azure Stack, AWS and GCP all starting to provide versions of their stacks on prem in the enterprise. This looks to be a strategic battleground between the clouds and IT vendors.
Making everything IT can do in the cloud available in the data center, with common hardware and software and with the same speed and ease of deployment, operations and support (maintenance), should be on every IT vendor's to-do list.
Unfortunately, this is not going to stop the public cloud completely, but it has the potential to slow its growth rate. But time is short, momentum has moved to the public cloud, and I don't (yet) see any urgency among IT vendors to make this transition happen today.
Focus on low-latency, high data throughput and high performance work
This is somewhat unfair, as all the IT vendors are already involved in these markets in a big way. But there are some trends here indicating that this low-latency market will become even more important over time.
For example, more and more of commercial IT is starting to take advantage of big data and AI to profit from all their data. And big science is starting to migrate to IT, where massive data flows and data analysis tools are becoming important to the data center. If anything, the emergence of IoT and the edge will increase data flows that need to be analyzed, understood, and ultimately dealt with.
DNA genomics may be relegated to big pharma/medical, but 3D visualization is becoming so mainstream that I can do it on my desktop. These sorts of things were relegated to HPC/big science just a decade or so ago. What tools exist in HPC today that the IT data center of the future will deem a necessary part of its application workload?
Is this a sizable TAM? Probably not today; in all honesty, it's buried somewhere in the IT TAM above. But it can be a growing niche where IT vendors can stake out a defensive position that the cloud may have a tough time dislodging.
I say the cloud “may have trouble dislodging” because nothing says that the entire data flow/workflow couldn't migrate to the cloud if the responsiveness were available there. But if anything, (guaranteed) responsiveness is one of the few Achilles' heels of the public cloud. Security may be the other one.
We see IBM, Intel, and a few others taking this space seriously. But all IT vendors need to see where they can do better here.
Focus on services
This is not really out-of-the-box thinking. Some (old) IT vendors have been moving into services for over 50 years now; others are just seeing there's money to be made here. Just about every IT vendor has deployment and support services, and most hardware vendors have break-fix services.
But standalone IT services are more specialized, and in the coming cloud apocalypse, services will revolve around implementing cloud applications and functionality, or migrating work to the cloud or (rarely, in the future) back on prem.
So services are already a significant portion of IT spend today and will probably not be impacted by the move to the cloud. I say that because implementing applications and services will still be needed as long as the cloud exists. Yes, it may get simpler (better frameworks, containerization, systemization), but it won't ever go away completely.
Robots, the endgame
OK, laugh now. I understand it's a big ask to think that robot spending could supplement, and maybe someday surpass, IT spending. But we all have to think long term. What is a self-driving car but a robotic data center on wheels, generating TBs of data every day it's driven?
Robots over the next century will invade every space and become ever present and ever more necessary to the functioning of the modern world. They will have sophisticated onboard computing, motors, servos, sensors, and both onboard and backend processing requirements. The real low-latency workload of the future will be in the (computing) minds of robots.
Even if the data center moves entirely to the cloud, robotic computation will never reside entirely there, because A) it's too real-time and B) it needs to operate well even when disconnected from the Internet.
Is all this going to happen in the next 10 or 20 years? Maybe not, but 30 to 50 years out, this world will have a multitude of robots operating within it.
Who’s going to develop, manufacture, support and sustain these mobile computing data centers on wheels, legs, slithering and flying bodies?
I would say today's IT vendors are uniquely positioned to dominate this market. Here too, the industry is very fragmented. There are a few industrial robotics companies, and just about every major auto manufacturer is going after self-driving cars. And there are many bit players today. So it's ripe for disruption and consolidation.
Yet none of the major IT vendors seem to be going after this. OK, Amazon (hardware & software) and Microsoft (software) have done work in this arena. If anything, this should tell IT vendors that they need to start working here as well.
But alas, none have taken up the mantle. In the meantime, robot startups are biting the dust left and right trying to gain market traction.
That seems to be about it for the major viable, out-of-the-box approaches to the public cloud threat. I have a few other ideas, but none seem as useful as the above.
One startup that caught my eye was SpaceBelt from Cloud Constellation Corporation, which is planning to put PBs (4X the Library of Congress) of data storage in a constellation of LEO satellites.
The LEO storage pool will be populated by multiple nodes (satellites), with a set of geosynchronous access points to the LEO storage pool. Customers use ground-based secure terminals to talk with the geosynchronous access satellites, which communicate with the LEO storage nodes to access data.
Their main selling points appear to be data security and availability. The only way to access the data is through secured satellite downlinks/uplinks, and even then you only get to the geosynchronous satellites. From there, those satellites access the LEO storage cloud directly. Customers can't access the storage cloud without going through the secured terminals and the geosynchronous layer first.
The problem with terrestrial data is that it is prone to security threats as well as natural disasters that can take out a data center or a region. But with all your data residing in a space cloud, such concerns shouldn't be a problem. (However, gaining access to your ground stations is a whole different story.)
AWS and Lockheed Martin supply new ground station service
The other company of interest is not a startup but a link-up between Amazon and Lockheed Martin (see: Amazon-Lockheed Martin …) that supplies a new cloud-based, satellite ground station as a service offering. The new service will use Lockheed Martin ground stations.
Currently, the service is limited to S-band and antennas located in Denver, but plans are to expand to X-band and locations throughout the world. The plan is to have ground stations located close to AWS data centers, so data center customers can have high-speed access to satellite data.
There are other startups in the ground station as a service space, but none with the resources of Amazon-Lockheed. All of this competition is just getting off the ground, but a few have been leasing idle ground station resources to customers. The AWS service already has a few big customers, like DigitalGlobe.
One thing we have learned is that the appeal of cloud services is as much about the ecosystem that surrounds them as about the service offering itself. So having satellite ground stations as a service is good, but having these services tied directly into other public cloud computing infrastructure is much, much better. Google, Microsoft, IBM, are you listening?
Data centers in space
Why stop at storage? Wouldn't it be better to support both storage and computation in space? That way, access latencies between storage and compute wouldn't be a concern. When terrestrial disasters occur, it's not just data at risk; ditto for security threats.
Having whole data centers in space would represent a whole new stratum of cloud computing. IT could also implement space-native applications.
If Microsoft can run a data center under the ocean, I see no reason they couldn't do so in orbit, especially once human spaceflight returns to NASA/SpaceX. Just imagine admins and service techs as astronauts.
And yet, security and availability aren't the only threats one has to deal with. What happens to the space cloud when war breaks out and satellite killers are set loose?
Yes, space infrastructure is not subject to terrestrial disasters or Internet-based security risks, but besides war there are other problems, such as solar storms and space debris clouds.
In the end, it's important to have multiple, non-overlapping risk profiles for your IT infrastructure. That is, each IT deployment may be subject to one set of risks, but that set is disjoint from the risks of another IT deployment option. IT in space, subject to solar storms, space debris, and satellite killers, is a nice complement to terrestrial cloud data centers, subject to natural disasters, Internet security risks, and other earth-based, man-made disasters.
On the other hand, a large solar storm like the one in 1859 could knock out every data system on Earth or in orbit. As for under the sea, it probably depends on how deep the data center was submerged!!
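The "non-overlapping risk profiles" argument boils down to set intersection: two deployment options complement each other when the risks they share form an empty set. A minimal sketch, with risk labels paraphrased from the discussion above (the labels themselves are my own shorthand), shows how a Carrington-class storm breaks that disjointness:

```python
# Hypothetical risk labels paraphrased from the discussion above.
SPACE_RISKS = {"solar storm", "space debris", "satellite killer"}
TERRESTRIAL_RISKS = {"natural disaster", "internet security breach",
                     "man-made disaster"}


def shared_risks(*deployments):
    """Risks every deployment option is exposed to (an empty set means disjoint)."""
    if not deployments:
        return set()
    return set(deployments[0]).intersection(*deployments[1:])


if __name__ == "__main__":
    print(shared_risks(SPACE_RISKS, TERRESTRIAL_RISKS))
    # A Carrington-class (1859-style) storm threatens both options at once.
    carrington = "extreme solar storm"
    print(shared_risks(SPACE_RISKS | {carrington},
                       TERRESTRIAL_RISKS | {carrington}))
```

The first call prints an empty set; the second shows the one risk neither option can diversify away.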
I've written about Scratch before (see my 10 years of Scratch and still counting post). It's essentially an object-oriented, visual programming language for kids. Nonetheless, it is pretty sophisticated. The team at MIT just released Scratch 3.0, with a number of new extensions and updates to make it easier to work with.
Google also has a visual object-oriented programming tool, called Blockly. I've used a variant of Blockly to program an Android-phone-based robot controller. It's OK, but Blockly lacks a good collaboration mode, and editing large Blockly code modules is not as easy as it should be.
On the other hand, Scratch is made for collaboration. They have a web page with 1000s of collaborations listed, and there seems to be a bit of everything on the list. And they have a number of starter Scratch projects that anyone can tackle to earn coding cards, which gently introduce you to Scratch and coding.
When I first ran across Scratch, I used it to create sounds based on key combinations. Then I moved to animating sprites (drawn characters, which you can draw yourself or pick from the many they provide). Then I moved to animating planes, then groups of planes, then created a game where one plane would be followed by others, and then added a way for one plane to shoot another, and so on.
It didn't take me very long to get to a point where I had fleets of planes moving around the screen fighting each other. I haven't done anything big with Scratch, but I've done a number of mini games/animations with my kids and it was fun to toy with.
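The "one plane followed by others" behavior is a classic follow-the-leader loop. Here is a minimal sketch of the same logic in Python rather than Scratch blocks: each step, a follower turns toward the plane ahead and moves a fixed distance, roughly what the "point towards" and "move (n) steps" blocks do together. The chain arrangement (each plane chasing the one in front) and the `gap` parameter that keeps planes from piling on top of each other are my own assumptions.

```python
import math


def step_toward(follower, target, speed, gap=0.0):
    """Move `follower` up to `speed` units straight toward `target`,
    stopping `gap` units short of it (never overshooting)."""
    dx, dy = target[0] - follower[0], target[1] - follower[1]
    distance = math.hypot(dx, dy)
    travel = min(speed, max(distance - gap, 0.0))
    if travel == 0.0:
        return follower  # already close enough (also avoids dividing by zero)
    return (follower[0] + travel * dx / distance,
            follower[1] + travel * dy / distance)


def update_fleet(leader, followers, speed, gap=0.0):
    """Each plane chases the (already moved) plane ahead of it, forming a chain."""
    moved = []
    ahead = leader
    for plane in followers:
        plane = step_toward(plane, ahead, speed, gap)
        moved.append(plane)
        ahead = plane
    return moved


if __name__ == "__main__":
    fleet = [(0.0, 0.0), (-5.0, 0.0)]
    for _ in range(3):
        fleet = update_fleet((10.0, 0.0), fleet, speed=2.0, gap=1.0)
        print(fleet)
```

Calling `update_fleet` once per animation frame (the equivalent of a Scratch "forever" loop) keeps the whole fleet trailing the leader.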
It used to be that you had to download and run Scratch locally on your PC/Mac. With later versions, there's Scratch Desktop, which one can download for Windows and macOS.
Alternatively, one can also use the web-based version, which runs in any web browser.
The new desktop is more like a visual IDE than the old one I'm used to, and it looks exactly like the web version. The first Scratch I used presented itself as a tabletop screen with various Scratch tools surrounding the tabletop. I'm sure it makes things easier for beginning coders not to be presented with a world of Scratch tools right off the bat, and just to have a sprite to play with. I suspect that all these tools are now buried in the Scratch Tutorials.
Scratch 3.0 comes with a number of extensions
One of the extensions allows you to program LEGO robotics, another provides a way to interact with a Bluetooth micro:bit controller, and another lets you use your webcam to animate objects based on video motion detection. There are plenty more, and I'm sure this isn't the end of them. (NB: Scratch team, you need one for FIRST robotics.)
I just added a few, for sounds and for text to speech. It's really easy to have Scratch 3.0 read out a text string for you. I suppose there would be a way to input a text file and have Scratch read it to you, but I didn't get that far with it.
I am a strong supporter of everyone learning how to code and solutions like MIT’s Scratch (and Google’s Blockly) are a great way to understand coding without having to deal with the pain/semantics of compilers, APIs or function libraries etc.
Just start coding and have fun. It's amazing what one can accomplish. That's what Scratch was made for, so enjoy.
My money is on PyTorch and TensorFlow as the two frameworks most likely to succeed. However, all of the above use many open source facilities, and there seems to be a lot of cross-breeding among them. Both AWS ML solutions and Microsoft CNTK offer PyTorch and TensorFlow frameworks/APIs as one option among many.
AWS Machine Learning
I spent an hour plus looking over the AWS SageMaker tutorial videos in the developer section of the AWS machine learning curriculum. Signing up was fairly easy, but I already had an AWS login. You also had to enroll/register for the course with your AWS login, but once that was done, you could access the courses.
In the comments on the AWS blog post there were a number of entries indicating broken links and other problems, but I didn't have any issues. Then again, I didn't start at the beginning, only looked over one series of courses, and was using the website one week after it was announced at re:Invent.
Amazon SageMaker is an overarching framework for performing machine learning on AWS, all the way from gathering, analyzing and modifying the dataset(s), to training the model, to creating an inference engine, available as an endpoint, that performs the inferencing.
Amazon also has special-purpose, API-based tools that allow customers to embed intelligence (inferencing) directly into their applications without needing to perform ML training. These include:
Amazon Polly, which provides text-to-speech services in multiple languages, and
Amazon Lex, which provides speech recognition technology (used by Alexa) and, together with Polly, helps embed conversational interfaces into customer applications.
TensorFlow Machine Learning
In the past I looked over the TensorFlow tutorials and recently checked them out again. I found them much easier to follow this time.
The Google I/O 2018 video on TensorFlow, Getting Started With TensorFlow High Level APIs, takes you through a brief introduction to Colab(oratory), a GCP solution that uses TensorFlow, and shows how to use TensorFlow Keras, tf.data and TensorFlow Eager Execution to create machine learning models and perform machine learning.
Keras on TensorFlow seems to be the easiest way to use machine learning technologies. The video spends most of its time discussing a Colab Keras code element, ~9 lines, that loads an image classification dataset, defines a 1-level neural network (one standard layer and one output layer), trains it, validates it and uses it to perform inferencing.
The video also touches a bit on tf.data and TensorFlow Eager Execution, but the main portion discusses the 9-line TensorFlow Keras machine learning example.
Both Colab and AWS SageMaker use and discuss Jupyter Notebooks. These appear to be an open source approach to documenting a workflow and executing its Python code interactively, with code, output and commentary all in one document.
GCP Colab is essentially a Google Drive-based Jupyter notebook execution engine. With Colab, you create a Jupyter notebook on Google Drive and interactively execute it under Colab. You can download your Jupyter notebook files and execute them anywhere else that supports TensorFlow (v1.7 or above, with Keras API support).
In the video, the Google I/O instructors (Josh Gordon and Laurence Moroney) walk you through building a model that recognizes handwritten digits and outputs a classification (0..9) of what each handwritten digit represents.
It uses a standard labeled dataset of handwritten digits, called the MNIST database of handwritten digits, that's already been split into a training set and a validation set. Josh calls this the “Hello World” of machine learning.
The instructor in the video walks you through the (Jupyter Notebook, Eager Execution, Keras) code that inputs the dataset (line 2), builds a 1-level (really two-layer: one neural net layer and one output layer) neural network model (lines 3-6), trains the model (line 7), tests/validates the model (line 8) and then uses it to perform an inference (line 9).
Josh spends a little time discussing neural networks, model optimization and some of the other parameters used in the code. He has a few visualizations of what this all means, but for the most part, the code uses a simple way to build a neural net model and some standard optimization techniques for the network.
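For readers without the video handy, here is a minimal sketch of the same kind of Keras model in TensorFlow 2.x, where eager execution is on by default. It mirrors the steps listed above (load the data, build a two-layer model, train, validate, infer), but it is not a transcript of the video's code: the layer size, dropout rate, optimizer and epoch count are my assumptions, in the style of the tensorflow.org beginner MNIST example.

```python
import tensorflow as tf


def build_model():
    """Small MNIST classifier: one hidden Dense layer plus a softmax output layer."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),                    # one 28x28 grayscale digit
        tf.keras.layers.Flatten(),                         # -> 784-element vector
        tf.keras.layers.Dense(512, activation="relu"),     # the one "standard" layer
        tf.keras.layers.Dropout(0.2),                      # active during training only
        tf.keras.layers.Dense(10, activation="softmax"),   # one probability per digit 0..9
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # integer labels 0..9
                  metrics=["accuracy"])
    return model


def train_and_evaluate(epochs=5):
    """Load MNIST, train, validate on the held-out set, and classify one image."""
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0     # scale pixels to [0, 1]
    model = build_model()
    model.fit(x_train, y_train, epochs=epochs)              # train
    _, accuracy = model.evaluate(x_test, y_test)             # test/validate
    predicted_digit = int(model.predict(x_test[:1]).argmax(axis=1)[0])  # inference
    return accuracy, predicted_digit
```

Calling `train_and_evaluate()` downloads MNIST on first use, so it is kept out of module-level code; `build_model()` on its own is enough to inspect the architecture.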
He then goes on to discuss tf.data, an API that can be used to create machine learning datasets and feed this data to the neural net for training or inferencing. Apparently tf.data has a number of nifty features that let you take raw data and transform it into something that can feed neural nets: for example, separating the data into batches, shuffling (randomizing) the data, pre-fetching it so as not to starve the GPU matrix multipliers, etc.
Then it goes into how machine learning is different from regular coding, and shows how TensorFlow Eager Execution is really just like Python execution. They go through another, larger example of machine learning, this one distinguishing between cats and dogs. They use PyCharm, an open source Python IDE, to test and walk through their TF Eager Execution code, setting breakpoints and examining data along the way.
I never got around to Microsoft's Azure training, other than previewing some websites, but plan to look it over soon.
I would have to say that the Google I/O session on using TensorFlow high level APIs (~40 minutes) was a lot more enjoyable than the multiple AWS tutorial videos (>>40 minutes) that I watched to learn about SageMaker.
Not a fair comparison, as one was a Google I/O intro session on TensorFlow high level APIs and the other was a series of actual training videos on Amazon SageMaker and the AWS services you can use to take advantage of it.
But the GCP session left me thinking I could handle learning more and use machine learning (via TensorFlow, Keras, Eager Execution & tf.data) to actually do something, while the SageMaker sessions left me wondering how many AWS facilities and infrastructure services I would need to understand and use before ever getting to actually developing a machine learning model.
I suppose one was more of an (AWS SageMaker) infrastructure tutorial and the other was more of an intro into machine learning using TensorFlow wherever you wanted to execute it.
I think I'm almost ready to start creating and feeding a TensorFlow model with my handwriting and seeing if it can properly interpret it into searchable text. If it can do that, I would be a happy camper.
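A minimal tf.data pipeline along those lines might look like the following (TensorFlow 2.4 or later, where `tf.data.AUTOTUNE` is available; the batch size and shuffle buffer are arbitrary choices, and `make_dataset` is a hypothetical helper name). Note that shuffling before batching randomizes individual examples; calling `shuffle` after `batch` would instead shuffle whole batches.

```python
import tensorflow as tf


def make_dataset(features, labels, batch_size=32, shuffle_buffer=1000):
    """Input pipeline sketch: shuffle examples, batch them, and prefetch."""
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    ds = ds.shuffle(shuffle_buffer)     # randomize example order (reshuffled each epoch)
    ds = ds.batch(batch_size)           # group examples into mini-batches
    ds = ds.prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the current one trains
    return ds
```

A dataset built this way can be passed straight to a compiled Keras model, e.g. `model.fit(make_dataset(x_train, y_train), epochs=5)`.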
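To see what "just like Python execution" means, here is a tiny eager-mode snippet (the numbers are arbitrary). Every operation returns a concrete value right away, so you can print it or stop at a breakpoint in an IDE such as PyCharm, and gradients are recorded with an ordinary Python `with` block.

```python
import tensorflow as tf

# In TensorFlow 2.x eager execution is the default; in TF 1.7+ it had to be
# switched on first with tf.enable_eager_execution().
x = tf.constant([[2.0, 3.0]])
w = tf.constant([[1.0], [4.0]])

y = tf.matmul(x, w)  # runs immediately; no graph or session needed
print(y.numpy())     # [[14.]]

# Gradients are recorded with an ordinary Python context manager.
with tf.GradientTape() as tape:
    tape.watch(w)    # w is a constant, so ask the tape to track it
    loss = tf.reduce_sum(tf.matmul(x, w))
grad = tape.gradient(loss, w)
print(grad.numpy())  # d(loss)/dw equals x transposed: 2 and 3
```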
Screenshots from AWS SageMaker tutorial videos 1, 2, 3, 4 & 5 (you may need to sign in to view them).
I read an article in Stanford Research, Crowdsourced research gives experience to global participants, that discussed an effort by Stanford and other top-tier research institutions to get global participation in academic research. The process is discussed more fully in a scientific paper (PDF here) by researchers from Stanford, MIT Media Lab, Cornell Tech and UC Santa Cruz.
They chose three projects:
An HCI (human-computer interaction) project to design, engineer and build a new paid crowdsourcing marketplace (like Amazon's Mechanical Turk).
A visual image recognition project to improve on current visual classification techniques/algorithms.
A data science project to design and build the world's largest wisdom-of-the-crowds experiment.
Why crowdsource academic research?
The intent of crowdsourced research is to provide top-tier academic research experience to people who have no access to top research organizations.
Participating universities obtain more technically diverse researchers, larger research teams, larger research projects, and a geographically dispersed research community.
Collaborators gain valuable academic research experience, research community contacts, potential authorship of research papers, and potential recommendation letters (for future work or academic placement).
How does crowdresearch work?
It's almost open source and agile development applied to academic research. The work week starts with the principal investigator (PI) and research assistants (RAs) going over last week's milestone deliverables to see which to pursue further. Crowd research uses Reddit-like posting and up/down voting to determine which milestone deliverables are most important. The PI and RAs review this prioritized list and select a few to continue investigating over the next week.
The PI holds an hour-long video conference (using the Google Hangouts On Air YouTube live stream service). All collaborators can view the stream, but only a select few are on camera. The PI and the researchers responsible for the past week's important milestone research discuss their findings, and the rest of the collaborators on the team can participate over Slack. The video conference is archived and available to be watched offline.
At the end of the meeting, the PI identifies next week's milestones and, potentially, directly responsible investigators (DRIs) to work on them.
The DRIs and other collaborators choose how to apportion the work for the next week and work commences. Collaboration can be fostered and monitored via Slack and if necessary, more Google live stream meetings.
If collaborators need help understanding some technology, technique, or too, the PI, RAs or DRIs can provide a mini video course on the topic or can point to other information used to get the researchers up to speed. Collaborators can ask questions and receive answers through Slack.
When it’s time to write the paper, they used Google Docs with change tracking to manage the writing process.
The team also maintained a Wiki on the overall project to help new and current members get up to speed on what’s going on. The Wiki would also list the week’s milestones, video archives, project history/information, milestone deliverables, etc.
At the end of the week, researchers and DRIs would supply a mini post describing their work, with links to their milestone deliverables so that everyone could review their results.
Who gets credit for crowdresearch?
Each week, everyone on the project is allocated 100 credits and apportions these credits to other participants based on the week’s activities. The credits drive a PageRank-style credit assignment algorithm that determines an aggregate credit score for each researcher on the project.
Check out the paper linked above for more information on the credit algorithm. They tried to defeat credit rings (participants passing credit back and forth among themselves) and other obvious approaches to stealing credit.
At the end of the project, the PI, DRIs and RAs determine a credit clip level (cutoff) for paper authorship. Paper authors are listed in credit order, and the remaining non-author collaborators are listed in the paper’s acknowledgements section.
The PI can also use the credit level to determine how strong a recommendation letter to provide for each researcher.
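To make the PageRank idea concrete, here is a minimal sketch of plain PageRank applied to credit allocations: credit from a participant who is themselves highly credited counts for more. The function name, damping factor and participants are hypothetical, and the paper’s actual algorithm (including its defenses against credit rings) is not reproduced here.

```python
from typing import Dict, List


def credit_rank(allocations: Dict[str, Dict[str, float]],
                damping: float = 0.85,
                iterations: int = 100) -> Dict[str, float]:
    """PageRank-style aggregate credit (hypothetical sketch).

    allocations[giver][receiver] = credits (out of the giver's weekly 100)
    that the giver assigned to the receiver.
    """
    names = set(allocations)
    for given in allocations.values():
        names.update(given)
    people: List[str] = sorted(names)
    n = len(people)
    score = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        # Every participant gets a small baseline share (the "teleport" term).
        new = {p: (1.0 - damping) / n for p in people}
        for giver in people:
            given = allocations.get(giver, {})
            total = sum(given.values())
            if total == 0:
                # Someone who gave no credits spreads their weight evenly.
                for p in people:
                    new[p] += damping * score[giver] / n
                continue
            # Credit flows in proportion to how the giver split their credits,
            # weighted by the giver's own current score.
            for receiver, credits in given.items():
                new[receiver] += damping * score[giver] * credits / total
        score = new
    return score


# Hypothetical week: each participant splits 100 credits among the others.
allocations = {
    "ana": {"ben": 60, "chen": 40},
    "ben": {"ana": 50, "chen": 50},
    "chen": {"ana": 70, "ben": 30},
    "dev": {"ana": 100},
}
scores = credit_rank(allocations)
```

The scores sum to 1. Here “ana” ends up with the highest aggregate credit because everyone sends her a large share, while “dev”, who receives no credits, keeps only the baseline share.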
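Given aggregate credit scores, the authorship cut can be sketched as a simple threshold. The clip level, names and scores below are hypothetical; the paper leaves the actual cutoff to the PI, DRIs and RAs.

```python
from typing import Dict, List, Tuple


def split_authors(scores: Dict[str, float],
                  clip_level: float) -> Tuple[List[str], List[str]]:
    """Return (authors, acknowledged), both ordered by descending credit."""
    # Sort by credit (highest first), breaking ties alphabetically.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    authors = [name for name in ranked if scores[name] >= clip_level]
    acknowledged = [name for name in ranked if scores[name] < clip_level]
    return authors, acknowledged


# Hypothetical aggregate credit scores and clip level
scores = {"dev": 0.04, "chen": 0.29, "ana": 0.37, "ben": 0.30}
authors, acknowledged = split_authors(scores, clip_level=0.10)
```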
Tools for crowdresearch
The tools needed to collaborate on crowdresearch are cheap and readily available to anyone.
Google Docs, Hangouts and Gmail are all freely available, although you may need to purchase more Google Drive space to host the project’s work.
Wiki software is also freely available from multiple sources, including MediaWiki, the software behind Wikipedia.
Slack is readily available at low cost, but open source alternatives exist if that’s a problem.
A GitHub code repository is also readily available at reasonable cost, but there may be alternatives that use Google Drive storage for the repo.
Web hosting is needed to host the online Wiki, media and other assets.
The initial projects were chosen in computer science, so beyond the above tools, they could depend on open source. Other projects will need to consider how much experimental apparatus is required, how to fund its purchase, and how globally distributed researchers can best make use of it.
My crowdresearch projects
Here are some potential commercial crowdresearch projects, where we could use aggregate credit scores and perhaps other measures of participation to apportion revenue, if any:
An NVMe storage system using a lightweight storage server supporting NVMe over Fabrics access to hybrid NVMe SSD and capacity disk storage.
Proof of Stake (PoS) Ethereum pooling software using Linux servers to create a pool for PoS ETH mining.
A bipedal, dual-armed, dual-handed, five-fingered assisted care robot to supply assistance and care to elders and disabled people throughout the world.
And some non-commercial projects, where we would use aggregate credit scores to apportion attribution and any potential remuneration:
A fully (100%?) mechanical rover able to survive, rove around, perform scientific analysis, receive/transmit data and possibly effect repairs within extreme environments such as the surface of Venus, Jupiter and the Chernobyl/Fukushima Daiichi reactor cores.
A zero-propellant interplanetary tug able to rapidly transport rovers, satellites, probes, etc. to any place within the solar system and deploy them properly.
A Venusian manned base habitat including the design, build process and ongoing support for the initial habitat and any expansion over time, such that the habitat can last 25 years.
If any collaborators across the world are interested in working on any of these projects, do let me know here via comments. Please supply some way to contact you, along with any skills you already have or are interested in developing that could help the project(s).
I would be glad to take on the PI role for the most popular project(s), if I get sufficient response (no idea what this would be). And I’d be happy to purchase the Drive, GitHub, Slack and web hosting accounts needed to start up and carry the most popular project(s) to fruition. And if there are any PIs with more domain experience interested in taking on any of these projects, do let me know.
It was the worst of times. The industry changes had been gathering for almost a decade and by this time were starting to hurt.
The cloud was taking over all new business and some of the old. Flash’s performance was making high performance easy and reducing storage requirements commensurately. Software-defined storage was displacing low and midrange storage, which was fine for margins but injurious to revenues.
Both companies, NetApp and Hitachi Vantara, had user events in Vegas last month: NetApp Insight 2017 last week and the Hitachi NEXT2017 conference two weeks ago.
As both companies respond to industry trends, they provide an interesting comparison of companies in transition.
NetApp’s underlying theme is to change the world with data, and they want to change themselves to help companies do this.
Vantara’s philosophy is that data and processing are ultimately moving into the Internet of Things (IoT), and they want to be wherever the data takes them.
Hitachi Vantara is a brand new company that combines Hitachi Data Systems, Hitachi Insight Group and Pentaho (an analytics acquisition) into one organization to go after the IoT market. Pentaho will continue as a separate brand/subsidiary, but HDS and Insight Group cease to exist as separate companies/subsidiaries and are now inside Vantara.
NetApp sees transitions occurring in the way IT conducts business but, ultimately, a continuing role for IT. NetApp’s role is to be a data service provider to IT.
Vantara believes the main customer issue is the need to digitize the business. Because competition is emerging everywhere, the only way for a company to succeed against this interminable onslaught is to digitize everything. That is, digitize your manufacturing/service production, sales, marketing, maintenance, any and all customer touch points, across your whole value chain, and do it as rapidly as possible. If you don’t, your competition will.
NetApp sees customers today as having three potential concerns: 1) how to modernize current infrastructure; 2) how to take advantage of (hybrid) cloud; and 3) how to build out the next generation data center. Modernization is needed to free capital and expense from traditional IT for use in hybrid cloud and next generation data centers. Most organizations have all three going on concurrently.
Vantara sees the threat of startups, regional operators and more advanced digitized competitors as existential for today’s companies. The only way to keep your business alive under these onslaughts is to optimize your value delivery. And to do that, you have to digitize every step in that path.
NetApp views the threat to IT as coming from line-of-business (LoB)/shadow IT applications born and grown in the cloud, or from other groups creating next gen applications using capabilities outside of IT.
NetApp is looking mostly towards the cloud. At their conference they announced a new Azure NFS service powered by NetApp. They already had two cloud offerings, Cloud ONTAP and NPS: software-defined storage in the cloud and a co-lo hardware offering directly attached to public clouds (Azure & AWS), respectively.
Vantara is looking towards IoT. At their conference they announced Lumada 2.0, an Industrial IoT (IIoT) product framework using plenty of Hitachi software functionality and intended to bring data and analytics under one software umbrella.
NetApp is following a path laid down years ago when they devised the data fabric. Now they are integrating and implementing the data fabric across their whole product line, with the ultimate goal that wherever your data goes, the data fabric will be there to help you with it.
Vantara is broadening their focus from IT products and solutions to IoT. It’s not so much abandoning present-day IT as looking forward to the day when present-day IT is just one cog in an ever-expanding, completely integrated digital entity which the new organization becomes.
They both had other announcements. NetApp announced ONTAP 9.3, Active IQ (AI applied to predictive service) and FlexPod SF ([H]CI with SolidFire storage), and Vantara announced a new IoT turnkey appliance running Lumada and a smart data center (IoT) solution.
They both are.
Digitization is the future; the sooner organizations realize and embrace this, the better for their long-term health. Digitization will happen with or without organizations and, when it does, it will result in a significant re-ordering of today’s competitive landscape. IoT is one component of organizational digitization, specifically outside of IT data centers, but using IT resources.
In the meantime, IT must become more effective and efficient. This means it has to modernize to free up resources to support (hybrid) cloud applications and supply the infrastructure needed for next gen applications.
One could argue that Vantara is positioning themselves for the long term and NetApp is positioning themselves for the short term. But that denies the possibility that IT will have a role in digitization. In the end both are correct and both can succeed if they deliver on their promise.
Last year about this time Google released their 1st generation TPU chip to the world (see my TPU and HW vs. SW … post for more info).
This year they are releasing a new version of their hardware called the Cloud TPU chip and making it available in a cluster on their Google Cloud. Cloud TPU is in Alpha testing now. As I understand it, access to the Cloud TPU will eventually be free to researchers who promise to freely publish their research and at a price for everyone else.
What’s different between TPU v1 and Cloud TPU v2
The differences between version 1 and 2 mostly seem to be tied to training Machine Learning Models.
TPU v1 didn’t have any real ability to train machine learning (ML) models. It was a relatively dumb (8-bit ALU) chip, but if you had an ML model already created to do something like understand speech, you could load that model onto the TPU v1 board and have it executed very fast. The TPU v1 chip was also placed on a separate PCIe board (I think), connected to normal x86 CPUs as a sort of CPU accelerator. The advantage of TPU v1 over GPUs or normal x86 CPUs was mostly in power consumption and speed of ML model execution.
Cloud TPU v2 looks to be a standalone multi-processor device that’s connected to others via what look like Ethernet connections. One thing Google seems to be highlighting is the Cloud TPU’s floating point performance. A Cloud TPU device (board) is capable of 180 TeraFlops (trillion, or 10^12, floating point operations per second). A 64-device Cloud TPU pod can theoretically execute 11.5 PetaFlops (10^15 FLOPS).
TPU v1 had no floating point capabilities whatsoever. So Cloud TPU is intended to speed up the training part of ML, which requires extensive floating point calculation. Presumably, they have also improved ML model execution in Cloud TPU vs. TPU v1. More information on their Cloud TPU chips is available here.
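Why is an 8-bit chip good enough for running an already-trained model? Here is a minimal sketch of quantized inference, assuming a simple symmetric scheme (floats mapped to integers in [-127, 127] with one scale per tensor). It is not the TPU v1’s actual datapath, just an illustration of 8-bit multiplies summed into a wide accumulator and rescaled back to floats at the end. The weights, inputs and scales are hypothetical.

```python
from typing import List


def quantize(values: List[float], scale: float) -> List[int]:
    """Map floats to signed 8-bit levels: round(x / scale), clipped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int8_dot(qa: List[int], qb: List[int]) -> int:
    """Multiply 8-bit values pairwise and sum in a wide (Python int) accumulator."""
    return sum(a * b for a, b in zip(qa, qb))


def quantized_dense(x: List[float], weights: List[List[float]],
                    x_scale: float, w_scale: float) -> List[float]:
    """Evaluate one dense layer (no bias) with integer math, then rescale to float."""
    qx = quantize(x, x_scale)
    outputs = []
    for row in weights:
        acc = int8_dot(qx, quantize(row, w_scale))
        outputs.append(acc * x_scale * w_scale)
    return outputs


# Hypothetical trained weights and input
weights = [[0.5, -0.25, 0.125], [-1.0, 0.75, 0.0]]
x = [1.0, 2.0, -0.5]
exact = [sum(w * v for w, v in zip(row, x)) for row in weights]
approx = quantized_dense(x, weights, x_scale=2.0 / 127, w_scale=1.0 / 127)
```

The integer result lands within a few hundredths of the exact floating point result, which is often close enough for inference and much cheaper in silicon and power.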
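The pod figure is just the per-device peak multiplied by the device count, assuming every device runs at its theoretical peak:

$$64 \times 180\ \text{TFLOPS} = 11{,}520\ \text{TFLOPS} = 11.52 \times 10^{15}\ \text{FLOPS} \approx 11.5\ \text{PFLOPS}$$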
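One way to see why training leans on floating point: each training step nudges weights by tiny amounts. Here is a minimal sketch, assuming a hypothetical 8-bit weight grid and a constant hypothetical gradient, in which the per-step update is smaller than one 8-bit step, so rounding erases it every time. Real low-precision training uses tricks (such as keeping floating point master copies of weights) that this sketch ignores.

```python
def to_int8_grid(value: float, scale: float) -> float:
    """Snap a value to the nearest level of a signed 8-bit grid with step `scale`."""
    q = max(-127, min(127, round(value / scale)))
    return q * scale


scale = 1.0 / 127       # hypothetical step size of an 8-bit weight grid
learning_rate = 0.01
gradient = 0.2          # hypothetical, constant gradient for every step

float_weight = int8_weight = 64 * scale
for _ in range(100):
    # Each step wants to move the weight by learning_rate * gradient = 0.002.
    float_weight -= learning_rate * gradient
    # 0.002 is about a quarter of one 8-bit step, so rounding undoes the update.
    int8_weight = to_int8_grid(int8_weight - learning_rate * gradient, scale)
```

After 100 steps the floating point weight has moved by about 0.2, while the 8-bit weight hasn’t moved at all.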
So how do you code a TPU?
Both TPU v1 and Cloud TPU are programmed using Google’s open source TensorFlow. TensorFlow is a set of software libraries that facilitate numerical computation via data flow graph programming.
As I understand it, with data flow programming you have many nodes (operations) and many more connections between them. Each connection carries a multi-dimensional array (tensor) from one node to another. Once a node has the tensors it needs on its incoming connections, it does some (floating point) calculations on that data and sends the resulting tensor across its outgoing connections to the nodes that consume it.
Apparently, TensorFlow works with x86 servers, GPU chips, TPU v1 or Cloud TPU. Google TensorFlow 1.2.0 is now available. Google says that TensorFlow is in use in over 6000 open source projects. TensorFlow uses Python, and 1.2.0 runs on Linux, Mac & Windows. More information on TensorFlow can be found here.
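To make the data flow idea concrete, here is a toy graph evaluator in plain Python. It is not TensorFlow’s API: the names (`Node`, `run_graph`) are hypothetical, and 1-D lists stand in for tensors. It only illustrates the model of nodes firing once all their input tensors have arrived, which TensorFlow’s graph execution loosely resembles.

```python
from typing import Callable, Dict, List

Tensor = List[float]  # 1-D stand-in for a tensor, to keep the sketch dependency-free


class Node:
    """One operation in the graph; `inputs` names the nodes (or feeds) it consumes."""

    def __init__(self, name: str, op: Callable[..., Tensor], inputs: List[str]):
        self.name = name
        self.op = op
        self.inputs = inputs


def run_graph(nodes: List[Node], feeds: Dict[str, Tensor], fetch: str) -> Tensor:
    """Fire each node once all of its input tensors are available."""
    values: Dict[str, Tensor] = dict(feeds)
    pending = list(nodes)
    while fetch not in values:
        ready = [n for n in pending if all(i in values for i in n.inputs)]
        if not ready:
            raise ValueError("graph has a cycle or a missing input")
        for node in ready:
            # The node's output tensor travels along every outgoing connection.
            values[node.name] = node.op(*(values[i] for i in node.inputs))
            pending.remove(node)
    return values[fetch]


def mul(a: Tensor, b: Tensor) -> Tensor:
    return [x * y for x, y in zip(a, b)]


def add(a: Tensor, b: Tensor) -> Tensor:
    return [x + y for x, y in zip(a, b)]


def relu(a: Tensor) -> Tensor:
    return [max(0.0, x) for x in a]


# y = relu(w * x + b), element-wise; listed out of order on purpose,
# since execution order comes from the data dependencies, not the list.
graph = [
    Node("y", relu, ["wx_plus_b"]),
    Node("wx_plus_b", add, ["wx", "b"]),
    Node("wx", mul, ["w", "x"]),
]
y = run_graph(graph, {"w": [2.0, -1.0, 0.5], "x": [1.0, 3.0, 4.0],
                      "b": [0.5, 0.5, -3.0]}, "y")
```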
So where can I get some Cloud TPUs?
Google is releasing their new Cloud TPU in the TensorFlow Research Cloud (TFRC). The TFRC has 1000 Cloud TPU devices connected together, which can be used by any organization to train and execute machine learning algorithms.
I signed up (here) to be an alpha tester. During the signup process the site asked me what hardware (GPUs, CPUs) and platforms I was currently using to train my ML models, how long my ML model takes to train, and how large a training (data) set I use (ranging from 10GB to >1PB), as well as other ML model oriented questions. I guess they’re trying to understand what the market requirements are outside of Google’s own use.
Google’s been using ML and other AI technologies in more and more of their products, and this will no doubt accelerate with the introduction of the Cloud TPU. Making it available to others is an interesting play, but it would be one way to amortize the cost of creating the chip. Another way would be to sell the Cloud TPU directly to businesses, government agencies, non-government agencies, etc.
I have no real idea what I am going to do with alpha access to the TFRC, but I was thinking maybe I could feed it all my blog posts and train an ML model to start writing blog posts for me. If anyone has any other ideas, please let me know.