Dell EMC PowerStore X and the Edge – TFDxDell

This past summer I attended a virtual TFDxDell event where there were a number of sessions discussing Dell EMC technologies for the enterprise. One session struck a nerve: the Dell EMC PowerStore session. I have finally figured out what interested me most in their talk: their PowerStore X appliances and AppsON technology.

What are AppsON and the PowerStore X appliance?

Essentially PowerStore X with AppsON has an onboard ESXi hypervisor which allows customers to run vSphere VMs inside the storage system with direct vVol (I assume) access to PowerStore data storage without having to go out over a (storage) network.

PowerStore X ESXi is a little behind the most recent VMware vSphere releases (at least 30 days) but it’s current enough for most shops. In non-PowerStore X appliances, PowerStoreOS runs as containers but in PowerStore X, PowerStoreOS storage functionality runs as VMs, just like any other VMs running on its ESXi hypervisor.

Moreover, PowerStore X can still service IOs from other, non-PowerStore X resident VMs or bare metal applications running in the environment. In this way you get all the data services of an enterprise-class storage system that can also run VMs.

With PowerStoreOS 2.0 they have added scale-out to AppsON. That is, any PowerStore X appliance (1000X, 3000X, 5000X or 7000X) in a PowerStore X cluster can have its VMs moved from one appliance to another using vSphere vMotion. This means that as your PowerStore X storage clusters grow, you can rebalance VM application workloads across the cluster. A PowerStore X cluster can contain up to 4 PowerStore X appliances.
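
Since the rebalancing itself is just standard vSphere vMotion, it could be scripted with any vSphere SDK. Below is a minimal, hypothetical sketch using the open source pyVmomi Python SDK; the vCenter address, credentials, VM name and destination host name are all placeholders, and nothing here is PowerStore-specific.

```python
# Minimal sketch: move a VM to another host with pyVmomi (vSphere SDK).
# All names and addresses below are hypothetical placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()            # lab only; verify certs in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    """Walk the inventory for the first managed object of vimtype with this name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

vm = find_by_name(vim.VirtualMachine, "data-intensive-vm01")
dest_host = find_by_name(vim.HostSystem, "powerstore-x-node-b")

spec = vim.vm.RelocateSpec(host=dest_host)        # compute-only move; shared storage assumed
task = vm.RelocateVM_Task(spec)
print("vMotion started:", task.info.state)

Disconnect(si)
```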

PowerStore’s heritage goes back quite a ways at Dell and EMC. Prior versions of EMC Unity storage and some of its progenitors had the ability to run applications on the storage itself. But running an ESXi hypervisor on PowerStore X appliances takes all this to a whole new level.

Why would anyone want AppsON?

It’s taken me some time to understand why anyone would want to use AppsON, and I have concluded that the edge might be the best environment to deploy it.

Recent VMware enhancements have reduced minimum node configurations for edge environments to 2 servers. It’s unclear to me whether a single PowerStore X appliance with AppsON counts as one server or two but, for the moment, let’s assume it’s just one. This means that a minimum VMware vSphere edge deployment could use one PowerStore X appliance and one standalone ESXi server.

In such an environment, customers could run their data intensive VMs directly on the PowerStore X and some of their non-data intensive VMs on the standalone server. But the flexibility exists to vMotion VMs from one to the other as demand dictates.

But does the edge need storage?

Yes, some do. For instance, take 5G. It enables a whole new class of mobile services, and many of them can be quite data intensive. 5G is being deployed around the world with mini-data centers at cell towers. It’s unclear whether these data centers run vSphere, but I’m sure VMware is trying their hardest to make that happen. With vSphere running your 5G mini-data center, PowerStore X could make a smart addition.

Then there’s all the smart cars, which are creating TBs of sensor data every time they take to the road. You’re probably not going to have a PowerStore appliance in your smart car (at least anytime soon) but they just might have one at the local service station.

And maybe given all the smart devices in your home, smart cars, smart appliances, smart robots, etc., there’s going to be a whole lot of data generated from your smart home. Having something like PowerStore X in your smart home’s mini-data center would offer a place to hold all that data and to do some processing (compressing maybe) before sending it up to the cloud.

~~~~

We have just two more questions for Dell EMC:

  1. Shouldn’t the base PowerStore appliance be called PowerStore K?
  2. Shouldn’t customers be allowed to run their own K8s container apps on their PowerStore K just as easily as running VMs in their PowerStore X?

Legal Disclosure: TechFieldDay and Dell provided gifts to all participants (including me) for the TFDxDell event.

Photo credit(s):

  • All graphics and photos from Dell EMC slides presented at the TFDxDell event

NASA’s journey to the cloud – part 1

Read an article the other day, NASA Turns to the Cloud for Help With Next-Generation Earth Missions, about how NASA had started to migrate all their data to the cloud and intended to store all new data there as well. The hope is that researchers would no longer need to download NASA data but rather could access it directly using cloud compute resources.

It turns out that newer earth science satellites are generating so much data that hosting it all is becoming a challenge. With the quantities being discussed, downloading the data to perform research in researchers’ own environments may take days.

Until recently, earth science data has been hosted by, and downloadable from, NASA, ESA and other space organization sites. For example, see NASA’s GHRC DAAC (Global Hydrometeorology Resource Center Distributed Active Archive Center), ESA EarthOnline, JAXA’s GPM website, etc. Generally one could download a time series of data from any of their prior and current earth/planetary science missions without too much trouble.

The Land Processes Distributed Active Archive Center (LP DAAC) archives and distributes Global Forest Cover Change (GFCC) data products through the NASA Making Earth System Data Records for Use in Research Environments (MEaSUREs) (https://earthdata.nasa.gov/community/community-data-system-programs/measures-projects) Program….

But NASA’s newest earth science satellites will be generating lots of data. For instance, the SWOT (Surface Water and Ocean Topography) mission data load will be 20TB/day and the NISAR (NASA-ISRO Synthetic Aperture Radar) mission data load will be 80TB/day. And it’s only getting worse as more missions with newer instruments come online.

NASA estimates that, over time, they will store 247PB of data in their EarthData Cloud. At the moment, they have already migrated some of their Earth Science data (all of the ASF [Alaska Satellite Facility] DAAC and some of the PO.DAAC [Physical Oceanography DAAC]) to AWS (us-west-2), and over time all of it will migrate there.

NASA will eat any egress charges for EOSDIS data and is also paying any and all hosting fees to store the data in AWS. It’s unclear whether they are using standard S3 or S3 Intelligent-Tiering. And presumably they are using S3 replication to ensure they don’t lose DAAC data in the cloud, but I don’t see any evidence of that in the literature I’ve read. Of course, this doubles the storage costs for their 247PB of DAAC data.
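
As a back-of-the-envelope check on what hosting that much data might cost, here’s a quick calculation. The price used below is an assumed S3 Standard list price (~$0.021/GB-month), not NASA’s actual (presumably negotiated) rate.

```python
# Rough, hypothetical cost sketch for hosting 247PB in S3.
# The $/GB-month figure is an assumed list price, not NASA's actual rate.
GB_PER_PB = 1024 ** 2                      # binary GB per PB
data_gb = 247 * GB_PER_PB                  # ~259 million GB

s3_standard_per_gb_month = 0.021           # assumed us-west-2 list price
monthly = data_gb * s3_standard_per_gb_month
print(f"Single copy:      ~${monthly / 1e6:,.1f}M per month")
print(f"With replication: ~${2 * monthly / 1e6:,.1f}M per month")  # second full copy doubles it
```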

Access to all this data is available to anyone with an EarthData login. There you can register for a profile to access NASA earth sciences data.

NASA’s EarthData also offers a number of AWS cloud based services to help one access this data:

  • EarthData search – filtered search facility to access NASA EarthData by platform (e.g. satellite), instrument (e.g. camera/visual data), organization (e.g. NASA/JPL), etc.
  • EarthData Common Metadata Repository – an API-driven metadata repository that “catalogs all data and service metadata records for NASA’s EOSDIS (Earth Observing System Data and Information System) system” data, can be accessed by anyone, and includes programmatic access to EarthData search (see the sketch after this list).
  • EarthData Harmony – which provides Jupyter notebook examples and API documentation for performing research on earth science data in the EarthData cloud.
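
To give a flavor of that programmatic access, here’s a minimal sketch that queries the public CMR search API for granules from a collection. The short_name value is just an illustrative example; the endpoint and basic parameters are documented on the EarthData CMR API pages.

```python
# Minimal sketch: query NASA's Common Metadata Repository (CMR) for granules.
# "GPM_3IMERGHH" is an illustrative short_name; substitute any EOSDIS collection.
import requests

CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.json"

params = {
    "short_name": "GPM_3IMERGHH",      # example collection short name
    "temporal": "2021-01-01T00:00:00Z,2021-01-02T00:00:00Z",
    "page_size": 5,
}
resp = requests.get(CMR_GRANULES, params=params, timeout=30)
resp.raise_for_status()

for granule in resp.json()["feed"]["entry"]:
    print(granule["title"], granule.get("granule_size", "?"), "MB")
```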

One reason to move EOSDIS DAAC data to the cloud is so researchers don’t have to download data to run their analyses. By using in-cloud EC2 compute instances, they can run their research in AWS with direct, high speed access to the EarthData.

Of course, the researcher would need to purchase their EC2 compute capacity directly from AWS. NASA publishes a sort of AWS pricing primer for researchers who want to use AWS EC2 compute to do research directly on the data in the cloud. NASA also offers a series of tutorials on how to use the AWS cloud for doing research on NASA DAAC data.

Where to from here?

I find this all somewhat discouraging. Yes, it’s the Gov’t, but one needs to wonder what the overall costs of hosting NASA DAAC data on the AWS cloud will be over the long haul. Most organizations use the cloud to prototype and scale up services, but once these services have stabilized, they migrate them back to on-prem/CoLo infrastructure. See, for example, Dropbox’s move away from the [AWS] cloud for ~600PB of data.

I get it, the public cloud allows for nearly infinite data scalability. But cloud storage is not cheap, especially when you are talking about 100s of PBs. And in today’s world, with a whole bunch of open source solutions for object storage and services, one can almost recreate any cloud service in one’s own data center, at a much lower price.

Sure, it will still take IT infrastructure and personnel to put it all together. But NASA doesn’t seem to be lacking in infrastructure or IT personnel. Even if you are enamored with AWS services and software infrastructure, one can always run AWS Outposts in your data centers. And DAAC services seem to be pretty stable over time. Yes, new satellites will generate more data, but the data load is understood and very predictable. So one should be able to anticipate all this and have infrastructure in place to deal with it.

Yes, having the ability to run analysis in the cloud directly on data that also sits in the cloud is useful, especially not having to download TBs of data. But these costs can also be significant, and they are borne by the researcher, not NASA.

Another gripe is why use AWS alone. The other cloud providers all have similar object storage and compute capabilities. It seems wiser to me to set up the EarthData service such that different DAACs reside in different clouds. This would be more complex and harder to administer and use, but I believe in the long run it would lead to better, more effective services at a more reasonable price.

Going to the cloud doesn’t have to be a one way endeavor. After using the cloud for a while, NASA should have a better idea of the costs of doing so and at that time understand better what it can and cannot afford to do on its own.

It will be interesting to see what ESA, JAXA, CERN and other big science organizations do as they are all in the same bind, data seems to be growing unbounded.


Data in motion #DellTechWorld

I (virtually) attended DTW this week, and Michael Dell and others in their keynote segments mentioned that the new world involves both data at rest and data in motion. I was curious about this new concept of data in motion, so I spent some time looking into it.

AWS’s Lambda serverless processing service and Apache Kafka probably best represent this idea of data in motion. Dell Boomi, IBM MQ, Google Cloud Pub/Sub, etc. also provide services similar to Kafka.

With AWS Lambda, clients deposit data in object buckets and AWS automatically invokes some program, container, service, etc. to process that data and then the service goes away until the next data is deposited. Kafka is AWS Lambda on steroids.
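
To make that concrete, here’s a minimal sketch of a Python Lambda handler wired to an S3 object-created notification; the bucket and the processing step are, of course, placeholders.

```python
# Minimal sketch of an AWS Lambda handler triggered by S3 object-created events.
# AWS invokes handler() once per notification; there is no server to manage.
import urllib.parse

def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        # ... process the newly deposited object here (placeholder) ...
        print(f"processing s3://{bucket}/{key} ({size} bytes)")
    return {"status": "ok"}
```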

Kafka is a completely open source (GitHub) system that’s run using a cluster of servers and provides a “message processing” system. A minimum Kafka cluster is 3 servers (containers, VMs or bare metal).

How does Kafka work?

In Apache Kafka, you have producers, servers (brokers) and consumers. Data comes in as events, each with a key, a value (essentially a bit stream, it could be anything) and a timestamp. Events are created by producer clients, automatically stored by Kafka brokers and appended to topics (a sort of folder) in an ordered sequence. Topic events are then processed by consumer clients.

Topics are partitioned (sharded) using keys and can optionally be replicated across a defined number of Kafka brokers within a cluster. Kafka clusters can span data centers, regions, clouds, etc. Replication is done for fault tolerance. Topic partitioning provides scale-out, distributed performance for Kafka.
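
To make the producer side concrete, here’s a minimal sketch using the open source kafka-python client; broker addresses and the topic name are placeholders. Events with the same key hash to the same partition, which is what preserves per-key ordering.

```python
# Minimal producer sketch using the kafka-python package (pip install kafka-python).
# Broker addresses and topic name are placeholders.
import json, time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092", "broker3:9092"],
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# An "event": key + value + timestamp; same key -> same partition -> ordered per key.
event = {"sensor": "tire-17", "reading": 0.87, "ts": time.time()}
producer.send("road-conditions", key="tire-17", value=event)
producer.flush()   # block until the broker has acknowledged the event
```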

Events can be simple messages for real time analysis or larger files for offline analysis. But they are all essentially produced, stored and consumed in an ordered, log like fashion.

Topic partitions can be multi-producer and multi-consumer. That is, there can be 0, 1 or many producers of events in a topic (partition), and topic partitions can have 0, 1 or more consumers.

In Kafka, events are saved for a specified period and are not automatically jettisoned/deleted. As such, events can be read multiple times by consumers.

Kafka can also offer a guarantee that events are processed only once. Kafka can also guarantee that consumers of a topic partition always read its events in arrival order.

Consumers register (subscribe) to see the events they are interested in. As mentioned earlier, there can be multiple consumers of the same events. Consumers can take the form of micro services/containers, programs, systems, etc. When an event is stored, consumer clients registered for that event get notified to process it.
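
And the consuming side, again as a minimal kafka-python sketch with placeholder names; the group_id is what lets several consumer instances split a topic’s partitions between them.

```python
# Minimal consumer sketch using kafka-python; topic, group and brokers are placeholders.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "road-conditions",
    bootstrap_servers=["broker1:9092"],
    group_id="surface-analytics",          # consumers in a group split the partitions
    auto_offset_reset="earliest",          # start from the oldest retained event
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for msg in consumer:                        # poll loop; blocks waiting for new events
    print(f"partition={msg.partition} offset={msg.offset} key={msg.key} value={msg.value}")
```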

In Kafka, producers and consumers are fully decoupled. They have no need to know about one another and, indeed, can exist in different servers, clusters, data centers, etc. Event producers don’t wait on consumers. Event consumers are notified when an event is available and can do whatever processing is required for that event.

Kafka APIs

Kafka has APIs for:

  • Admin services API to provide monitoring and management of the Kafka cluster and services
  • Producers API to publish and create events
  • Consumers API to subscribe to read events
  • Kafka Streams API to supply higher level stream processing for events, such as micro services, with stateless processing, stateful processing and, within stateful processing, processing of events within a (time) window. Events can be read from one or more topics, transformed (processed), and written as other events to one or more topics. It supports per-event processing with ~2 millisecond latency for highly tuned systems. Streams uses a Java API that can be deployed in containers, VMs, bare metal, in the cloud, etc. Stream processing is not performed on the Kafka cluster itself but must be performed elsewhere. Kafka Streams can be used to create advanced and complex data pipelines.
  • Kafka Connect API to supply the connections needed to get events from other, perhaps more traditional, applications, environments and services into Kafka topics for processing and, vice versa, to output topic events to more traditional services. Connectors are available for many different applications, databases, systems, etc. Connect can be used, for example, to provide a connection between a relational database and topics, as well as to connect topics back to relational databases. You don’t code in Connect but rather provide declarative statements that define what data goes where.

Kafka is used in very many organizations (NY Times, LinkedIn, LINE, etc.) to provide an almost enterprise-wide, all-encompassing processing bus where data comes in, is partitioned out to topics and then processed in real time or not. Kafka can be addictive. You start with a relatively small application and find uses for it throughout the company. Pretty soon, you are running your whole organization through Kafka.

Data in motion

So that’s an example of data in motion. Another way to think about Kafka and its data in motion is that it represents the final step in the evolution of batch processing from the mainframes of last century.

Batch processing of old took a bunch of transactions, batched them together, and processed them one by one until the batch was done. With Kafka and similar systems, you essentially have a batch of one transaction, and they provide all the framework and facilities needed to create, store and consume that single-transaction batch.

But in addition to this simplistic one transaction in, one process, one output model, Kafka and other systems provide a more general purpose system, with multiple transaction types (events) being created by multiple producers and consumed by a multitude of processes, each of which can produce one or more outputs, which could be other events that start the process over again. This create event, process, create event, process cycle could go on ad infinitum.

And that’s what a data pipeline looks like. Event data comes in, it’s processed (filtered, aggregated, merged, etc.) and generates a different event which causes more processing, which creates other events, which causes other processing…..

And that’s data in motion.

Photo Credit(s):

All graphics and photos are from the Apache Kafka website

Internet of Tires

Read an article a couple of weeks back (An internet of tires?… IEEE Spectrum) and can’t seem to get it out of my head. Pirelli, a European tire manufacturer, was demonstrating a smart tire or, as they call it, their new Cyber Tyre.

The Cyber Tyre includes accelerometer(s) in its rubber that can be used to sense pavement/road surface conditions. The Cyber Tyre can communicate surface conditions to the car and, using the car’s 5G, to other cars (of the same make) to tell them of problems with surface adhesion (hydroplaning, ice, other traction issues).

Presumably the accelerometers in the Cyber Tyre measure acceleration changes of individual tires as they rotate. Any rapid acceleration change could potentially be used to determine whether the car has lost traction and why.

They tested the new tires out at a (1/3rd mile) test track on top of a Fiat factory, using Audi A8 automobiles and 5G. It’s unclear why this had to wait for 5G, but it’s possible that with 5G the Cyber Tyre and the car could log and transmit such information back to the manufacturer of the car or tire.

Accelerometers have become dirt cheap over the last decade as smart phones have taken off. So, it was only a matter of time before they found use in new and interesting applications and the Cyber Tyre is just the latest.

Internet of Vehicles

Presumably the car, with Cyber Tyres on it, communicates road hazard information to other cars using 5G and vehicle to vehicle (V2V) communication protocols or perhaps to municipal or state authorities. This way highway signage could display hazardous conditions ahead.

Audi has a website devoted to Car-to-X communications, which has equipped certain Audi vehicles (A4, A5 & Q7) with cellular communications, cameras and other sensors used to identify (recognize) signage, hazards and other information, and to communicate this data to other Audi vehicles. This way, owning an Audi would plug you into this information flow.

Pirelli’s Cyber Car Concept

Prior to the Cyber Tyre, Pirelli introduced a Cyber Car concept that is supposedly rolling out this year. This version has tyres with real-time pressure, temperature and (static) vertical load sensing, plus a Tyre ID. Pirelli has been working with car manufacturers to roll out Cyber Car functionality.

The Tyre ID seems to be a file that can include anything the tyre or automobile manufacturer wants. It sort of reminds me of a blockchain data block that could be used to validate tyre manufacturing provenance.

The vertical load sensor seems more important to car and tire manufacturers than to consumers. But for electric car owners, knowing car weight could help determine current battery load and thereby more precisely estimate how much charge is left in the battery.

Pirelli uses a proprietary algorithm to determine tread wear. This makes use of the other tyre sensors to predict wear and perhaps uses an AI DL algorithm to do this.

~~~

ABS has been around for decades now and tire pressure sensors for over 10 years or so. My latest car has enough sensors to pretty much drive itself on the highway but not quite park itself as of yet. So it was only a matter of time before something like smart tires would show up.

But given their integration with car electronics systems, it would seem that this would only make sense for new cars that include a full set of Cyber Tyres. That is, until all tire AND car manufacturers agree on a standard protocol to communicate such information. When that happens, consumers could choose any tire manufacturer and obtain similar if not the same functionality from them.

I suppose someone had to be first to identify just what could be done with the electronics available today. Pirelli just happens to be it for now in the tire industry.

I just don’t want to have to upgrade tires every 24 months. And, if I have to wait a long time for my car to boot up and establish communications with my tires, I may just take a (dumb) bike.


Learning Machine Learning – part 2

In Learning Machine Learning – part 1, we covered AWS and GCP tutorials on machine learning within each of their clouds. In part 2 we cover Microsoft tutorial(s) on machine learning in Azure.

I found Machine Learning Jump Start in the Microsoft Virtual Academy with instructors Buck Woody and Seayoung Rhee. This is a series of 4 video tutorials on Azure ML Studio. ML Studio seems similar to AWS SageMaker as it’s a framework to perform machine learning.

Azure (and probably AWS & GCP) have a number of methods to perform machine learning. ML studio happens to be the one that I found but there are many others worth examining.

Azure’s ML Studio tutorial videos were better than AWS’s but not as good as GCP’s, IMHO, for learning machine learning. There are four videos in the series. I watched the first (~45 minutes), the second (~45 minutes) and most of the third (only 25 of ~45 minutes).

Video 1 Concepts and setting up a ML Studio account

In the first video, the instructors took a long time to get going and then, when they got someplace interesting, it was all play acting (a human as a machine learner) to teach concepts.

The tutorials do distinguish between Supervised learning and Unsupervised learning. Both of which can apply to prediction or classification types of problems or outcomes. These are discussed as classic machine learning characteristics.

 

In the last 1/3 of the first video they discuss Azure ML Studio. It provides a common place to work and collaborate across team members. It also provides a graphical approach to machine learning. ML Studio also supports a programmable API, but I never got to that section in my viewing.

Some Azure ML Studio strengths:

  1. It provides industry-recognized data sets and data science algorithms that can be used as a black box, such as recommendation engines.
  2. It allows you to publish and consume machine learning solutions.

On the Azure portal there’s a machine learning studio icon (it’s now buried under the 100+ services link, in the AI + Machine Learning section). You use this to create a new ML studio workspace.

Inside a workspace you can use Azure ML Studio services. In the workspace you can review all your experiments (these are algorithms or predictive models being worked on).

In the Experiments page you can create a new experiment which is sort of a graphical workflow of the machine learning task.

There you will find a list of Azure sample data sets and sample algorithms that can be used in your experiment. The first video didn’t go into much detail on any of this other than showing you how to get started and create a ML studio workspace.

Video 2 how to use ML studio

Video 2 takes your ML studio workspace and runs a rudimentary experiment with it. In this video they walk you through selecting a data set, selecting algorithms to use and how to connect them into a machine learning workflow.

Creating an ML Studio experiment is almost like flowcharting your workflow. You select the data you want and drop it into the workflow. Next, you select an extraction engine, drop it into the workflow and connect it to the data. Then you identify what you want to do with the data (like training), drop that algorithm into the workflow, and so on. In the end you have defined a sequence of actions to perform on data.

In their example, they use a user movie-ratings dataset. They connect this to a Bayesian learning model and to an IMDB database to extract movie titles. The tutorial experiment is a movie recommendation engine. Although it wasn’t a neural net, many of the same techniques apply.
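
The tutorial builds this graphically, but the underlying computation is easy to sketch in code. Below is a toy, hypothetical item-to-item similarity recommender over a made-up ratings table, just to show the shape of the computation; it is not the Bayesian model the tutorial uses.

```python
# Toy item-to-item recommender sketch (cosine similarity over a ratings matrix).
# Not ML Studio's Bayesian model; just illustrates the kind of computation involved.
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.DataFrame({          # rows = users, columns = movies, values = ratings
    "Alien":   [5, 4, 0, 1],
    "Aliens":  [4, 5, 0, 2],
    "Amelie":  [0, 1, 5, 4],
    "Arrival": [1, 0, 4, 5],
}, index=["u1", "u2", "u3", "u4"])

# Similarity between movies, based on how users rated them.
sim = pd.DataFrame(cosine_similarity(ratings.T),
                   index=ratings.columns, columns=ratings.columns)

def recommend(movie, top_n=2):
    """Return the movies most similar to the one given (excluding itself)."""
    return sim[movie].drop(movie).sort_values(ascending=False).head(top_n)

print(recommend("Alien"))
```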

 

ML Studio uses an intuitive graphical approach to defining a machine learning workflow.

Video 3 publishing your ML Studio web service

Video 3 shows you how to publish (on Azure’s Marketplace) the recommendation engine created in video 2 as an OData web service.

I stopped watching the 3rd video after about 25 minutes as it was setting up various aspects of the OData web service to be deployed on the Azure Marketplace.

Using Azure ML studio seemed pretty straightforward. But it was much more data science/data analytics activity than neural network training.

The Azure MVA ML Studio tutorial was created in 2014 so some of the concepts are a bit dated but most still apply.

Looking today on the Azure Portal, I was still able to find the ML Studio workspaces under one of the 10 AI + Machine Learning services. Again, I would have to say the GCP tutorial was a better fit for what I wanted, which was how to create a neural net and get it trained.

Other ML approaches under Azure

There are other Azure approaches to machine learning and tutorials that support them. For example, there’s a quick start tutorial on how to use Python and Jupyter notebooks under Azure, which is probably closer to the neural net training in GCP (a rough sketch of that Python SDK flow appears below).
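
Based on my quick look, that flow is roughly what the sketch below shows, using the Azure ML Python SDK (v1, azureml-core). Treat the workspace config, experiment name and metrics as placeholders rather than anything taken from the tutorial.

```python
# Minimal sketch of the Azure ML Python SDK (v1) experiment flow.
# Assumes a config.json downloaded from your ML workspace; names are placeholders.
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()                 # reads config.json for subscription/workspace
exp = Experiment(workspace=ws, name="demo-experiment")

run = exp.start_logging()                    # interactive run, e.g. from a Jupyter notebook
run.log("accuracy", 0.92)                    # log metrics that show up in the portal
run.log("learning_rate", 0.01)
run.complete()

print("Run id:", run.id)
```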

I found myself skipping ahead a lot in video 1 as it was mainly about concepts and not much technical detail. Video 2 was a good intro into ML Studio, and Video 3 showed you how to publish an ML Studio web service in Azure, but that was more detail than I wanted to know. I never got to video 4, which probably talked about ML Studio’s programmable API.

If I had to do it over again, I probably would have viewed the quick start tutorial with Python and Jupyter notebooks, which sounded more like the GCP tutorials in the part 1 post.

On the other hand, the Azure ML Studio tutorials supplied a good complement to the GCP tutorial, as a different (more graphical) way to do ML. It would probably be worthwhile to view before taking the AWS SageMaker tutorials, as it’s a bit higher level and a quicker introduction to the workflow of AI and machine learning.

Comments?

Picture credit(s): Screen shots of Videos 1, 2 and 3 in the MVA series, (c) Microsoft 

Blockchains go mainstream…

 

I read an article a while back on Finland’s use of blockchain technology to provide bank accounts and identity services to immigrants (see  MIT TechReview article about Finland).

Blockchains were originally invented as a way of supporting financial transactions outside the current, government monitored, financial marketplace. With Finland’s experiment, the government is starting to use blockchains to support the unbanked and to monitor their financial activity – go figure.

Debit cards on blockchain

Finland is using a Helsinki-based startup, MONI, to assign a MONI card, essentially a prepaid MasterCard, to all immigrants. An immigrant can use their MONI card to pay for anything online or in real life, use it as a direct deposit account, or to receive and track the use of government assistance.

Underlying the MONI card is public blockchain technology. That is, MONI is not using normal credit card services to support its bank accounts; MONI money transfers are done through the use of public blockchains.

MONI accounts are essentially (cryptocurrency) wallets but used as a debit card. The user merely enters a series of numbers into web forms or uses their MONI card at credit card terminals throughout Europe. Transferring money between MONI users anywhere in the world is also free and instantaneous.

Finland also sees an immutable record of all immigrant financial transactions that can be monitored to track immigrant (financial) integration into the country.

MONI is intending to make this service more broadly available. A MONI card account costs €2/month and MONI takes a small cut of each monetary transaction.

IDs on blockchain

I read another article the other day “Microsoft to implement blockchain-based ID system” in CoinTelegraph about using blockchains as a universal digital ID.

India has, over the last decade, implemented a digital government ID using biometrics (see the Aadhaar wikipedia article). Other countries have been moving to e-government, where use of government services is implemented over the Internet (see the EU article on eGovernment in Lithuania). Such eGovernment services depend on a digitized population registry.

Although it’s unclear whether Aadhaar and Lithuania make use of blockchain technology for their ID services, Microsoft is definitely looking to blockchains to provide unique accounts/digital IDs to its population of users.

User signon has been a prevalent problem on the web for years. Each and every web and mobile App requires a person to sign on to personalize their App. Nowadays, many Apps support using a Google ID or Facebook ID for single signon, and there are other technologies being offered that provide similar services. Using a blockchain ID could easily support a single signon service.

The blockchain ID (wallet) key pair could easily be used to authenticate a login: the user signs an authentication transaction, identifying the App and the user, with their private key, and the blockchain digital ID service verifies the signature against the registered public key (and a backend ID/App repository) to check that the user logging in is the person that opened the account, acting as a sort of “proof of who you are”. A minimal sketch of that signed-challenge idea follows.
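
Here’s a minimal sketch of the signed-challenge idea using Python’s cryptography package; it illustrates the general proof-of-key-ownership pattern, not any specific blockchain ID product, and all names are placeholders.

```python
# Sketch of "proof of who you are" via a key pair: sign a challenge with the private key,
# verify it with the public key that a blockchain ID record could hold.
# Uses the 'cryptography' package; not any specific blockchain ID product.
import os
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Wallet side: the user holds the private key; the public key is published (e.g. on-chain).
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Service side: issue a random challenge for this login attempt.
challenge = os.urandom(32)

# Wallet side: prove key ownership by signing the challenge.
signature = private_key.sign(challenge)

# Service side: verify against the registered public key.
try:
    public_key.verify(signature, challenge)
    print("login accepted: signer controls the registered key")
except InvalidSignature:
    print("login rejected")
```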

Storage on blockchain

Filecoin and Storj are storage providers that use blockchain services to allow others to use your local (or networked) storage to provide storage to the world.

A while back I had written about (free) peer-to-peer storage and compute services (see my Free P2P cloud storage … post). But the problem was how people hosting the P2P storage or compute would benefit. Filecoin and Storj solved this by paying in cryptocurrencies for storage hosted on your hardware.

Filecoin offers a storage auction and hosting service that anyone worldwide can log into and use. The data stored is encrypted end-to-end so that no one can see what’s being stored, and the data is also erasure coded so that it is protected and accessible even when one or more hosting sites are offline.

Filecoin uses “proofs of storage”, “proofs of space”, “proofs of data possession” and “proofs of retrievability” as a way to guarantee their storage service works properly. They also use chained “proofs of replication” as “proofs of spacetime” as service validation checks. Proofs of replication are a way of ensuring that storage providers are not deduplicating data copies and charging for non-deduped storage. (See Filecoin’s Proof of Replication paper for more info).

Storj looks somewhat similar to Filecoin, but without as much sophistication behind it.

Compute on blockchain

Ethereum was invented to support smart contracts that run on blockchain technology. IBM’s Hyperledger OpenLedger project (see our GreyBeardsOnStorage podcast and RayOnStorage post on Hyperledger) is another example.

Smart contracts are essentially applications that run in a blockchain’s virtualized environment. Blockchain services are used to run an application and validate that it’s run only once. In some cases smart contracts query external oracles as a way to verify that something or some action has occurred outside the blockchain. Other oracles can be entirely digital entities that check on a particular commodity price, weather pattern, account value, etc. The oracle becomes a critical step in determining the go/no-go status of a smart contract.

Advertisements vs. crypto mining

Salon, a news providing website, offers readers an option to see advertisements or to allow Salon to use their computer (browser) to mine crypto coins. (See Salon offers… article in CoinDesk).

I believe this offer is made when the website detects a viewer is using an ad blocker.

~~~~

The trend is clear: people, organizations and even governments are looking at blockchain technology to provide basic and advanced services around the world.

If anyone is interested in providing a pre-paid Visa card via blockchains, please contact me. I’d like to help.

Now if I could just find my GPU’s at a decent price somewhere…

Speaking of advertising… RayOnStorage doesn’t use advertising. But blogging like this takes time and money. If anyone’s interested in helping fund this blog, please consider sending some BTC our way, even 0.0001 BTC would help.

Our BTC wallet address is:

1MqBbAvMo6QbCVD6ZwtbLaPxmcUZGj9Ghw

Photo Credit(s): Blockchain and the public sector on OpenGovAsia.com

Unleash your design teams with single signon on Unifilabs.com

Understanding the difference between P2P and Client-server networks on LinkedIN

Blockgeek’s guide to smart contracts

The fragility of public cloud IT

I have been reading AntiFragile again (by Nassim Taleb). And although he would probably disagree with my use of his concepts, it appears to me that IT is becoming more fragile, not less.

For example, recent outages at major public cloud providers display increased fragility for IT. Yet these problems, although almost national in scope, seldom deter individual organizations from their migration to the cloud.

Tragedy of the cloud commons

The issues are somewhat similar to the tragedy of the commons. When more and more entities use a common pool of resources, occasionally that common pool can become degraded. But because no one really owns the common resources, no one has any incentive to improve the situation.

Now the public cloud, although certainly a common pool of resources, is also most assuredly owned by corporations. So it’s not a true tragedy of the commons problem. Public cloud corporations have a real incentive to improve their services.

However, the fragility of IT in general, the web, and other electronic/data services all increases as they become more and more reliant on public cloud, common infrastructure. And I would propose this general IT fragility is really not owned by any one person, corporation or organization, let alone the public cloud providers.

Pre-cloud was less fragile, post-cloud more so

In the old days of last century, pre-cloud, if a human screwed up a CLI command, the worst that could happen was to take out a corporation’s data services. Nowadays, post-cloud, if a similar human screws up a CLI command, the worst that can happen is that major portions of a nation’s internet services go down.

Strange Clouds by michaelroper (cc) (from Flickr)

Yes, over time, public cloud services have become better at not causing outages, but outages aren’t going away. And if anything, better public cloud services just encourage more corporations to use them for more data services, causing any subsequent cloud outage to be more impactful, not less.

The Internet was originally designed by DARPA to be more resilient to failures, outages and nuclear attack. But by centralizing IT infrastructure onto public cloud common infrastructure, we are reversing the web’s inherent fault tolerance and causing IT to be more susceptible to failures.

What can be done?

There are certainly things that can be done to improve the situation and make IT less fragile in the short and long run:

  1. Use the cloud for non-essential or temporary data services that don’t hurt a corporation, organization or nation when outages occur.
  2. Build in fault-tolerance, automatic switchover for public cloud data services to other regions/clouds.
  3. Physically partition public cloud infrastructure into more regions and physically separate infrastructure segments within regions, such that any one admin has limited control over an amount of public cloud infrastructure.
  4. Divide an organization’s or nation’s data services across public cloud infrastructures, across as many regions and segments as possible.
  5. Create a National Public IT Safety Board, not unlike the one for transportation, that does a formal post-mortem of every public cloud outage, proposes fixes, and enforces fix compliance.

The National Public IT Safety Board

The National Transportation Safety Board (NTSB) has worked well for air transportation. It relies on the cooperation of multiple equipment vendors, airlines, countries and other parties. It performs formal post mortems on any air transportation failure. It also enforces any fixes in processes, procedures, training and any other activities on equipment vendors, maintenance services, pilots, airlines and other entities that can impact public air transport safety. At the moment, air transport is probably the safest form of transportation available, and much of this is due to the NTSB.

We need something similar for public (cloud) IT services. Yes most public cloud companies are doing this sort of work themselves in isolation, but we have a pressing need to accelerate this process across cloud vendors to improve public IT reliability even faster.

The public cloud is here to stay and, if anything, will become more encompassing, running more and more of the world’s IT. And as IoT, AI and automation become more pervasive, the data processes that support these services, which will no doubt run in the cloud, can impact public safety. Just think of what would happen in the future if an outage occurred at a major cloud provider running the backend for self-guided car algorithms during rush hour.

If the public cloud is to remain (at this point almost inevitable) then the safety and continuous functioning of this infrastructure becomes a public concern. As such, having a National Public IT Safety Board seems like the only way to have some entity own IT’s increased fragility due to  public cloud infrastructure consolidation.

~~~~

In the meantime, as corporations, government and other entities contemplate migrating data services to the cloud, they should consider the broader impact they are having on the reliability of public IT. When public cloud outages occur, all organizations suffer from the reduced public perception of IT service reliability.

Photo Credits: Fragile by Bart Everson; Fragile Planet by Dave Ginsberg; Strange Clouds by Michael Roper

A tale of two AFAs: EMC DSSD D5 & Pure Storage FlashBlade

There’s been an ongoing debate in the analyst community about the advantages of software only innovation vs. hardware-software innovation (see Commodity hardware loses again and Commodity hardware always loses posts). Here is another example where two separate companies have turned to hardware innovation to take storage innovation to the next level.

DSSD D5 and FlashBlade

Within the last couple of weeks, two radically different AFAs were introduced. One by perennial heavyweight EMC with their new DSSD D5 rack-scale flash system, and the other by relative newcomer Pure Storage with their new FlashBlade storage system.

These two arrays seem to be going after opposite ends of the storage market: the 5U DSSD D5 is going after both structured and unstructured data that needs ultra high speed IO access (<100µsec) times, while the 4U FlashBlade is going after more general purpose unstructured data. And yet the two have many similarities, at least superficially.