Scality’s Open Source S3 Driver

The view from Scality’s conference room

We were at Scality last week for Cloud Field Day 1 (CFD1) and one of the items they discussed was their open source S3 driver. (Videos available here).

Scality was on the 25th floor of a downtown San Francisco office tower, and the view outside the conference room was great. Giorgio Regni, Scality's CTO, said that on the two days a year it isn't foggy out, you can even see the Golden Gate Bridge from their conference room.

Scality

As you may recall, Scality is an object storage solution that came out of the telecom and consumer networking industry to provide Google/Facebook-like storage services to other customers.

Scality RING is software-defined object storage that supports a full complement of legacy and advanced interface protocols, including NFS, CIFS/SMB, Linux FUSE, a native RESTful API, SWIFT, CDMI and Amazon Web Services (AWS) S3. Scality also supports replication and erasure coding, chosen based on object size.

RING 6.0 brings AWS IAM-style authentication to Scality object storage. Scality pricing is based on usable storage, and you bring your own hardware.

Giorgio also gave a session on the RING's durability (reliability), which showed they support 13-9s of data availability. He flashed up the math on this, but it went by too fast for me to take down. :)

Scality has been on the market since 2010 and has been having a lot of success lately, growing revenue 150% this past year. In the media and entertainment space, Scality has won a lot of business with their S3 support, but their other interface protocols are also very popular.

Why S3?

It looks as if AWS S3 is becoming the de facto standard for object storage. AWS S3 is currently the largest repository of objects, and other vendors and solution providers now offer support for S3 services whenever they need an object/bulk storage tier behind their appliances/applications/solutions.

This has driven every object storage vendor to also offer S3 “compatible” services to entice these users to move to their object storage solution. In essence, the object storage industry, like it or not, is standardizing on S3 because everyone is using it.

But how can you tell if a vendor's S3 solution is any good? You could always try it out to see if it works properly with your S3 application, but that involves a lot of heavy lifting.

However, there is another way: take the vendor's S3 driver and run your application against that. Assuming the vendor's real object storage supports all the functionality in the S3 driver, then if your application works against the driver, it should also work against the real solution.

Open source S3 driver

Scality open sourced their S3 driver just to make this process easier. Now one can simply download their S3server driver (available from Scality's GitHub) and start it up.

Scality's S3 driver runs on top of Docker Engine, so to run it on your desktop you need to install Docker Toolbox on older Mac or Windows systems, or Docker for Mac or Docker for Windows on newer systems. (We also talked with Docker at CFD1.)

Firing up the S3server on my Mac

I used Docker for Mac, but I assume the terminal CLI is the same for both. Downloading and installing Docker for Mac was pretty straightforward. Starting it up took just a double click on the Docker application, which puts a Docker icon in the menu bar. You do need to enter your login password to run Docker for Mac, but once that's done, you have Docker running on your Mac.

Open up a terminal window and you have the full Docker CLI at your disposal. You can download the latest S3server image from Scality's Docker Hub by executing a pull command (docker pull scality/s3server). To fire it up, you define and run a new container (docker run -d --name s3server -p 8000:8000 scality/s3server); if the container is ever stopped, docker start s3server brings it back up.
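Putting those commands together, a minimal terminal session would look something like this (the container name and port mapping are just the ones used above):

```sh
# Pull the latest S3server image from Scality's Docker Hub
docker pull scality/s3server

# Create and start a container named "s3server", exposing the S3 API on port 8000
docker run -d --name s3server -p 8000:8000 scality/s3server

# If the container ever gets stopped, restart it with:
docker start s3server
```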

It's that simple to have an S3server running on your Mac. The Toolbox approach for older Macs and PCs is a bit more complicated, but seems simple enough.
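For the Toolbox route, a rough sketch looks something like the following, assuming the default VirtualBox driver and a machine named "default"; the S3server then answers on the Docker machine's IP rather than on localhost:

```sh
# Create a Docker host VM (Docker Toolbox defaults to the VirtualBox driver)
docker-machine create --driver virtualbox default

# Point this shell's Docker CLI at that VM
eval $(docker-machine env default)

# Note the VM's IP address -- the S3server will answer on <this IP>:8000
docker-machine ip default

# Then pull and run the S3server exactly as above
docker pull scality/s3server
docker run -d --name s3server -p 8000:8000 scality/s3server
```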

The data is stored inside the container and persists until you delete the container. However, there's an option to store the data elsewhere as well (see the sketch below).
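If I read the S3server docs right, storing the data "elsewhere" is just a matter of Docker volume mounts; something along these lines should keep the data and metadata in host directories. The in-container paths are taken from Scality's documentation, so treat them as assumptions to verify against the image you pull:

```sh
# Keep object data and metadata in host directories instead of inside the container.
# The in-container paths (localData/localMetadata) are assumed from Scality's docs -- verify.
docker run -d --name s3server -p 8000:8000 \
  -v "$(pwd)/s3data:/usr/src/app/localData" \
  -v "$(pwd)/s3metadata:/usr/src/app/localMetadata" \
  scality/s3server
```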

I tried to use Cyberduck to load some objects into my Mac's S3server but couldn't get it to connect properly. I wrote up a ticket to the S3server community. It seemed to be talking to the right port, but maybe I needed to use s3cmd to create the bucket first – I think.

[Update 2016Sep19: Turns out the S3server getting-started doc says you should download an S3 profile for Cyberduck. I didn't do that originally because I had already been using S3 with Cyberduck. But I did that just now and it works just like it's supposed to. My mistake.]
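If you'd rather test from the command line than from Cyberduck, something like the following should do it with the AWS CLI. The accessKey1/verySecretKey1 credentials are the defaults I've seen in the S3server docs, so treat them (and the placeholder photo.jpg file) as assumptions to check:

```sh
# Talk to the local S3server with the AWS CLI instead of Cyberduck.
# Default credentials assumed from Scality's S3server docs -- verify.
export AWS_ACCESS_KEY_ID=accessKey1
export AWS_SECRET_ACCESS_KEY=verySecretKey1
export AWS_DEFAULT_REGION=us-east-1   # any valid region string will do for a local endpoint

aws --endpoint-url http://localhost:8000 s3 mb s3://test-bucket
aws --endpoint-url http://localhost:8000 s3 cp photo.jpg s3://test-bucket/
aws --endpoint-url http://localhost:8000 s3 ls s3://test-bucket
```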

~~~~

Anyways, it all seemed pretty straightforward to run S3server on my Mac. If I were an application developer, it would make a lot of sense to try S3 this way before doing anything on the real AWS S3. And some day, when I grew tired of paying AWS, I could always migrate to Scality RING S3 object storage – or at least that's the idea.

Comments?

TPU and hardware vs. software innovation (round 3)

At the Google I/O conference this week, Google revealed (see Google supercharges machine learning tasks …) that they had been designing and operating their own processor chips in order to optimize machine learning.

They call the new chip a Tensor Processing Unit (TPU). According to Google, the TPU provides an order of magnitude more power-efficient machine learning than what's achievable with off-the-shelf GPUs/CPUs. (TensorFlow is Google's open sourced machine learning software.)

This is very interesting, as Google and the rest of the hyper-scale hive seem to have latched onto open source software and commodity hardware for all their innovation. This has led the industry to believe that hardware customization/innovation is dead and that the only thing anyone needs is software developers. I believe this is incorrect, and that hardware innovation combined with software innovation is a better way (see my Commodity hardware always loses and Better storage through hardware posts).
Continue reading “TPU and hardware vs. software innovation (round 3)”

Intel Cloud Day 2016 news and views

A couple of weeks back I was at Intel Cloud Day 2016 with the rest of the TFD team. We listened to a number of presentations from Intel's management team, mostly about how the IT world is changing and how Intel plans to help lead the transition to the new cloud world.

The view from Intel is that any organization with 1200 to 1500 servers has enough scale to do a private cloud deployment that would be more economical than using public cloud services. Intel's new goal is to facilitate 10,000 (private) clouds being deployed across the world.

In order to facilitate the next 10,000, Intel is working hard to introduce a number of new technologies and programs that they feel can make it happen. One discussed at the show was the new OpenStack scheduler based on Google's open sourced Kubernetes technology, which provides container management for Google's own infrastructure and now supports the OpenStack framework as well.

Another way Intel is helping is by building a new 1000-server (500 now) cloud test lab in San Antonio, TX. Of course the servers will use the latest Xeon chips from Intel (see below for more info on the latest chips). The other enabling technology discussed a lot at the show was software defined infrastructure (SDI), which applies across data center compute, networking and storage.

According to Intel, security isn’t the number 1 concern holding back cloud deployments anymore. Nowadays it’s more the lack of skills that’s governing how quickly the enterprise moves to the cloud.

At the event, Intel talked about a couple of verticals that seemed to be ahead of the pack in adopting cloud services, namely, education and healthcare.  They also spent a lot of time talking about the new technologies they were introducing today.
Continue reading “Intel Cloud Day 2016 news and views”

Platform9, a whole new way to run OpenStack

At Tech Field Day 10 (TFD10) in Austin this past week, we had a presentation from Platform9's Shirish Raghuram, Co-founder and CEO, and Bich Le, Co-founder and Chief Architect. Both Shirish and Bich had worked a long time at VMware and, prior to that, at other tech giants.

Platform9 provides a user-friendly approach to running OpenStack in your data center. Their solution is a SaaS-based management portal, or control plane, for running compute, storage and networking infrastructure under OpenStack, the open source cloud software.

Importing running virtualization environments

If you have a current, running VMware vSphere environment, you can onboard or import some or all of your VMs, datastores and NSX nodes, along with the rest of the vSphere cluster, and have the VMs come up as OpenStack (Nova) compute instances and the datastores as Cinder storage volumes, with NSX serving in place of Neutron networking nodes.

Once your vSphere environment is imported, users can fire up more compute instances, terminate ones they have, allocate more Cinder volumes, etc., all from an AWS-like management portal. It's as close to the AWS console as I have seen.

Platform9 also works for KVM environments; that is, you can import currently running KVM environments into OpenStack and run them from the portal.

Makes OpenStack almost easy to run/use/operate

Historically, the problem with OpenStack has been its user interface. Platform9 solves this problem and makes it easy to import, use, and deploy VMware and KVM environments into an OpenStack framework. Once there, users and administrators have the same level of control that AWS and Microsoft Azure users have: fire up compute instances, allocate storage volumes and attach the two together, terminate the compute instances, detach the volumes and repeat, all in your very own private cloud.

Bare metal OpenStack support too

If you don’t have a current KVM or VMware environment, Platform9 will deploy a KVM virtualization environment on bare metal servers and storage and use that for your OpenStack cloud.

Security comes from tenant attributes: certain tenants have access to and control over certain compute/storage/networking instances.

Customers can also use Platform9 as a replacement for vCenter, and once deployed under OpenStack, tenants/users have control over their segments of the private cloud deployment.

It handles multiple vSphere & KVM clusters as well and can also handle mixed virtualization environments within the same OpenStack cloud.

A few things missing

The only things I found missing from the Platform9 solution were Swift object storage support and support for Hyper-V environments.

The Platform9 team mentioned that multi-region support was scheduled to come out this week, so then your users could fire up compute and storage instances across your world wide data centers, all from a single Platform9 management portal.

Pricing for the Platform9 service is on a socket basis, with volume pricing available for larger organizations.

If you are interested in a private cloud and are considering OpenStack in order to avoid vendor lock-in, it would be hard not to give Platform9 a try.

While at Dell


Later in the week at TFD10 we talked with Dell, and they showed off their new VRTX Server product. Dell's VRTX server is a very quiet, 4-server, 48TB tower or rackmount enclosure, which would make a very nice 8- or 16-socket private cloud for my home office environment (the picture doesn't do it justice). And with a Platform9 control plane, I could offer OpenStack cloud services out of my home office to all my neighbors around the world, for a fair but high price…

Comments?

 

Just in time for Xmas, Amazon uses >30K robots for fulfillment

And you thought Santa needed helpers. A recent Ars Technica article I read (Sprawling? Pssht–no one streamlines … like Amazon) indicated that Amazon's 13 US fulfillment centers deploy over 30,000 robots of one type or another.

A video of an 8th generation fulfillment center in another story (Amazon unveils its 8th Gen fulfillment center) is pretty amazing. There are these Kiva bots which pull up underneath a rack stand full of products, lift it up and then move the whole stand to where a person is ready to pick out products for various orders. The products are put into bins, the bins are sent down a conveyor belt to someone who packs them into cardboard boxes and sends them off on another conveyor belt for shipping.

Come on everybody, do the conga…

The Kiva robots looked like a conga line, going every which way with a rack of product shelves on top of them. The only other robot in the video was a pallet-handling robot that lifted a pallet of packages up to a 2nd- or 3rd-story floor for shipment.

All this has certainly come a long way since I was a manufacturing company shipping clerk. Where I worked, there was a conveyor belt that delivered materials that had to be picked and placed into handmade cardboard boxes and then placed on pallets, which were then moved by people driving electric fork trucks to inventory or the shipping docks. But we really only had to package one product at a time, and the conveyor belts (or manufacturing line) would be reconfigured for every new product that needed to be shipped.

Fulfillment centers evolving

In the book The Everything Store, about the rise of Amazon, there was a description of what must have been Gen 1 of their fulfillment centers. These centers contained old door panels on saw horses as tables, with shelves upon shelves packed with books. People were running around, rummaging through the shelves to fulfill the orders in their hands.

Somewhere in the middle of this evolution, Gen 4 perhaps, people were getting burned out. There was talk of people being so stressed out fulfilling orders at one center (Allentown, PA, see the Amazon Effect) that they were falling sick, and that ambulances were stationed in the fulfillment center parking lot waiting for people to fall ill – it must have been the Christmas rush.

The Gen 8 center in the video was nothing like this. If anything, the people were relaxed and stayed in one place all the time, while the Kiva conga line fed them shelves of products to pick from. In the video I hardly saw any movement whatsoever other than Kiva robots and their rack loads, bins and packages on conveyor belts, or pallets of material being lifted/moved by a robot.

Fulfillment as an AWS service?

The Ars Technica article talked more about how all AWS services are typically deployed and debugged for in-house use long before they get out to AWS customers at large. I don't see Amazon offering fulfillment center logistics services yet, but maybe I am missing something.

Then again, with the Gen 9 fulfillment center coming to Seattle next year, possibly they aren't through tweaking the design yet. No doubt a lot of AWS services are being consumed to keep the fulfillment centers, literally, "rolling" along.

Comments?

Photo Credits: Businesswire.com and Pixabay.com

 

 

 

Facebook down to 1.08 PUE and counting for cold storage

I read a recent article in Ars Technica about Facebook's cold storage archive and their sustainable data centers (How Facebook puts petabytes of old cat pix on ice in the name of sustainability). In the article there was a statement that Facebook had achieved a 1.08 PUE (Power Usage Effectiveness) for one of these data centers. This means that for every 100 watts used to power the racks, Facebook needs only 8 more watts for other (facility) overhead.

Just last year I wrote a paper for a client where I interviewed the CEO of an outsourced data center provider (DuPont Fabros Technology) whose state-of-the-art new data centers were achieving a PUE of 1.14 to 1.18. For Facebook to run their cold storage data centers at 1.08 PUE is even better.

At the moment, Facebook has two cold storage data centers: one at Prineville, OR and the other at Forest City, NC (Forest City is the one that achieved the 1.08 PUE). The two cold storage sites supplement the other Facebook data centers that handle everything else in the Facebook universe.

MAID to the rescue

First off, these are just cold storage data centers – over an EB of data, but still archive storage, racks and racks of it. How they decide whether something is cold or hot seems to depend on last use: for example, if a picture has been referenced recently it's warm; if not, it's cold.

Second, they have taken MAID (massive array of idle disks) to a whole new, data center level. That is, each 1U shelf (Knox storage tray) has 30 4TB drives, and a rack has 16 of these storage trays, holding 1.92PB of data. At any one time, only one drive in each storage tray is powered up. The racks have dual servers and only one power shelf (thanks to the reduced power requirements).

They also use pre-fetch hints provided by the Facebook application to cache user data. This means they will fetch some images ahead of time, when a user is paging through photos in their stream, in order to have them in cache when needed. After the user looks at or passes over a photo, it is jettisoned from cache and the next photo is pre-fetched. When the disks are no longer busy, they are powered down.

Less power conversions lower PUE

Another thing Facebook is doing is reducing the number of power conversions that need to happen to power the racks. In a typical data center, power comes in at 480 volts AC, flows through the data center UPS, is dropped down to 208 volts AC at the PDU, and then flows to the rack power supply, where it is converted to 12 volts DC. Each conversion loses some energy, and in the end only about 85% of the power coming in reaches the rack's servers and storage.

In Facebook's data centers, 480 volts AC is channeled directly to the racks, which have an in-rack battery backup/UPS, and the rack's power bus converts the 480 volts AC directly to 12 volts DC (or AC) as needed. By cutting out the data-center-level UPS and the PDU energy conversion, they save a lot of energy overhead, which can instead be used to power the racks.

Free air cooling helps

Facebook data centers like Prineville also make use of "fresh air cooling," which mixes data center air with outside air that flows through "wetted media" to cool it, and is then sent down to cool the racks by convection. This process keeps the rack servers and storage within the proper temperature range, though they probably run hotter than most data centers this way. How much fresh air is brought in depends on the outside temperature, but during most months it works very well.

This is in contrast to standard data centers that use chillers, fans and pumps to keep the data center air moving, conditioned and cold enough to chill the equipment. All those fans, pumps and chillers can consume a lot of energy.

Renewable energy, too

Lately, Facebook has made obtaining renewable energy to power their data centers a high priority. One new data center close to the Arctic Circle was built there because of hydro power; others in Iowa and Texas were built in locations with wind power.

All of this technology, open sourced

Facebook has open sourced all of its hardware and data center designs. That is, the specifications for all the hardware discussed above (and more) are available from the Open Compute Project, including the storage specification(s), open rack specification(s) and data center specification(s) for these data centers.

So if you want to build your own cold storage archive that can achieve 1.08 PUE, just pick up their specs and have at it.

Comments?

Picture Credits: DataCenterKnowledge.Com

 

At Scale conference keynote, Facebook video experience re-engineered

The At Scale conference happened this past week in LA. Jay Parikh, Global Head of Engineering and Infrastructure at Facebook, kicked off the conference by talking about how Facebook is attempting to conquer some of its intrinsic problems as it scales up from over 1B users today. I was unable to attend the conference but watched a video of the keynote (on Facebook, of course).

The At Scale community is a group of large, hyper-scale web companies such as Google, Microsoft, Twitter, and of course Facebook, among a gaggle of others, that all have problems trying to scale up their infrastructure to handle more and more user activity. They had 1800 people registered for the At Scale 2015 conference on Monday, double last year's count. The At Scale community is trying to push the industry's pace of innovation faster, through a community of companies that need to work at hyper-scale.

Facebook’s video problem

At Facebook, the current hot problem that's impacting customer satisfaction seems to be video uploads and playback (downloads). The issues with Facebook's video experience are multifaceted and range from the time it takes to successfully upload a video, to the bandwidth it takes to play back a video, to the system requirements to support live streaming video to 100,000s of users.

Facebook started as a text-only service, migrated to a photo-oriented service, and is now quickly moving to a video-oriented user experience. But it doesn't stop there: they can see on the horizon that augmented and virtual reality will become a significant driver of activity for Facebook users.

Daily video views were at 1B last year and are now at 4B/day. They also recently launched a new service, LiveMentions, a live streaming service for celebrities (real time video streams). Several celebrities have live streamed to 150K of their subscribers. So video has become, and will continue to be, the main consumer of bandwidth at Facebook.

Struggling to enhance the Facebook user’s video experience over the past year, they have come up with three key engineering principles that have helped them: Planning, Iteration and Performance.

Planning

Facebook is already operating a terabit-scale network, so doing something wrong to its network is going to cause major problems around the world. As a result, Facebook engineering focused early on incorporating lots of instrumentation into their network and infrastructure services. This allows them to constantly monitor user activity across their infrastructure to identify problems and solutions.

One metric Parikh talked about was "playback success rate": the percentage of plays where the video starts playing in under 1 second for a Facebook user. One chart he showed was playback success rate colored over a world map, aggregated (averaged) at the country level. But with their instrumentation, Facebook is able to drill down to regions within a country and even cities within a region. This allows engineering to identify problems at almost any level of granularity they need.

One key takeaway on Planning: if you have the instrumentation in place, have people to monitor and mine the data, and are willing to address the problems that crop up, then you can create a more flexible, efficient and effective environment and build a better product for your users.

Iteration

Iteration is not just about feature deployment; it's also about the Facebook user experience. Their instrumentation told them they were doing OK on video uploads, but when they looked at the details, they saw that some customers were not having a satisfactory video upload experience. For instance, one Facebook engineer had to wait 82 hours to upload a video.

The Facebook world is populated with tens of thousands of unique devices with different memory, compute and storage. They had to devise approaches that could optimize the encoding for all the different devices, some of which is done on the mobile phones themselves.

They also had to try to optimize the network stack for different devices and mobile networking technologies. Parikh had another map showing network connectivity. Surprise: most of the world is not on LTE; a vast majority of the world is on 2G and 3G cellular networks. So, via iteration, Facebook went about improving video upload 1% here and 1% there, but with Facebook's user base, these improvements impact millions of users. They used cross-functional teams to address the problems they uncovered.

However, video upload problems were not just in the device and connectivity realms. It turns out they had a big cancel-upload button on the screen after the start of a video upload. This was sometimes clicked by mistake, and they found that almost 10% of users hit cancel. So they went through and re-examined the whole user experience to eliminate other hindrances to successful video uploads.

Performance

The key takeaway from this segment of the talk was that performance has to be considered from the get-go of a new service or service upgrade. It is impossible to improve performance after the fact, especially in At Scale environments.

In my CS classes, the view was: make it work, then make it work fast. What Facebook has found is that you never have the time to make it fast after a product has shipped. As soon as it works, they have to move on to the next problem.

As a result, if performance is not built in from the start – not a critical requirement/feature of the system architecture and design – it never gets addressed. Also, if all you focus on is making it work, then the design and all the code are built around feature functionality. Changing working functionality later to improve performance is nearly impossible and typically requires a re-architecture/re-design/re-implementation of the functionality.

For instance, Facebook used to do video encoding serially on a single server. It often took a long time (10 to 30 minutes). Engineering reimplemented their video encoding to partition the video and distribute the encoding across multiple servers. Doing this sped up encoding considerably.

But they didn't stop there. With such a diverse user networking environment, they felt they could save bandwidth and better optimize playback if they could reduce playback video size. They were able to take the machine learning/AI investments Facebook has made and apply them to distributed video encoding, analyzing video scene by scene and opportunistically reducing bandwidth load and storage size while still maintaining playback quality. By implementing the new video encoding process, they have achieved double-digit reductions in bandwidth requirements for playback.

Another example of the importance of performance was the LiveMentions feature discussed above. Celebrities often record streams in places with poor networking infrastructure, so in order to ensure a good streaming experience, Facebook had to implement variable-bit-rate video upload to adjust upload bandwidth requirements to the networking environment. Moreover, once a celebrity starts a live stream, all their fans around the world get notified. Then there's a thundering herd (boot storms, anyone?) trying to start watching the video stream. In order to support this mass streaming, Facebook implemented stream blocking, which holds off the start of live stream viewing until they have cached enough of the video stream at their edge servers worldwide. This guarantees that all the fans have a good viewing experience once it starts.

There were a couple more videos of the show sessions but I didn’t have time to review them.  But Facebook sounds like a fun place to work, especially for infrastructure performance experts.

~~~~

Comments?

Flash’s only at 5% of data storage

We have been hearing for years that NAND flash is at price parity with disk. But at this week's Flash Memory Summit, Darren Thomas, VP of Micron's Storage BU, said in his keynote that NAND stores only 5% of the bits in the data center.

Darren's session was all about how to get flash to become more than 5% of data storage; he called this "crossing the chasm". I assume the 5% is measured against yearly data storage shipped.

Flash’s adoption rate

Darren said last year flash climbed from 4% to 5% of data center storage, but he made no mention of whether flash's adoption is accelerating. According to another of Darren's charts, flash is expected to ship ~77B Gb of storage in 2015 and should grow to about 240B Gb by 2019.

If the ratio of flash bits shipped to data centers (vs. all flash bits shipped) holds constant, then flash should be ~15% of data storage by 2019. But this assumes data storage doesn't grow. If we assume a 10% Y/Y CAGR for data storage, then flash would represent only about ~9% of overall data storage.
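As a rough back-of-the-envelope check of the flat-storage case (my arithmetic, using the shipment figures above):

$$5\% \times \frac{240\,\text{B Gb}}{77\,\text{B Gb}} \approx 5\% \times 3.1 \approx 15\%$$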

Data growth at 10% could be conservative. A 2012 EE Times article said the 2010-2015 data growth CAGR would be 32%, and IDC's 2012 digital universe report said that between 2012 and 2020 data will double every two years, a ~44% CAGR. But both numbers may be talking about the world's data growth, not just the data center's.

How to cross this chasm?

Geoffrey Moore, author of Crossing the Chasm, came up on stage as Darren discussed what he thought it would take to go beyond early adopters (visionaries) to early majority (pragmatists) and reach wider flash adoption in data center storage. (See Wikipedia article for a summary on Crossing the Chasm.)

As one example of crossing the chasm, Darren talked about the electric light bulb. At introduction it competed against candles, oil lamps, gas lamps, etc. But it was the most expensive lighting system at the time.

But when people realized that electric lights allowed you to do things at night other than go to sleep, adoption took off. At that time, the electric bulb's competitors did provide lighting, it just wasn't very good; in fact, most people went to bed at nightfall because the light then available was so poor.

However, the electric bulb, a higher-performing lighting solution, opened up the night to other activities.

What needs to change in NAND flash marketing?

From Darren's perspective, the problem with flash today is that marketing and sales of flash storage are all about speeds, feeds and relative pricing against disk storage. What's needed instead is to discuss the disruptive benefits of flash/NAND storage that are impossible to achieve with disk today.

So what are the disruptive benefits of NAND/flash storage that are unrealizable with disk today?

  1. Real time analytics and other RT applications;
  2. More responsive mobile and data center applications;
  3. Greener, quieter, and potentially denser data center;
  4. Storage for mobile, IoT and other ruggedized application environments.

Only the first three above apply to data centers. And none seems as significant as opening up the night, but maybe I am missing a few.

Also, the Wikipedia article cited above states that a Crossing the Chasm approach works best for disruptive or discontinuous innovations, and that more continuous innovations (those that don't cause significant behavioral change) do better with Everett Rogers' standard diffusion-of-innovations approach (see the Wikipedia article for more).

So is NAND flash a disruptive or continuous innovation?  Darren seems firmly in the disruptive camp today.

Comments?

Photo Credit(s): 20-nanometer NAND flash chip, IntelFreePress’ photostream