48: Greybeards talk object storage with Enrico Signoretti, Head of Product Strategy, OpenIO

In this episode we talk with Enrico Signoretti, Head of Product Strategy for OpenIO, a software-defined object storage startup out of Europe. Enrico is an old friend who has taken part in many Storage Field Day (SFD) events that Howard and I also attended, and we wanted to hear what he was up to nowadays.

OpenIO open source SDS

It turns out that OpenIO is an open source object storage project that’s been around since 2008 and was re-launched in 2015 as a new storage startup. The open source community version is still available, and OpenIO has links to downloads to try it out. There’s even one for a Raspberry Pi (Raspbian 8, I believe) on their website.

As everyone should recall, object storage is meant for multi-PB data storage environments. Objects are assigned an ID and are stored in containers or buckets. Object storage has a flat hierarchy, unlike file systems, which have a multi-tiered hierarchy.

Currently, OpenIO is running at a number of customer sites with 15-20PB storage environments. OpenIO supports the AWS S3 compatible protocol and the OpenStack Swift object storage API.

OpenIO is based on open source, but customer service and usability are built into the product they license to end customers on a usable capacity basis. The minimum license is for 100TB and can go into the multi-PB range. There doesn’t appear to be any charge for enhancements, additional features or additional cluster nodes.

The original code was developed for a big email service provider and supported a massive user community. So it was originally designed for small objects, with fast access and many cluster nodes. Nowadays, it can support very large objects as well.

OpenIO functionality

Each disk device in the OpenIO cluster is a dedicated service. By setting it up this way, load balancing across the cluster can be done at the disk level. Load balancing in OpenIO is also a dynamic operation. That is, every time an object is created, the current capacity of every node is used to determine the node with the least used capacity, which is then allocated to hold that object. This way there’s no static allocation of object IDs to nodes.
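To make the dynamic placement idea concrete, here is a minimal sketch of "pick the least-used disk service at write time". It is not OpenIO's code or API: the DiskService class, the place_object function and the choice to compare the fraction of capacity used (rather than absolute bytes) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DiskService:
    """One disk exposed as its own service (hypothetical model, not OpenIO's API)."""
    name: str
    capacity_bytes: int
    used_bytes: int = 0

    @property
    def used_fraction(self) -> float:
        return self.used_bytes / self.capacity_bytes


def place_object(services, size_bytes):
    """Choose the least-used service that can still fit the object, then record the write."""
    candidates = [s for s in services if s.capacity_bytes - s.used_bytes >= size_bytes]
    if not candidates:
        raise RuntimeError("no service has room for this object")
    # Assumption: "least used" is measured as the fraction of capacity in use.
    target = min(candidates, key=lambda s: s.used_fraction)
    target.used_bytes += size_bytes
    return target


services = [
    DiskService("disk-a", capacity_bytes=1000, used_bytes=600),
    DiskService("disk-b", capacity_bytes=1000, used_bytes=200),
    DiskService("disk-c", capacity_bytes=2000, used_bytes=900),
]
first = place_object(services, 100)
second = place_object(services, 100)
```

Because the decision is made from current usage on every write, a newly added empty disk would start attracting objects immediately, with no table of object IDs to rebalance.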

Data protection in OpenIO supports erasure coding as well as mirroring (replication). This can be set by policy and can vary depending on object size. For example, if an object is under, say, 100MB it can be replicated 3 times, but if it’s over 100MB it uses erasure coding.
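A size-based protection policy like the one described could be sketched as follows. The 100MB threshold and the 3 copies come from the example above; the 6+3 erasure coding layout, the names, and the handling of an object of exactly 100MB are assumptions, not OpenIO defaults.

```python
from dataclasses import dataclass

MB = 1000 * 1000
THRESHOLD_BYTES = 100 * MB  # the 100MB cut-off used as an example in the text


@dataclass(frozen=True)
class Protection:
    kind: str          # "replication" or "erasure_coding"
    data_chunks: int   # 1 for replication
    extra_chunks: int  # extra copies for replication, parity chunks for erasure coding

    @property
    def raw_bytes_per_stored_byte(self) -> float:
        """Raw capacity consumed for every byte of object data."""
        return (self.data_chunks + self.extra_chunks) / self.data_chunks


THREE_COPIES = Protection("replication", data_chunks=1, extra_chunks=2)
EC_6_3 = Protection("erasure_coding", data_chunks=6, extra_chunks=3)  # assumed layout


def choose_protection(size_bytes):
    """Small objects get 3 copies; everything else is erasure coded (assumed boundary)."""
    return THREE_COPIES if size_bytes < THRESHOLD_BYTES else EC_6_3
```

With these assumed numbers, replication costs 3 bytes of raw capacity per byte stored while 6+3 erasure coding costs 1.5, which is one reason to keep replication for small objects and use erasure coding for large ones.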

OpenIO supports hybrid tiering today. This means that an object can move from OpenIO residency to public cloud (AWS S3 or Backblaze B2) residency over time if the customer wishes. In a future release they will support replication to public cloud as well as tiering. Many larger customers don’t use tiering because of the expense. Enrico says S3 is cheap as long as you don’t access the data.

OpenIO provides compression of objects, although many object storage customers already compress and encrypt their data and so may not use it. For those customers who don’t, compression can often double the amount of effective storage.

Metadata is just another service in the OpenIO cluster. This means it can be assigned to a number of nodes or to all nodes, depending on configuration. OpenIO keeps its metadata on SSDs rather than in memory, and replicates it for data protection. This allows OpenIO to have a lightweight footprint. They call their solution “serverless” but what I take from that is that it doesn’t use a lot of server resources to run.

OpenIO offers a number of adjunct services besides pure object storage such as video transcoding or streaming that can be invoked automatically on objects.

They also offer stretched clusters where an OpenIO cluster exists across multiple locations. Objects can have dispersal-like erasure coding for multi-site environments so that if one site goes down you still have access to the data. But Enrico said you have to have a minimum of 3 sites for this.
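Enrico didn't go into why three sites is the minimum, but one plausible reading is simple chunk arithmetic: if the chunks of an erasure coded object are spread evenly across sites, the surviving sites must still hold at least as many chunks as there are data chunks. The function below is a sketch of that check under those assumptions (even spreading, any data_chunks surviving chunks are enough to rebuild), not a description of OpenIO's actual placement logic.

```python
def survives_one_site_loss(data_chunks, parity_chunks, sites):
    """True if losing the fullest site still leaves enough chunks to rebuild the object."""
    total = data_chunks + parity_chunks
    worst_site = -(-total // sites)  # ceiling division: chunks held by the fullest site
    return total - worst_site >= data_chunks


three_sites = survives_one_site_loss(4, 2, sites=3)       # 4+2 over 3 sites, 2 chunks each
two_sites = survives_one_site_loss(4, 2, sites=2)         # 4+2 over 2 sites, 3 chunks each
two_sites_mirror = survives_one_site_loss(2, 2, sites=2)  # 2+2 over 2 sites
```

Under these assumptions a 4+2 layout survives a site loss across three sites but not across two. Two sites only work once there are as many parity chunks as data chunks, at which point the capacity overhead is the same 2x as plain mirroring.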

Enrico mentioned one media & entertainment customer that stored only one version of a video in the object storage but, when it was requested in another format, automatically transcoded it in real time. They kept this newly transcoded version in a CDN for future availability, until it aged out.

There seems to be a lot of policy and procedural flexibility available with OpenIO, but that may just be an artifact of running on Linux.

They currently support Red Hat, Ubuntu and CentOS. They also have a Docker container in beta test for persistent objects, which is expected to ship later this year.

OpenIO hardware requirements

OpenIO has minimal hardware requirements for cluster nodes. The only thing I saw on their website was the need for at least 2GB of RAM on each node. Metadata services also seem to require SSDs on multiple nodes.

As discussed above, OpenIO has a uniquely lightweight footprint (which is why it can run on a Raspberry Pi) and only seems to need about 500MB of DRAM and 1 core to run effectively.

OpenIO supports heterogeneous nodes. That is, nodes can have different numbers and types of disks/SSDs, different processor and memory configurations, and different OSs. We talked about the possibility of having a node or disks go down and operating without them for a month, at the end of which admins could go through and fix or replace them as needed. Enrico also mentioned it was very easy to add and decommission nodes.

OpenIO supports a nano-node, which is just an (ARM) CPU, RAM and a disk drive, sort of like Seagate Kinetic and other vendors’ Open Ethernet drive solutions. These drives have a lightweight processor with a small amount of memory, running Linux and accessing an attached disk drive.

Also, OpenIO nodes can offer different services. Some cluster nodes can offer metadata and object storage services and others only object storage services. This seems configurable on a server basis. There’s probably some minimum number of metadata and object services required in a cluster. Enrico mentioned three nodes as a minimum cluster.

The podcast runs ~42 minutes. Enrico is a very knowledgeable industry expert and a great friend from multiple SFD/TFD events, and Howard and I had fun talking with him again. Listen to the podcast to learn more.

Enrico Signoretti, Head of Product Strategy at OpenIO.

In his role as head of product strategy, Enrico is responsible for the planning, design and execution of OpenIO product strategy. With the support of his team, he develops product roadmaps from the planning stages to development to ensure their market fit.

Enrico promotes OpenIO products and represents the company and its products at several industry events, conferences and association meetings across different geographies. He actively participates in the company’s sales effort with key accounts as well as by exploring opportunities for developing new partnerships and innovative channel activities.

Prior to joining OpenIO, Enrico worked as an independent IT analyst, blogger and advisor for six years, serving clients among primary storage vendors, startups and end users in Europe and the US.

Enrico is constantly keeping an eye on how the market evolves and continuously looking for new ideas and innovative solutions.

Enrico is also a great sailor and an unsuccessful fisherman.