A decades-old file system with a rich history can play a central role in the future of high-volume cloud computing.
Although it is an older technology, ZFS has the potential to deliver excellent performance for very large and latency-sensitive workloads. Running on AWS Graviton Arm processors, ZFS can support up to 12.5 gigabytes per second (GB/s) of throughput and up to 1 million IOPS for frequently accessed cached data, and 4 GB/s and up to 160,000 IOPS directly from persistent storage, all with sub-millisecond latencies.
“It’s a wonderful filesystem, with enormous capacities,” enthused Wayne Duso, AWS vice president of engineering for storage, edge, and data governance services, in an interview with The New Stack.
This ZFS implementation, based on OpenZFS, is delivered through Amazon FSx, an AWS-managed file system service created to bring third-party file systems to the AWS cloud environment. FSx for OpenZFS volumes are accessible from Linux, macOS, and Windows clients via the NFS protocol.
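As a rough illustration of that client access, mounting an FSx for OpenZFS volume from a Linux client is a standard NFS mount; the file system DNS name, region, and paths below are hypothetical placeholders, not values from the article:

```shell
# Mount an FSx for OpenZFS volume over NFS from a Linux client.
# The DNS name and export path are illustrative; use your own from the FSx console.
sudo mkdir -p /mnt/fsx
sudo mount -t nfs -o nfsvers=4.1 \
    fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com:/fsx/ /mnt/fsx
```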
Sun Microsystems originally designed ZFS in the early 2000s, with the intention of making it the first file system with a 128-bit address size. Indeed, it can address 1.84 × 10^19 times more data than 64-bit systems, enough index space to handle an almost unlimited amount of data.
As a parallel file system, ZFS can simultaneously serve thousands of clients, or deliver a huge amount of data to a single client. It works best at sending many small files in parallel, making it useful for workloads like machine learning, electronic design automation (EDA), media processing, and financial analysis, according to AWS.
In the AWS environment, cloud clients served by FSx for OpenZFS include Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS) clusters, Amazon WorkSpaces virtual desktops, and VMware Cloud on AWS.
Despite its great promise, ZFS has had a hard time of it. It was originally designed for Sun’s OpenSolaris Unix operating system, which fell out of favor in the late 2000s. Linus Torvalds refused to merge it into the Linux kernel for licensing reasons. Apple tried to port it to the Macintosh but halted its efforts after a few years.
Oracle acquired Sun Microsystems in 2010 and closed the OpenSolaris project shortly thereafter. Oracle continued developing ZFS, but its updates were no longer open source, so the OpenZFS project was created to carry on porting the file system to other platforms.
A major benefit of using ZFS as a service is that the end user doesn’t have to worry about deployment and management, which hasn’t always been easy with such a complex file system, Duso said. The same is true of Lustre, another parallel file system, aimed primarily at the high-performance computing market, which AWS also supports via FSx. AWS likewise provides cloud support for Windows File Server and NetApp ONTAP through FSx.
The idea with FSx is to “provide our customers with the file systems they use today,” said Duso. “They’ve built their workflows around these file systems,” and it can be precarious to move data from one file system to another.
Big Data? “Zettabytes, equivalent to 1 billion exabytes, will soon become commonplace in the lexicon of enterprise technology,” says @awscloud‘s @SwamiSivasubram. #reinvent2021 #keynote @thenewstack
– BC Gain (@bcamerongain) December 1, 2021
“Customers said they liked the capabilities, but they didn’t want to allocate the staff and time to manage ZFS,” Duso said. As a managed service, ZFS can be easily deployed.
ZFS also offers built-in, near-real-time snapshot capabilities, allowing users to restore previous versions of files. FSx itself also performs daily file system backups to Amazon S3. Each OpenZFS file system can contain multiple volumes, and each volume can be managed with separate quotas for attributes such as total volume storage, per-user storage, and per-group storage.
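As a sketch of what that per-volume quota management can look like programmatically, the following builds the kind of request the FSx `CreateVolume` API accepts for an OpenZFS volume. The volume name, IDs, and quota values are illustrative assumptions; the actual call (via boto3’s `create_volume`, shown commented out) requires AWS credentials:

```python
def build_volume_request(parent_volume_id: str) -> dict:
    """Build a CreateVolume request for an FSx for OpenZFS volume.

    The name, quota sizes, and user/group IDs below are hypothetical
    placeholders, not values from AWS documentation.
    """
    return {
        "VolumeType": "OPENZFS",
        "Name": "project-data",
        "OpenZFSConfiguration": {
            "ParentVolumeId": parent_volume_id,
            # Cap for the whole volume, in GiB.
            "StorageCapacityQuotaGiB": 500,
            # Separate per-user and per-group storage quotas.
            "UserAndGroupQuotas": [
                {"Type": "USER", "Id": 1001, "StorageCapacityQuotaGiB": 50},
                {"Type": "GROUP", "Id": 2001, "StorageCapacityQuotaGiB": 200},
            ],
        },
    }


if __name__ == "__main__":
    request = build_volume_request("fsvol-0123456789abcdef0")
    # With credentials configured, this would create the volume:
    #   import boto3
    #   boto3.client("fsx").create_volume(**request)
    print(request["OpenZFSConfiguration"]["StorageCapacityQuotaGiB"])
```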
AWS charges users for file system usage based on storage capacity (per GB-month), SSD IOPS (per IOPS-month), and throughput capacity (per MBps-month).
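To make that billing model concrete, here is a minimal cost sketch across the three dimensions. The rates used are hypothetical placeholders for illustration, not AWS’s actual prices:

```python
def monthly_fsx_cost(storage_gib: float, iops: float, throughput_mbps: float,
                     storage_rate: float = 0.09,     # $ per GB-month (hypothetical)
                     iops_rate: float = 0.006,       # $ per IOPS-month (hypothetical)
                     throughput_rate: float = 0.26   # $ per MBps-month (hypothetical)
                     ) -> float:
    """Estimate a monthly bill across FSx's three billing dimensions:
    storage capacity, provisioned SSD IOPS, and throughput capacity."""
    return (storage_gib * storage_rate
            + iops * iops_rate
            + throughput_mbps * throughput_rate)


# Example: 1 TiB of storage, 10,000 IOPS, 512 MBps of throughput.
print(round(monthly_fsx_cost(1024, 10_000, 512), 2))  # → 285.28
```

The point of the sketch is simply that each dimension is provisioned and billed independently, so a throughput-heavy workload and a capacity-heavy workload can be sized (and priced) very differently.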
More storage news
The ZFS news was one of many storage announcements made at the AWS re:Invent conference, which is being held this week in Las Vegas.
The company also launched a new “Instant Retrieval” tier for its Glacier long-term archive storage, optimized for objects that are accessed more often than the rarely touched data in traditional Glacier storage, but not often enough to justify the expense of live storage. This option would be ideal for documents viewed about four times a year, such as financial records, Duso observed.
A new archive tier for EBS snapshots was also revealed, for customers who need to keep their volume snapshots longer than the usual retention period. This approach can save customers up to 75% of the cost of maintaining Amazon EBS snapshots for months or years.
The idea with all of these new service offerings is to help customers get the most value out of their data at the lowest cost, Duso said. Deriving value from data requires that users can interact with it seamlessly.
âYou don’t need to create tools to move data,â Duso said. âAll you have to do is call an API to move this data to where it needs to be,â he said.
So instead of seeing ZFS as a legacy technology, it might be more accurate to say that it is decades ahead of its time.
Inset image: Wayne Duso.