How to Securely Bridge On-premise and Cloud-based Storage Services

05.07.2017

Examining the four ways you can manage migration to or synchronization with the cloud

Cloud storage revenue is forecast to grow more than 28% annually to reach $65 billion in 2020. The driving force is the substantial economies of scale that enable cloud-based solutions to deliver more cost-effective primary and backup storage than on-premises systems can ever hope to achieve.

Most IT departments quickly discover, however, that there are significant challenges involved in migrating and synchronizing many thousands or even millions of files from on-premises storage systems to what Gartner characterizes as Enterprise File Synchronization and Sharing (EFSS) services in the cloud. According to Gartner, “by 2019 75% of enterprises will have deployed multiple EFSS capabilities, and over 50% … will struggle with problems of data migration, up from 10% today.”

In a report titled “How to Migrate File Shares, SaaS and ECM to EFSS,” Gartner identifies four ways organizations can manage migration to and/or synchronization with EFSS services: custom integration, rudimentary copy, EFSS import services, and specialized third-party tools. Each is explored below.

Custom integration

Custom solutions can be handled internally by IT or outsourced to consultants with expertise in content management. Either way, the question remains: Is an “integration army” required? The answer depends on how similar or different the storage systems are, and in most situations, the “troops” find the system differences to be both broader and deeper than initially anticipated.

Every file has a unique set of properties associated with it, and most file systems treat at least some of these file properties differently. The properties include the basics, such as file name, format and metadata, along with the more advanced, such as versioning, ownership preservation, and permissions.

In a hybrid storage environment, file names might need to be normalized. Versions might need to be tracked manually. Different security models might be needed for each file system, potentially creating problems for users and placing a significant burden on the Help Desk. In any complex custom integration, there are bound to be mistakes. And the biggest risk in a hybrid storage environment is often an inability to detect file transfer corruption or version conflicts before they affect the organization.
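Two of the checks mentioned above, name normalization and corruption detection, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, Unicode NFC is an assumed target form, and stripping trailing dots and spaces is an assumed rule rather than the requirement of any particular storage system.

```python
import hashlib
import unicodedata
from pathlib import Path


def normalize_name(name: str) -> str:
    """Return the name in Unicode NFC form without trailing spaces or dots.

    Both rules are assumptions for illustration; a real integration would
    use the rules of the most restrictive system involved.
    """
    return unicodedata.normalize("NFC", name).rstrip(" .")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, copy: Path) -> bool:
    """Return True only if the copy has exactly the same content as the source."""
    return sha256_of(source) == sha256_of(copy)
```

Running `verify_copy` after every transfer and logging each failure is one simple way to surface corruption before users find it.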

Even seemingly simple scenarios can grow enormously complex. Consider the experience of Shawmut Design and Construction, a construction management firm with offices throughout the U.S. The company uses BIM 360 software from Autodesk for construction management, and the ShareFile platform from Citrix for collaboration with the team in the field.

Change orders are common in construction projects, and using out-of-date information can cause costly mistakes. So the superintendent in charge of the project took great care to ensure that all of the files were accurately synchronized daily. Using the file management capabilities built into BIM 360 and ShareFile, the effort required three project managers (two full-time and one part-time). Every day, the staff compared the versions of the many files in both systems, copying the latest from one to the other as needed to keep everything in sync. If three people are needed to handle synchronization between just two file systems, it is not surprising that complexity can grow dramatically in an organization with a dozen or more storage systems.

Shawmut did not attempt to have IT resources automate the file synchronization task, but other companies have, normally with unsatisfactory results. Getting bi- or multi-directional file synchronization to work well is not a trivial endeavor. Indeed, successfully navigating the different “file logistics” of multiple incompatible storage systems can become a Tower of Babel that is fraught with potential peril. A mistake in comparing just one file property, such as the last accessed/modified date, user/group access permissions, or lock status, can result in a file becoming corrupted or overwritten by an older version. And if the custom integration application lacks robust error detection and reporting (something that is deceptively difficult), the mistake will remain undetected until a user complains.
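To show why bidirectional synchronization is easy to get wrong, here is a minimal sketch of one common safeguard: comparing both copies against the content hash recorded at the last successful sync, rather than trusting timestamps alone. The names `Action` and `decide` are hypothetical, and this is not the method of any product mentioned in this article.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "in sync"
    COPY_A_TO_B = "copy A to B"
    COPY_B_TO_A = "copy B to A"
    CONFLICT = "conflict: needs review"


def decide(digest_a: str, digest_b: str, last_synced: Optional[str]) -> Action:
    """Pick a sync action from the two current hashes and the last synced hash.

    last_synced is None when the pair has never been synchronized.
    """
    if digest_a == digest_b:
        return Action.NONE
    a_changed = digest_a != last_synced
    b_changed = digest_b != last_synced
    if a_changed and not b_changed:
        return Action.COPY_A_TO_B
    if b_changed and not a_changed:
        return Action.COPY_B_TO_A
    # Both sides changed (or there is no history): never overwrite silently.
    return Action.CONFLICT
```

The key design choice is the last branch: when both sides have diverged, the sketch reports a conflict instead of guessing, which is exactly the error reporting that ad hoc scripts tend to omit.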

For a one-time migration or a one-way backup, a custom integration effort, consisting of a combination of manual and automated procedures, may work well enough. This is especially true if the differences among the storage systems involved are relatively minor and manageable.

But in most cases, the answer to the question posed earlier is: Yes, it will take an army to successfully and securely synchronize files in a hybrid storage environment. Fortunately, there are three alternatives to custom integration.

Rudimentary copy

Using familiar, proven and low-tech “brute force” bulk copy commands, such as xcopy in Windows/DOS and rsync in Linux, is certainly simple and, therefore, might seem to be fairly foolproof. Applications like the File Explorer in Windows and the file management applications offered with most EFSS services also provide bulk file and folder copying capabilities.

For brute force bulk copy to work well, though, the storage systems involved either need to be compatible or must be made interoperable at their “lowest common denominator.” For example, more lenient file naming conventions and more generous file size capabilities might need to be abandoned in order to accommodate the most restrictive storage system, but doing so will minimize the complexity involved. Unless all systems can be made fully interoperable, however, challenges are certain to remain, especially involving file locking and security context via properties like user and group permissions for read/write/delete access.
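The “lowest common denominator” idea can be made concrete with a short pre-flight check. The sketch below is illustrative only: the `Limits` fields and any values passed to them are hypothetical and do not describe the real limits of any named platform.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Limits:
    max_name_length: int
    max_file_size: int  # bytes
    forbidden_chars: FrozenSet[str]


def strictest(all_limits: Iterable[Limits]) -> Limits:
    """Combine several systems' limits into the most restrictive set."""
    limits = list(all_limits)
    return Limits(
        max_name_length=min(l.max_name_length for l in limits),
        max_file_size=min(l.max_file_size for l in limits),
        forbidden_chars=frozenset().union(*(l.forbidden_chars for l in limits)),
    )


def violations(name: str, size: int, limits: Limits) -> List[str]:
    """List the reasons a file would be rejected under the given limits."""
    problems = []
    if len(name) > limits.max_name_length:
        problems.append("name too long")
    bad = sorted(set(name) & limits.forbidden_chars)
    if bad:
        problems.append("forbidden characters: " + "".join(bad))
    if size > limits.max_file_size:
        problems.append("file too large")
    return problems
```

Running such a check before a bulk copy at least turns silent failures into a list of files to rename or exclude. It does nothing for the locking and permission differences noted above.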

As with custom integration, rudimentary copy can work well for a one-time migration or as a one-way backup solution. But because basic bulk copy commands and utilities lack robust file comparison capabilities, this approach is risky as a file synchronization solution in a hybrid storage environment.

Import services

Various forms of import services are available with virtually all EFSS platforms. Each has its own file management application with an online file import function, and some providers recommend using a physical disk drive when importing more than 100GB of data.

While these online applications and services shift responsibility to the EFSS provider, they can suffer from the same complexities and limitations encountered in custom integrations and rudimentary copy mechanisms, such as the loss of permission models and structures, user-defined metadata, file ownership, and versions. So if the import service fails to adequately accommodate the underlying file property differences between or among the different storage systems, the results are destined to be less than satisfactory. And it is for this reason that EFSS providers, just like a growing number of enterprise IT departments, are starting to use purpose-built third-party file migration and synchronization tools.

Third-party tools

The growing popularity and inherent complexities of hybrid storage architectures have created a demand for specialized “middleware” software designed specifically to manage storage system migration and synchronization. While designs vary, the more advanced of these file logistics systems use a custom “connector” for each storage system supported. The connectors provide a common set of functionality that enables every storage system to interoperate with all others, without sacrificing the advanced capabilities of any. The result is a hybrid content management system capable of serving as an intelligent intermediary between or among many different storage systems.
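The connector pattern described above can be sketched as a small common interface that each storage system implements. Everything here is a hypothetical illustration (the `Connector` methods, the in-memory stand-in, and the `mirror` helper), not the API of any vendor's tool. A real connector would also carry metadata, permissions, and versions.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Connector(ABC):
    """Common operations every storage system must expose to the middleware."""

    @abstractmethod
    def list_files(self) -> List[str]: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryConnector(Connector):
    """Stand-in for a real system, useful for testing the sync logic."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def list_files(self) -> List[str]:
        return sorted(self._files)

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data


def mirror(source: Connector, target: Connector) -> int:
    """Copy missing or differing files from source to target; return the count."""
    copied = 0
    existing = set(target.list_files())
    for path in source.list_files():
        data = source.read(path)
        if path not in existing or target.read(path) != data:
            target.write(path, data)
            copied += 1
    return copied
```

Because `mirror` depends only on the interface, supporting a new storage system means writing one new connector rather than a new integration for every pair of systems.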

To provide the agility desired in a hybrid storage environment, the connectors normally support a wide range of both on-premises storage systems (e.g., NFS/SAN/NAS, SharePoint, and various Enterprise Content Management solutions) and EFSS platforms (e.g., Box, Dropbox for Business, Google Drive, Office 365, OneDrive, ShareFile, and Syncplicity). The depth and breadth of support makes these tools suitable for supporting most enterprise applications, as well as the “shadow IT” Bring Your Own Storage (BYOS) environment being created as users increasingly migrate their own data to the cloud.

Increasing frustration with its manual synchronization motivated Shawmut to pilot a third-party hybrid content management tool, and the improvement was immediate. With connectors for both Shawmut’s on-premises storage system and Citrix ShareFile, the tool automatically synchronizes files every night based on just a few “point-and-click” instructions, which has eliminated the need for painstaking manual comparisons. Now the project superintendent spends only a few minutes at the end of each workday to set up the synchronization. After confirming the tool worked as desired, the three project managers previously responsible for synchronizing the files were reassigned to more productive tasks.

While security was not a major concern at Shawmut, it is at most organizations. To accommodate this important requirement, the connectors usually include support for each file system’s security provisions, and the tool itself is normally installed behind the enterprise firewall and other perimeter defenses.

The journey to deciding which of these four alternatives might be the best and most cost-effective in any particular situation begins with taking an inventory of all the storage systems being used enterprise-wide both on-premises and in the cloud. Gartner recommends using a file analysis tool capable of scanning each file system to index its contents and file attributes. With more powerful tools now becoming available to automate the migration and synchronization of on-premises and cloud-based storage services, IT departments no longer need to assign an integration army to the task.
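For a local or mounted file system, the inventory step can be approximated with a short script. This is a minimal sketch, assuming a hypothetical `scan` function and a hand-picked set of attributes. It is not a substitute for the file analysis tools Gartner describes, and it does not read cloud services.

```python
import os
from pathlib import Path
from typing import Dict, Iterator


def scan(root: Path) -> Iterator[Dict[str, object]]:
    """Yield one record of basic attributes per file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            info = path.lstat()  # lstat so broken symlinks do not raise
            yield {
                "path": path.relative_to(root).as_posix(),
                "size": info.st_size,
                "modified": info.st_mtime,
                "mode": oct(info.st_mode & 0o777),
            }
```

The resulting records can be written to a CSV file or a database to compare what each storage system actually holds before choosing a migration approach.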

Krystal Elliott