[Interview]: John Armstrong On Hadoop Security Challenges Faced By Enterprises And Their Solutions
Zettaset, Big Data security provider, automates, accelerates, and simplifies Hadoop cluster deployment for the enterprise. The company’s enterprise software is transparent and compatible with open source Hadoop distributions, and augments them by providing the capabilities that enterprises expect and need for their critical data center deployments.
Zettaset simplifies the installation and management of Hadoop clusters. In an interview with ToolsJournal, John Armstrong, VP of Marketing, discussed the challenges enterprises face when adopting big data solutions, Hadoop cluster security, and the future roadmap for Zettaset.
ToolsJournal: What are the three main challenges you see for enterprises adopting Hadoop-based big data solutions?
John Armstrong: Hadoop is an immature technology. Available open source tools offer only a limited and fragmented approach to data security and protection. For example, encryption can only be accomplished by installing an additional “bolt-on” product that is not integrated with Hadoop.
Lack of software automation, resulting in time-consuming manual processes to install and configure Hadoop. Instead, as part of their business model, branded Hadoop distribution vendors rely too heavily on professional services to deploy Hadoop. This business model does not support replicable processes and does not scale in the enterprise.
No assured reliability for Hadoop services in the database. Open source approaches typically provide automated failover only for the NameNode and JobTracker, leaving other critical services such as Kerberos, Oozie, ZooKeeper, Hive, etc. unsupported. Failure of any of these services can cause Hadoop cluster disruption or a complete outage.
ToolsJournal: Which points in the Hadoop architecture lack secure protocols, and how did that lead you to launch Zettaset?
John Armstrong: Hadoop was not originally designed with security in mind. It was created to be an inexpensive and scalable way to store massive numbers of public URLs (Google).
Fine-grained role-based access control: not intrinsic to Hadoop
Security policy enforcement (i.e., Active Directory (AD) integration): not intrinsic to Hadoop
Data-at-rest encryption: not intrinsic to Hadoop
High availability across all services: not intrinsic to Hadoop
Automated installation of Hadoop services with graphical UI: not intrinsic to Hadoop
Automated and secure connectivity between Hadoop and analytics applications: not intrinsic to Hadoop
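The first gap in the list above, fine-grained role-based access control, can be pictured with a minimal sketch. Everything here (the roles, paths, and actions) is hypothetical and is not Zettaset's implementation; it only illustrates the kind of enforcement layer an enterprise must add on top of open source Hadoop, where roles would normally be resolved from AD/LDAP groups rather than a hard-coded table:

```python
# Sketch of the fine-grained, role-based access control that Hadoop
# does not provide out of the box. Role names, paths, and actions are
# hypothetical; a real deployment would resolve a user's roles from
# AD/LDAP group membership instead of this hard-coded table.

ROLE_PERMISSIONS = {
    "analyst": {("/data/sales", "read")},
    "etl":     {("/data/sales", "read"), ("/data/sales", "write")},
    "admin":   {("/data/sales", "read"), ("/data/sales", "write"),
                ("/data/hr", "read"), ("/data/hr", "write")},
}

def is_allowed(user_roles, path, action):
    """Return True if any of the user's roles permits the action on the path."""
    return any((path, action) in ROLE_PERMISSIONS.get(role, set())
               for role in user_roles)

print(is_allowed(["analyst"], "/data/sales", "read"))  # True
print(is_allowed(["analyst"], "/data/hr", "read"))     # False
```

The point of the sketch is that without such a layer, Hadoop's coarse file permissions are the only control available, which is what pushes enterprises toward commercial add-ons.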
ToolsJournal: How is Zettaset Orchestrator, dubbed ZSO, unique in securing big data in the enterprise? Why is it required?
John Armstrong: Zettaset Orchestrator is required because it addresses the gaps in data security, service availability, and ease of use called out in the first two questions.
Zettaset Orchestrator is unique because it is the first fully-integrated application/solution for Hadoop security and cluster management, extending security well beyond the limited capabilities of the existing open-source model. It makes extensive use of automated processes to improve enterprise productivity and eliminate unnecessary reliance on professional services.
Although it is designed to work with open source Hadoop, Zettaset Orchestrator is a commercial software product with patented and patent-pending intellectual property. Orchestrator is fully-compatible with Hadoop distributions from Cloudera and Hortonworks. It can be installed alongside these distributions as a more feature-rich replacement for Cloudera Manager and Hortonworks Ambari.
ToolsJournal: What exactly does your recent patent “High Availability patent for Hadoop” cover and how does this help Zettaset to stay above the curve compared to others?
John Armstrong: Hadoop is not a single software application; it consists of multiple applications, or “projects”. Branded Hadoop distributions such as Cloudera CDH and Hortonworks HDP typically only provide high availability for a handful of services such as the NameNode and JobTracker, leaving many other critical Hadoop services subject to failure, and reducing the overall reliability of the Hadoop cluster.
For enterprises that wish to deploy Hadoop in production environments, this situation is unacceptable. That’s why Zettaset created Multi-service High Availability. It ensures that each and every Hadoop service is protected by automated failover, making the Hadoop cluster more reliable in application environments that require consistent up-time. No other product exists that matches this capability.
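The multi-service failover described above can be pictured as a monitor loop: probe every service, and when a probe fails, promote that service's standby. This is only a toy sketch under assumed names (the node names and the probe function are hypothetical); production systems typically coordinate failover through leader election, for example via ZooKeeper, rather than a flat loop:

```python
# Toy sketch of multi-service automated failover: each monitored
# Hadoop service has an active node and a standby, and a failed
# health probe triggers promotion of the standby. Node names are
# hypothetical; a real probe would RPC the service itself.

SERVICES = {
    "NameNode":   {"active": "node1", "standby": "node2", "healthy": True},
    "JobTracker": {"active": "node1", "standby": "node3", "healthy": True},
    "Hive":       {"active": "node2", "standby": "node3", "healthy": True},
}

def probe(state):
    """Stand-in health check; real monitors query the live service."""
    return state["healthy"]

def failover_pass(services):
    """Promote the standby for every service whose probe fails."""
    promoted = []
    for name, state in services.items():
        if not probe(state):
            state["active"], state["standby"] = state["standby"], state["active"]
            state["healthy"] = True  # assume the promoted standby is healthy
            promoted.append(name)
    return promoted

SERVICES["Hive"]["healthy"] = False   # simulate a Hive outage
promoted = failover_pass(SERVICES)
print(promoted)                        # ['Hive']
print(SERVICES["Hive"]["active"])      # 'node3'
```

The sketch shows why covering every service matters: a loop that only watched the NameNode and JobTracker would have left the simulated Hive outage unhandled.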
ToolsJournal: What three security recommendations would you give to enterprises trying to adopt Hadoop?
John Armstrong: Start small, and familiarize yourself with Hadoop and your particular security requirements. If you are running a small pilot project, security may not be an immediate concern and open source distributions might be acceptable for the short term, but you will need to look at more comprehensive security applications such as Zettaset Orchestrator before you migrate to production deployments.
If you are going to scale up into a production deployment and require support for multi-tenancy, make sure that your security solution has automated AD/LDAP integration, and fine-grained role-based access control. Otherwise you will spend days configuring roles and permissions for Active Directory. And you may not be able to support all of the roles in Hadoop that are necessary in your computing environment.
A Hadoop database isn’t of much value until you connect it to an analytics application for data mining and analysis. Make sure that the security solution you select enables you to easily extend security protection and security policies (as well as high availability) from Hadoop into the analytics layer. Otherwise, this could become a major hurdle to broader adoption in your enterprise.
ToolsJournal: Finally, help us understand two themes you are working on as part of your future product roadmap.
John Armstrong: As Hadoop databases are extended into the cloud, database security and reliability will be of even greater importance for service providers as well as enterprise customers. We continue to explore ways to optimize the security and performance of Hadoop deployments in the cloud.
We have other roadmap initiatives that we are not ready to share at this time…but will update you at the appropriate time.
About John Armstrong
John brings 30 years of enterprise-focused marketing experience. Before joining Zettaset, John provided contract marketing, consulting, and interim management expertise to a diverse set of clients, including Cisco, Juniper, LeadFormix, Mobile Money, NetScaler, NetScout, Nokia, Wyse, and many others. Prior to his consulting career, John spent nearly four years as a vice president and chief networking analyst at Gartner. He has also held key marketing positions at pioneering networking companies, including Yipes Communications, Madge/Lannet, and SynOptics/Bay Networks. John holds an MA in Communications Management from the Annenberg School at the University of Southern California.
With most new technologies, security comes later.
Why waste resources making something secure when you don't know if it will even be used?
Yep, agree on that. But being aware of it is super important. What we have now is large enterprises who are used to implementing Oracle and IBM, just expecting layers of features in the products. Hadoop and many of the open source components of the big data stack don't have that, and that is where they end up surprised.
Then it's a good thing you write this! Hadoop itself is still quite young, right?
It's not even 10-year-old technology yet.