Data security rules have changed in the age of big data: 5 questions to identify where the enterprise is exposed to greater risk and damage
Mo Data stashed this in Big Data Ethics and Privacy
http://www.ibmbigdatahub.com/blog/addressing-big-data-security
According to ISACA’s white paper “Privacy and Big Data,” published in August 2013, enterprises must ask and answer 16 important questions, including these five key questions, which, if ignored, expose the enterprise to greater risk and damage:
- Can the company trust its sources of big data?
- What information is the company collecting without exposing the enterprise to legal and regulatory battles?
- How will the company protect its sources, processes and decisions from theft and corruption?
- What policies are in place to ensure that employees keep stakeholder information confidential during and after employment?
- What actions is the company taking that create trends that can be exploited by its rivals?
Hadoop, like many open source technologies before it such as UNIX and TCP/IP, wasn’t originally built with the enterprise in mind, let alone enterprise security. Hadoop’s original purpose was to manage publicly available information such as web links, and it was designed to process large amounts of unstructured data in a distributed computing environment modeled on the one Google described in its MapReduce and file system papers. It was not written to support hardened security, compliance, encryption, policy enablement and risk management.
Here are some specific steps you can take to secure your big data:
- Use Kerberos authentication to validate inter-service communication and application requests to MapReduce (MR) and similar services (see the Kerberos sketch after this list).
- Use file/OS-layer encryption to protect data at rest, ensure administrators or other applications cannot gain direct access to files, and prevent leaked information from exposure. File encryption addresses two attacker techniques for circumventing application security controls: it protects data if malicious users or administrators gain access to data nodes and inspect files directly, and it renders stolen files or copied disk images unreadable (see the encryption sketch below).
- Use key/certificate management to store your encryption keys safely, and separately from the data you’re trying to protect (see the key-management sketch below).
- Use automation tools like Chef and Puppet to validate nodes during deployment and to stay on top of patching, application configuration, Hadoop stack updates, trusted machine images, certificates and platform discrepancies (see the validation sketch below).
- Log transactions, anomalies and administrative activity to validate usage and to provide forensic system logs (see the audit-logging sketch below).
- Use SSL or TLS network security to authenticate and ensure the privacy of communications between nodes, name servers and applications. This requires an SSL/TLS implementation that protects all network communications, rather than just a subset (see the TLS sketch below).
- Anonymize data to remove everything that can be uniquely tied to an individual. Although this technique can protect some personal identification, and hence privacy, you need to be careful about how much information you strip out: too little leaves individuals re-identifiable, too much destroys the data’s analytic value (see the anonymization sketch below).
- Use tokenization to protect sensitive data by replacing it with random tokens or alias values that mean nothing to someone who gains unauthorized access to the data (see the tokenization sketch below).
- Leverage cloud database controls, where access controls built into the database protect the whole database.
- Use OS hardening: harden and lock down the operating system on which the data is processed. The four main protection focus areas are users, permissions, services and logging (see the hardening sketch below).
- Use in-line remediation to update configurations, restrict applications and devices, and restrict network access in response to non-compliance (see the remediation sketch below).
- Use the Knox Gateway (“Gateway” or “Knox”), which provides a single point of authentication and access for Apache Hadoop services in a cluster. The goal is to simplify Hadoop security for both users (who access the cluster data and execute jobs) and operators (who control access and manage the cluster); see the Knox sketch below.
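The sketches that follow illustrate several of the steps above. They are minimal and illustrative only: every hostname, port, path, credential and library choice is an assumption, not a prescription. First, Kerberos: a client calling a SPNEGO/Kerberos-secured WebHDFS endpoint, assuming a ticket has already been obtained with kinit and that the requests-kerberos package is available.

```python
# Minimal sketch: calling a Kerberos-secured WebHDFS endpoint.
# Assumes `kinit` has already obtained a ticket; the NameNode host and port
# below are placeholders for your own cluster.
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

webhdfs_url = "http://namenode.example.com:9870/webhdfs/v1/user/alice?op=LISTSTATUS"

response = requests.get(
    webhdfs_url,
    auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),  # SPNEGO negotiation
)
response.raise_for_status()
print(response.json())
```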
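For file-layer encryption, a minimal sketch using the cryptography package’s Fernet recipe. The file paths are illustrative, and in practice the key would come from a key service (next sketch) rather than being generated next to the data.

```python
# Minimal sketch of file-at-rest encryption with the `cryptography` package.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in production: fetched from a key service, not generated here
fernet = Fernet(key)

with open("/data/hdfs/block_0001.raw", "rb") as f:   # illustrative path
    plaintext = f.read()

with open("/data/hdfs/block_0001.enc", "wb") as f:
    f.write(fernet.encrypt(plaintext))

# Even if the encrypted file is copied off a data node or a disk image is
# stolen, it is unreadable without the key.
```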
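For key management, one possible approach is an external secrets manager. This sketch assumes a HashiCorp Vault instance reachable via the hvac client; the address, token and secret path are all hypothetical.

```python
# Minimal sketch: keep encryption keys in an external secrets manager rather
# than on the data nodes alongside the files they protect.
import hvac
from cryptography.fernet import Fernet

# Connect to the secrets manager (address and token are placeholders).
client = hvac.Client(url="https://vault.example.com:8200", token="s.xxxxxxxx")

# Read the data-encryption key from a KV v2 secret (the path is hypothetical).
secret = client.secrets.kv.v2.read_secret_version(path="hadoop/data-encryption-key")
key = secret["data"]["data"]["key"].encode()

fernet = Fernet(key)   # the key never lives on the cluster's local disks
```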
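For deployment validation, this sketch shows the kind of check a Chef or Puppet run would enforce: verifying a node’s Hadoop version and configuration file digests against an expected baseline. The version string and digest are placeholders.

```python
# Minimal sketch of a post-deployment node validation check.
import hashlib
import subprocess

EXPECTED_HADOOP_VERSION = "3.3.6"                       # placeholder baseline
EXPECTED_CONFIG_HASHES = {
    "/etc/hadoop/conf/core-site.xml": "0000placeholderdigest0000",
}

def hadoop_version() -> str:
    # `hadoop version` prints e.g. "Hadoop 3.3.6" on its first line.
    out = subprocess.run(["hadoop", "version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0].split()[-1]

def sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def node_is_compliant() -> bool:
    if hadoop_version() != EXPECTED_HADOOP_VERSION:
        return False
    return all(sha256(p) == h for p, h in EXPECTED_CONFIG_HASHES.items())

if __name__ == "__main__":
    print("compliant" if node_is_compliant() else "drifted from baseline")
```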
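For logging, a minimal audit-logging sketch that records who did what, when, and from where as JSON lines; the log path and field names are assumptions, not a Hadoop standard.

```python
# Minimal sketch of an application-level audit log for usage validation
# and forensics.
import json
import logging
from datetime import datetime, timezone

audit = logging.getLogger("audit")
audit.addHandler(logging.FileHandler("/var/log/bigdata/audit.log"))  # illustrative path
audit.setLevel(logging.INFO)

def log_access(user: str, action: str, resource: str, source_ip: str) -> None:
    audit.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "source_ip": source_ip,
    }))

log_access("alice", "READ", "/warehouse/customers", "10.0.0.12")
```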
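For SSL/TLS, a client-side sketch that builds a context which rejects pre-TLS 1.2 protocols and verifies the peer certificate against the cluster’s CA bundle; the hostname, port and CA path are placeholders.

```python
# Minimal sketch of a TLS client connection with certificate verification.
import socket
import ssl

context = ssl.create_default_context(cafile="/etc/security/cluster-ca.pem")
context.minimum_version = ssl.TLSVersion.TLSv1_2   # reject older protocols
context.check_hostname = True
context.verify_mode = ssl.CERT_REQUIRED

host = "datanode01.example.com"   # placeholder node
with socket.create_connection((host, 9865)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls:
        print("negotiated", tls.version(), "with", tls.getpeercert()["subject"])
```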
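For anonymization, a sketch that drops direct identifiers and coarsens quasi-identifiers (birth date to year, ZIP code to a prefix); the field names are illustrative.

```python
# Minimal sketch of record anonymization.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone"}

def anonymize(record: dict) -> dict:
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in out:                      # "1978-04-02" -> "1978"
        out["birth_year"] = out.pop("birth_date")[:4]
    if "zip" in out:                             # "94110" -> "941**"
        out["zip"] = out["zip"][:3] + "**"
    return out

print(anonymize({
    "name": "Jane Doe", "ssn": "123-45-6789", "birth_date": "1978-04-02",
    "zip": "94110", "purchase_total": 42.50,
}))
```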
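For tokenization, a sketch that swaps a sensitive value for a random token and keeps the mapping in a separate store; the in-memory dictionary stands in for a hardened token vault purely for illustration.

```python
# Minimal sketch of tokenization: the token carries no meaning on its own.
import secrets

token_vault: dict[str, str] = {}   # stand-in for a separate, hardened token store

def tokenize(value: str) -> str:
    token = "tok_" + secrets.token_hex(16)
    token_vault[token] = value
    return token

def detokenize(token: str) -> str:
    return token_vault[token]      # only reachable by authorized services

card = "4111 1111 1111 1111"
token = tokenize(card)
print(token)               # meaningless to anyone who steals the data set
print(detokenize(token))   # original value recoverable only via the vault
```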
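For OS hardening, a sketch of a check script touching the four focus areas named above (users, permissions, services, logging); the allowed accounts, paths and service names are assumptions for your own baseline.

```python
# Minimal sketch of an OS-hardening check: users, permissions, services, logging.
import os
import stat
import subprocess

def users_with_interactive_shell(passwd: str = "/etc/passwd") -> list[str]:
    risky = []
    with open(passwd) as f:
        for line in f:
            user, _, _, _, _, _, shell = line.strip().split(":")
            if user not in {"root", "hdfs", "yarn"} and shell.endswith(("bash", "sh")):
                risky.append(user)
    return risky

def world_writable(path: str) -> bool:
    return bool(os.stat(path).st_mode & stat.S_IWOTH)

def service_active(name: str) -> bool:
    return subprocess.run(["systemctl", "is-active", "--quiet", name]).returncode == 0

print("unexpected interactive shells:", users_with_interactive_shell())
print("/etc/hadoop world-writable:", world_writable("/etc/hadoop"))
print("auditd running:", service_active("auditd"))
```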
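For in-line remediation, a sketch that corrects a non-compliant setting (a directory with overly broad permissions) as soon as it is detected, rather than only reporting it; the path and permission mask are illustrative.

```python
# Minimal sketch of in-line remediation: fix the drift at detection time.
import os
import stat

def remediate_permissions(path: str, max_mode: int = 0o750) -> None:
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & ~max_mode:                       # permission bits beyond the allowed mask
        os.chmod(path, mode & max_mode)        # restrict immediately, no ticket queue
        print(f"remediated {path}: {oct(mode)} -> {oct(mode & max_mode)}")
    else:
        print(f"{path} already compliant ({oct(mode)})")

remediate_permissions("/etc/hadoop/conf")      # illustrative target
```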
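For Knox, a sketch of a client going through the gateway rather than hitting cluster services directly, so authentication happens at a single point; the gateway URL, topology name, credentials and CA path are placeholders.

```python
# Minimal sketch: accessing WebHDFS through the Knox Gateway's single HTTPS
# entry point instead of talking to the NameNode directly.
import requests

knox = "https://knox.example.com:8443/gateway/default"   # gateway host + topology

resp = requests.get(
    f"{knox}/webhdfs/v1/user/alice?op=LISTSTATUS",
    auth=("alice", "password-from-your-directory"),   # Knox authenticates the caller
    verify="/etc/security/knox-ca.pem",               # trust the gateway's certificate
)
resp.raise_for_status()
print(resp.json())
```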
Stashed in: Risk!
8:34 AM Nov 17 2013