
Hortonworks Offers Holistic and Comprehensive Security for Hadoop


It has been an exciting few weeks for the XA Secure team. We formally joined Hortonworks on 5/15 and have received a warm welcome from our new peers. Even more exciting are the numerous discussions we have had with current customers and prospects on how we can bring comprehensive and holistic security capabilities to HDP. We now begin the journey to incubate the XA Secure functionality as a completely open source project governed by the Apache Software Foundation.

With the excitement of the acquisition behind us, let's take a deeper look at the road ahead as we work to define completely open source security for Hadoop. We also encourage you to register for our webinar next Tuesday, where we will outline these new features.

Holistic security approach: Why is it important in Hadoop?

With the introduction of YARN in HDP 2.0, Hadoop now extends beyond its batch roots and has moved into the mainstream of daily data processing and analytics with real-time, online, interactive, and batch applications. Hadoop is more mission critical than ever, and with this comes stringent enterprise requirements, especially for security.

The traditional security controls, which worked well for siloed applications and data marts, are no longer adequate in the land of data lakes. Some of the challenges include:

  • Data that was once well protected within singular environments now co-exists with multiple other data sets and business processes.
  • Within a data lake, access to the cluster may be provided to many users, yet we must still protect access to individual data sets or subsets of data within the overall system.
  • A lake also allows multiple applications to access and work on a single set of data (for example, two applications on a single operational store). Walls must be constructed to protect the data from misuse between applications.

These challenges cannot be addressed by solving each issue in isolation. A comprehensive approach is needed, and ultimately any security solution must address this key question: how do we secure Hadoop while keeping the open architecture and scalability needed to run any use case the enterprise requires?

Together, Hortonworks and XA Secure share a belief in providing a comprehensive, holistic security approach that can answer this question. We will now execute on this vision, completely in the open.

What does comprehensive security mean for Hadoop?

A Chief Security Officer (CSO) or Director of IT Security demands not only holistic security coverage but also the ability to easily administer and enforce a consistent policy across all data, no matter where it is stored and processed.

Apache Hadoop originated as a simple project at the Apache Software Foundation (ASF) to manage and access data, and included just two components: the Hadoop Distributed File System (HDFS) and MapReduce, a processing framework for data stored in HDFS. As Hadoop matured, so too did the business cases built around it, and new processing and access engines were required for not only batch but also interactive, real-time, and streaming use cases.

A comprehensive and integrated framework is needed to secure data irrespective of how it is stored and accessed. Enterprises may adopt any use case (batch, real-time, interactive), but data should be secured to the same standards, and security should be administered centrally, in one place.

This belief is the foundation of HDP Security, and we endeavor to continually develop and execute on our security roadmap with this vision of looking top down at the entire Hadoop data platform.

HDP Security, now with XA Secure

Hortonworks Data Platform provides a centralized approach to security management, allowing you to define and deploy consistent security policy throughout the data platform. With the addition of XA Secure, you can easily create and administer a central security policy while coordinating consistent enforcement across all Hadoop applications.


HDP provides a comprehensive set of critical features for authentication, authorization, audit and data protection so that you can apply enterprise grade security to your Hadoop deployment. We continue to look at security within these four pillars.

Authentication
Authentication is the first step in the security process, ensuring that a user is who he or she claims to be. We continue to provide native authentication capabilities through simple authentication or through Kerberos, already available in the HDP platform. Apache Knox provides a single point of authentication and access for your cluster and integrates with your existing LDAP or Active Directory implementations.
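As a simple illustration, on a Kerberized cluster a user obtains a ticket before issuing any Hadoop command; the principal, realm, and path below are illustrative:

# obtain a Kerberos ticket for the user
kinit mktg1@EXAMPLE.COM
# confirm a valid ticket-granting ticket is present
klist
# subsequent Hadoop commands are now authenticated via Kerberos
hdfs dfs -ls /apps/hive/warehouse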

Authorization
Authorization, or entitlement, is the process of ensuring users have access only to the data allowed by their corporate policies. Hadoop already provides fine-grained authorization via file permissions in HDFS, resource-level access control for YARN and MapReduce, and coarser-grained access control at the service level. HBase provides authorization with ACLs on tables and column families, while Accumulo extends this further to cell-level control. Apache Hive provides grant/revoke access control on tables.
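Today these native controls are applied separately per component; the commands below are a rough sketch of the kind of per-component grants involved (the user, path, host, and table names are hypothetical):

# grant a single user read access to an HDFS directory via an HDFS ACL
hdfs dfs -setfacl -m user:analyst1:r-x /data/marketing
hdfs dfs -getfacl /data/marketing
# grant SELECT on a Hive table through HiveServer2
beeline -u "jdbc:hive2://hiveserver2-host:10000" -n admin -e "GRANT SELECT ON TABLE customer_details TO USER analyst1;"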

With the addition of XA Secure, Hadoop now adds authorization features that help enterprises securely use varied data with multiple user groups while ensuring proper entitlements. It provides an intuitive way for users to specify entitlement policies for HDFS, HBase, and Hive, with a centralized administration interface and extended authorization enforcement. Our goal is to provide a common authorization framework for the HDP platform, giving security administrators a single administrative console to manage all the authorization policies for HDP components.

Audit & Accountability
One of the cornerstones of any security system is accountability: having audit data that auditors can use to control the system and check for regulatory compliance, for example HIPAA compliance in healthcare. Healthcare providers would look within audit data for the access history of sensitive data such as patient records, and provide that data if requested by a patient or a regulatory authority. Robust audit data helps enterprises manage their regulatory compliance needs and control the environment proactively. XA Secure provides a centralized framework for collecting access audit history and easily reporting on the data, which can be filtered on various parameters. Our goal is to enhance the audit information captured within the various Hadoop components and provide insights through centralized reporting.

Data Protection
Data protection involves protecting data at rest and in motion, and includes encryption and masking. Encryption provides an added layer of security by protecting data when it is transferred and when it is stored (at rest), while masking capabilities enable security administrators to desensitize PII for display or temporary storage. We will continue to leverage the existing capabilities in HDP for encrypting data in flight, while bringing forward partner solutions for encrypting data at rest, data discovery, and data masking.

…and Central Administration
With central administration, a consistent security policy can be defined across all Hadoop data and access methods. With XA Secure, HDP now provides a security administration console that is unique to HDP but will be delivered completely in the open for all.

Conclusion

Hadoop requires enterprises to look at security in a new way. With the merger of XA Secure and Hortonworks, the combined team will continue to execute on our joint vision. Currently, XA Secure adds the following functionality to the security features within HDP 2.1:

  • Centralized security administration
  • Fine-grained access control over HDFS, HBase and Hive
  • Centralized auditing
  • Delegated administration

What’s Next

In the next series of blogs, we will explore the technical architecture of the XA solution and how the concepts of authorization and auditing work within the different components of the Hadoop ecosystem. We will continue to add more features in the near term, toward the vision of enabling comprehensive security within HDP.

Learn More



Now Available: HDP Advanced Security (via XA Secure)


Two months ago, we announced the acquisition of XA Secure, and at that time we stated that the software would be generally available by the end of June. We are happy to announce that we have delivered as promised and the solution is available for everyone to download today. Also, if you are an HDP Enterprise Plus Subscription customer, additional support for these new functions is now provided.

HDP Advanced Security expands on the solid security features already found in HDP to provide central administration and coordinated enforcement of enterprise security policy for a Hadoop cluster. This centralized approach to security management allows you to deploy consistent security policy throughout Hortonworks Data Platform.

For a more complete picture of the security features available in HDP, we invite you to review our security labs page.


Open Source?

Our team is busy working on a path forward to extend these new features and integrate them with the rest of HDP. More importantly, we are working to open source the solution. The first step, taken today, was to make it available under a new license, and we have begun the work to incubate the feature set within the Apache Software Foundation (ASF).

Hortonworks is uniquely qualified to open source this set of capabilities: staying true to our commitment to 100% open source, developing within an open community, and ensuring our customers can maximize the value of the innovative work being developed within the Apache Hadoop community.

Download or Try Now

Learn More


Announcing Apache Argus: A Clarion Call



In May, Hortonworks acquired XA Secure and made a promise to contribute this technology to the Apache Software Foundation. In June, we made it available for all to download and use from our website, and today we are proud to announce that this technology officially lives on as Apache Argus, an incubator project within the ASF.

The podling has been formed, and the process of graduating Argus to a top-level project (TLP) has now begun. Given our proven commitment to the Apache Software Foundation process, we feel uniquely qualified to bring this important technology and its capabilities to the broader open source community.

Argus Charter

So, what exactly will Argus deliver?  With the delivery of YARN, which powers Hadoop’s ability to run multiple workloads operating on shared data sets within a single cluster, a heightened requirement for a centralized approach to security policy definition and coordinated enforcement has surfaced.

Argus will deliver this comprehensive approach to central security policy administration across the core enterprise security requirements of authentication, authorization, accounting and data protection. It already extends baseline features for coordinated enforcement across Hadoop workloads, from batch to interactive SQL to real-time, in Hadoop. And we will leverage the extensible architecture of this security platform to apply policies consistently against additional Hadoop ecosystem components (beyond HDFS, Hive, and HBase), including Storm, Solr, Spark, and more. It truly represents a major step forward for the Hadoop ecosystem by providing a comprehensive approach, all completely as open source.

This represents a big step forward for Enterprise Hadoop and our customers are excited about it. Keith Manthey from Equifax said “Argus brings a level of security to Hadoop that is required for Enterprises to be able to consume it safely.”

We believe Argus represents a critical step for adoption of Hadoop across all enterprises.

Argus Calling

Getting to TLP is not a simple process. One of the most critical requirements for TLP graduation of any ASF project is "to have developed an open and diverse meritocratic community". Under this criterion, a critical eye is placed on the project to make sure not only that contributions come from many individuals but also that the list of committers and contributors spans a variety of different companies. We wholeheartedly agree, for a few reasons.

This is a clarion call for any and all developers to get involved to help architect, design and build out this critical project. As proven over and over, a broad, community-based open source effort is undoubtedly the most effective vehicle to deliver on key requirements within the Hadoop ecosystem. Not only will it speed delivery of these key security requirements, but collective input from a community of experienced individuals will ensure that the right functionality is delivered. It will also take an army not only to deliver the functions but to make sure they are tested and reliable. In fact, the ISV community has already rallied. We are proud to have three key Hadoop security companies speak out on their intent to help foster this project along, and we need more. Get involved.

And finally, we send a big congrats to the Argus team!


Continued Innovation in Hadoop Security


We are in the midst of a data revolution. Hadoop, powered by Apache Hadoop YARN, enables enterprises to store, process, and innovate around data at a scale never seen before, making security a critical consideration. Enterprises are looking for a comprehensive approach to securing their data in order to realize the full potential of the Hadoop platform unleashed by YARN, the architectural center and data operating system of Hadoop 2.

Hortonworks and the open community continue to work tirelessly to enhance security in Hadoop. Last week, we shared several blogs that highlight the tremendous innovation underway in the areas of authentication, authorization, auditing, and data protection.

We started last week with a blog introducing Apache Argus – incorporating the key IP from XA Secure – and called on the community to collaborate on an even bigger scale. Argus’ vision is to bring comprehensive security across all components in the Hadoop ecosystem making it easier for the enterprise to manage security policies across authorization, audit and other forms of data security. The Argus charter is a bold vision and in the coming months the team will share our approach to solve some of the biggest challenges around Hadoop security.

We highlighted Apache Knox, which helps Hadoop extend the reach of its services to more users securely by providing a gateway for REST/HTTP based services. Vinay Shukla blogged about a common use case of enabling secure ODBC and JDBC access to Hive, through Apache Knox.

We believe Hadoop can mature only in a pure open source model with true collaboration across customers and partners—and security is no exception. We were delighted to showcase our partnerships with industry leaders in data protection in last week's guest blog series:

  • Protegrity described how to expand Hadoop security with data-centric security across multiple enterprise systems with Protegrity Vaultless Tokenization for maximum usage of secured data with no data residency issues, and Extended HDFS Encryption for transparent AES file encryption.
  • Voltage Security blogged about data-centric security for the protection of sensitive data in Hadoop, from storage level encryption to standards-recognized Voltage Format Preserving Encryption™ (FPE) and Secure Stateless Tokenization™ to maintain referential integrity of de-identified data, enable regulatory compliance, and neutralize data breaches.
  • Dataguise discussed the use of data discovery and protection with DGSecure which scans data in structured, semi-structured or unstructured formats to provide security at the field level via masking or encryption, along with dashboard reporting.

The Hadoop community has also been working to address a gap around a key feature: native encryption of data at rest. To that end, the community is in the process of voting on this feature. When Transparent Data Encryption in HDFS is complete, data in HDFS can be encrypted natively.

The Hadoop community has worked to provide a Key Management Server (KMS) out of the box. With the Key Provider API, Hadoop components can easily integrate with the key management software of their choice. This API allows enterprises to plug in their existing corporate-standard key management software and leverage common key management across various components in the stack, such as databases, email, and Hadoop.
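For instance, once a KMS is configured, the same key operations can be driven through the standard hadoop key CLI; the host, port, and key name below are assumptions for illustration only:

# list keys exposed by the configured KMS
hadoop key list -provider kms://http@kms.example.com:16000/kms
# create a new key that HDFS or other components can reference
hadoop key create reportingKey -provider kms://http@kms.example.com:16000/kms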

What’s Next?

With the investments and commitments across the Hadoop ecosystem, we look forward to the next phase of the data revolution where the customer can leverage the full power of the next generation platform, with the confidence that their data are protected in all phases: ingest, processing, access, and egress.

Stay tuned for the next set of blogs on Argus, Knox, encryption and more.


Announcing Apache Ranger 0.4.0


With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it in different ways. As YARN propels Hadoop’s emergence as a business-critical data platform, the enterprise requires more stringent data security capabilities. Apache Ranger provides many of these, with central security policy administration across authorization, accounting and data protection.

On November 17, the community announced the release of Apache Ranger 0.4.0. With this release, the team closed 163 JIRA tickets. Ranger 0.4.0 delivers many new features, fixes and enhancements; chief among them are:

  • Contribution of the technology behind XA Secure to the open source Apache Ranger project
  • Support for Apache Storm and Apache Knox
  • REST APIs for the policy manager
  • Support for storing audit events in HDFS

This blog gives a brief overview of the features in Apache Ranger 0.4.0 and also looks ahead to future plans.


First Release of open source Apache Ranger

In May of this year, Hortonworks acquired XA Secure to accelerate the delivery of a holistic, centralized and completely open-source approach to Hadoop security. Hortonworks took the proprietary XA Secure technology and contributed it to the Apache Software Foundation. This approach to investing in the tech highlights Hortonworks’ consistent and unwavering commitment to 100% open enterprise Hadoop. XA Secure was one of the first solutions to provide centralized security administration for Hadoop. The Apache Ranger community began with the code contributed by Hortonworks and added other features as part of this release.

The first release of Apache Ranger is an important milestone in the evolution of Hadoop into a mature enterprise-ready platform. Enterprise users can now securely store all types of data and run multiple workloads with different users, leveraging Ranger’s centralized security administration with fine-grain authorization and on-demand audit data. The community can now innovate to further deliver advanced security capabilities, in a way only possible with an open source platform.

Support for Apache Storm and Apache Knox

Apache Ranger now supports administration of access policies for Apache Knox and Apache Storm, extending the Ranger policy administration portal beyond previous support for HDFS, Apache HBase and Apache Hive. Now users can also view audit information for both Storm and Knox in the Ranger portal.

REST APIs for the Policy Manager

Enterprise security administrators can now use REST APIs to create, update and delete security policies. This allows enterprise users and partners to integrate Hadoop security into their existing entitlement stores and update policies using their own tools. REST APIs open the door for extended adoption of Ranger within the ecosystem.
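As a rough sketch, a Hive policy could be created with an HTTP call along these lines; the endpoint path and JSON field names follow the v1 public API as we understand it, and the credentials, host, repository, and policy details are placeholders to adapt to your deployment:

curl -u admin:admin -H "Content-Type: application/json" -X POST \
  "http://ranger-host:6080/service/public/api/policy" \
  -d '{"policyName":"customer-details-read","repositoryName":"sandbox_hive","repositoryType":"hive","databases":"xademo","tables":"customer_details","columns":"phone_number,plan","isEnabled":true,"isAuditEnabled":true,"permMapList":[{"userList":["mktg1"],"permList":["select"]}]}'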

Audit logs stored in HDFS

Lower latency and faster transaction speeds within Hadoop mean an increase in the volume of audit events. To meet this growing need, Apache Ranger now offers the flexibility to store audit logs in HDFS, leveraging Hadoop's reliable and scalable infrastructure to store and process the underlying audit events. Ranger stores the audit logs in a secure location, accessible only to privileged users.

Plans for the Future

The release would not have been possible without excellent contributions from dedicated, talented community members. The community plans continued execution on the vision of providing comprehensive security within the Hadoop ecosystem, extending support to Apache Solr, Kafka, and Spark. We also intend to streamline other areas of security, including authentication and encryption. In the coming weeks, we plan to publish a detailed roadmap on the Ranger wiki or through Apache JIRAs.

Download Apache Ranger, Take the Tutorial and Watch the Webinar


Best Practices for Hive Authorization Using Apache Ranger in HDP 2.2


Apache Hive is the de facto standard for SQL in Hadoop with more enterprises relying on this open source project than any other alternative. Stinger.next, a community based effort, is delivering true enterprise SQL at Hadoop scale and speed.

With Hive’s prominence in the enterprise, security within Hive has come under greater focus from enterprise users. They have come to expect fine grain access control and auditing within Hive. Apache Ranger provides centralized security administration for Hadoop, and it enables fine grain access control and deep auditing for Apache components such as Hive, HBase, HDFS, Storm and Knox.

This blog covers the best practices for configuring security for Hive with Apache Ranger and focuses on the use cases of data analysts accessing Hive, covering three scenarios:

  • Data analysts accessing only HiveServer2, with limited access to HDFS files
  • Data analysts accessing both HiveServer2 and HDFS files through Pig/MR jobs
  • Data analysts accessing the Hive CLI

For each scenario, we will illustrate how to configure Hive and Ranger and discuss how security is handled. You can use either deployment: the Sandbox, or an HDP 2.2 cluster installed using Apache Ambari. Note the prerequisites below.

Prerequisites

  • HDP 2.2 Sandbox: If you are using the HDP 2.2 Sandbox, ensure that you disable the global “allow policies” in Ranger before configuring any security policies. The global “allow policy” is the default in the sandbox, to let users access Hive and HDFS without any permission checks.
  • OR

  • HDP 2.2 cluster: Ranger plugins for HDFS and Hive as well as Ranger admin installed manually (documentation for Ranger install can be found here).

Scenario 1 – HiveServer2 access with limited HDFS access

In this scenario, many analysts access data through HiveServer2, though specific administrators may have direct access to HDFS files.

Column-level access control over Hive data is a major requirement. You can enable column-level access security by following these steps:

Step 1. Hive Configuration

In Ambari → Hive → Config, ensure that hive.server2.enable.doAs is set to "false". This means that HiveServer2 will run MR jobs in HDFS as the "hive" user. Permissions on HDFS files related to Hive can then be granted only to the "hive" user, and no analyst can access the HDFS files directly.
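Outside of Ambari, you can confirm the effective value in the generated client configuration on the HiveServer2 host; the path below assumes a standard HDP layout:

grep -A1 "hive.server2.enable.doAs" /etc/hive/conf/hive-site.xml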


Step 2. Ranger configuration

With Ranger installed, you can configure a policy at a column level as shown below:

[Screenshot: Ranger policy granting column-level access on a Hive table]

In this example, the marketing group has access only to the "phone number", "plan" and "date" columns in the "customer_details" table.

Step 3. Run a query

You can use Hue or Beeline to run a query against this table. In this example from the sandbox, we have used user “mktg1” to run the query against this table.
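A minimal Beeline session for this check might look like the following; the sandbox hostname, port, and password are illustrative, and the columns match the policy above:

# connect to HiveServer2 as the analyst user
beeline -u "jdbc:hive2://sandbox.hortonworks.com:10000/xademo" -n mktg1 -p mktg1
-- a query restricted to the permitted columns should succeed; columns outside the policy should be rejected
select phone_number, plan from customer_details limit 5;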


After successfully running the query, check the audit logs in Ranger.


You will see the query running in Hive as the original user (“mktg1” in this case), while the related tasks in HDFS will be executed as the “hive” user.

With Ranger enabled, the only way data analysts can view data is through Hive, and access in Hive is controlled at the column level. Administrators who need access at the HDFS level can be given permissions through Ranger policies for HDFS or through HDFS ACLs.

Scenario 2 – Hiveserver2 and HDFS access

In this scenario, analysts use HiveServer2 to run SQL queries while also running Pig/MR jobs directly on HDFS data. In this case, we need to enable permissions within Hive as well as HDFS.

As in the previous scenario, ensure that Hive and Ranger are installed and Ambari is up and running. If you are using the sandbox, ensure that any global policies in Ranger have been disabled.

Step 1. Configuration Changes: hive-site.xml or in Ambari → Hive → Config

In Ambari → Hive → Config, ensure that hive.server2.enable.doAs is set to "true". This means that HiveServer2 will run MR jobs in HDFS as the original user.


Make sure to restart the Hive service in Ambari after changing any configuration.

Step 2. In Ranger, within HDFS, create permissions for files pertaining to Hive tables

In the example below, we give the marketing team "read" permission on the file corresponding to the Hive table "customer_details".

[Screenshot: Ranger HDFS policy granting the marketing group read access to the customer_details files]

The users can access data through HDFS commands as well.
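For example, with the Ranger HDFS read policy above in place, a marketing user can browse the table's backing files directly; the warehouse path below is the one used by the sandbox's xademo database:

[root@sandbox ~]# su - mktg1
[mktg1@sandbox ~]$ hdfs dfs -ls /apps/hive/warehouse/xademo.db/customer_details
[mktg1@sandbox ~]$ hdfs dfs -cat /apps/hive/warehouse/xademo.db/customer_details/* | head -5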


Step 3. Check the audit logs in Ranger

You will see audit entries in Hive and HDFS with the original user's ID.


Scenario 3 – Hive CLI access

If analysts use the Hive CLI as the predominant method for running queries, we need to configure security differently.

The Hive CLI loads the Hive configuration into the client and gets data directly from HDFS or through MapReduce/Tez tasks. The best way to protect the Hive CLI is to enable permissions on the HDFS files and folders mapped to the Hive databases and tables. To secure the metastore, it is also recommended to turn on storage-based authorization.
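A typical metastore-side configuration for storage-based authorization sets the following hive-site.xml properties; these are the standard Hive class names, but verify them against your HDP version and its documentation:

hive.security.metastore.authorization.manager = org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
hive.security.metastore.authenticator.manager = org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator
hive.metastore.pre.event.listeners = org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener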

Please note that the Ranger Hive plugin applies only to HiveServer2. The Hive CLI should be protected using permissions at the HDFS folder/file level, via Ranger or HDFS ACLs.

  1. First, identify the files corresponding to tables in Hive. You can look through the directory /apps/hive/warehouse.
  2. Set permissions for this folder in Ranger -> HDFS Policies


  3. Run queries through Hive CLI

    [root@sandbox ~]# su - mktg1
    [mktg1@sandbox ~]$ hive
    hive> use xademo;
    OK
    Time taken: 9.855 seconds
    hive> select phone_number from customer_details;
    OK
    PHONE_NUM
    5553947406
    7622112093
    5092111043
    9392254909
    7783343634

  4. Check audit entries in Ranger

    hive_ranger_10

  5. Run any DDL commands through Hive CLI.

    [root@sandbox ~]# su - it1
    [it1@sandbox ~]$ hive
    hive> use xademo;
    OK
    Time taken: 12.175 seconds
    hive> drop table customer_details;

    FAILED: SemanticException Unable to fetch table customer_details. java.security.AccessControlException: Permission denied: user=it1, access=READ, inode="/apps/hive/warehouse/xademo.db/customer_details":hive:hdfs:drwx------

  6. The drop table action is denied due to lack of permission at the HDFS level. This can be verified in the Ranger audit logs.


Summary

Hive will continue to evolve as the predominant application for accessing data within Hadoop. With Apache Ranger, you can configure policies to support fine grain access control in Hive and HDFS and secure your data from unauthorized access. Use this blog as a guide to configure security policies that best support your data access needs and use cases.


Ambari 2.0 for Deploying Comprehensive Hadoop Security


Hortonworks Data Platform (HDP) provides centralized enterprise services for comprehensive security, enabling end-to-end protection, access control, compliance and auditing of data in motion and at rest. HDP's centralized architecture—with Apache Hadoop YARN at its core—also supports consistent operations for provisioning, management, monitoring and deployment of Hadoop clusters for a reliable enterprise-ready data lake.

But comprehensive security and consistent operations go together, and neither is possible in isolation.

We published two blogs recently announcing Ambari 2.0 and its new ability to manage rolling upgrades. This post will look at those innovations through the security lens, because security, like operations, is a core requirement for enterprise-ready Hadoop.

Security in Hadoop Today

HDP offers comprehensive security across all batch, interactive, and real-time workloads and access patterns. Hortonworks is focused on delivering comprehensive security across five pillars: centralized administration, authentication, authorization, audit, and data protection.


HDP provides comprehensive security by way of three key services:

  • Kerberos is an MIT standard adopted by the open source community to authenticate users attempting to access Hadoop.
  • Apache Ranger provides centralized security administration for HDFS, Hive, HBase, Storm and Knox as well as fine-grain access control.
  • Apache Knox provides perimeter security for API access and REST services.

Security Setup with Ambari 2.0

Ambari 2.0 represents a significant milestone in the community's ongoing work to make Hadoop enterprise-ready with easy security setup and administration. Ambari 2.0 can now help administrators automate Kerberos setup for a cluster, install the KDC and create service principals. Administrators can also use Ambari to install the Ranger admin component and enable the Ranger plugins with a few clicks.

Automated Kerberos integration

Before Ambari 2.0, the Kerberos integration in Hadoop required a combination of manual steps to install and manage these important components:

  • KDC (key distribution center),
  • User and service principals (identities) and
  • Respective keytabs (tokens).

With Ambari 2.0, the entire Kerberos setup process is automated, now with the following:

  • A step-by-step wizard to setup the Kerberos infrastructure
  • Integration with existing MIT KDC or Active Directory infrastructure
  • Deployment, configuration and management of Kerberos Clients
  • First time setup as well as ongoing management for adding new services or nodes
  • Automated creation of principals
  • Automated generation and distribution of keytabs
  • Support for regeneration of keytabs

Ambari 2.0 can automate Kerberos deployment and management for existing clusters already using Kerberos, as well as for users looking to install a new cluster.

Figure 1: Initial screen for Kerberos setup

This Kerberos Overview documentation for Ambari 2.0 contains an overview and step-by-step details on Kerberos setup.

Automated Ranger deployment

Hortonworks introduced Apache Ranger to deliver the vision of coordinated security across Hadoop with centralized administration, fine-grain access control and audit. Apache Ranger’s first release included enhancements to existing capabilities in the original code base developed at XA Secure and added support for audit storage in HDFS, support for Apache Storm and Knox authorization and auditing, and also REST APIs for managing policies.

With Ambari 2.0, administrators can now easily add comprehensive security through Ranger to either an existing or a new cluster. Ambari 2.0 adds the following benefits for Ranger:

  • Automated install of the Ranger policy administrator and user sync. The policy database (MySQL or Oracle) can be configured, and user sync can be integrated with LDAP/AD or Unix.
  • Easy one-click setup of the Ranger plugin for HDFS, Hive, HBase, Storm and Knox
  • Ability to start/stop services through the Ambari UI
  • Ability to disable plugins through the Ambari UI

The following screenshots show a user adding the Ranger service via Ambari.

Figure 2. Ambari screen to add Ranger service

Figure 3: Ambari screen showing already installed and running Ranger service

Hortonworks continues to lead open-source innovation to enable comprehensive data security for Hadoop—making it easier for security administrators to protect their clusters. With Ambari 2.0, we added the automated install and administration of the HDP cluster’s security infrastructure, with support for installing Kerberos, Apache Knox and Apache Ranger.

This innovation highlights what Hortonworks customers appreciate about our 100% open-source Apache Hadoop platform. HDP provides centralized enterprise services for comprehensive security and consistent operations to enable provisioning, management, monitoring and deployment of secure Hadoop clusters.

Hadoop is ready for the enterprise—providing any data, for any application, anywhere.

More About Comprehensive Security and Consistent Operations in HDP

Read recent Ambari posts

Learn more about the Apache projects


How to Leverage Big Data Security with Informatica and Hortonworks


In this guest blog, Sumeet Kumar Agrawal, principal product manager for Big Data Edition product at Informatica, explains how Informatica’s Big Data Edition integrates with Hortonworks’ security projects, and how you can secure your big data projects.

Many companies already use big data technology like Hadoop for their production environments, so they can store and analyze petabytes of data including transactional data, weblog data, and social media content to gain better insights about their customers and business. Accenture found in a recent survey that 79 percent of respondents agree that “companies that do not embrace big data will lose their competitive position and may even face extinction.”

However, without proper security, your Big Data solution might very well open the doors to breaches that have the potential to cause serious reputational damage and legal repercussions. Hortonworks has led the community in bringing comprehensive security, in open source, to Apache Hadoop. Partners like Informatica can leverage security frameworks in Hadoop to enable users to securely bring in data from external sources, transform it and load it into the different Hadoop components.

Informatica Big Data Edition Integration with Hortonworks Security Projects

Informatica Big Data Edition’s codeless, visual development environment accelerates the ability of organizations to put their Hortonworks Data Platform clusters into production. As an alternative to implementing complex hand coded data movement and transformation, Informatica Big Data Edition enables high-performance data integration and quality pipelines that leverage the full power of each node of your Hadoop cluster with team-based visual tools that can be used by any ETL developer.

Informatica Big Data Edition integrates with the security framework offered within HDP. The following figure shows the security offerings within the latest version of HDP:

[Figure: security offerings in HDP]

Authentication

Kerberos

Kerberos is the most widely adopted authentication technology in the Big Data space. Kerberos is an authentication protocol for trusted hosts on untrusted networks. It provides secure authentication between clients/nodes/services. Starting with Ambari 2.0, Kerberos can be fully deployed using Ambari.

Informatica Big Data Edition integrates completely with Kerberos. A key aspect of Kerberos integration is the Key Distribution Center (KDC). Informatica supports both Active Directory and MIT-based KDCs.

Knox

Knox is designed for applications that use REST APIs or JDBC/ODBC over HTTP to access or update data. It is not currently recommended for performance-intensive applications such as Informatica. Knox is also not designed for RPC-based access (Hadoop clients), in which case it is recommended to use Kerberos to authenticate system and end users.

Here is a representative architecture where Knox is deployed over a Hadoop cluster. Knox, in this example, provides perimeter security for users accessing data through applications leveraging REST or HTTP based services.

[Figure: representative architecture with Knox providing perimeter security in front of a Hadoop cluster]

Informatica Big Data Edition provides several rich capabilities, such as mass data ingestion and data preparation on Hadoop. Knox may not be recommended for some of these capabilities; it is suggested to use Kerberos for authentication when Informatica ETL tools are being leveraged for data preparation on Hadoop.

Authorization

Apache Ranger

Apache Ranger offers a centralized security framework to manage fine-grained access control over Hadoop data access components like Apache Hive and Apache HBase. Within Hive, there are recommended best practices for setting up policies in Hiveserver2 and Hive CLI. You can find more details in this blog: http://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/

For HiveServer2, Hive authorization does not allow the transform function under SQL standard authorization (and thus Ranger). Informatica BDE plans to support HiveServer2 in the near future and recommends that customers use storage-based authorization to protect the metastore when using Hive with Informatica. More details on storage-based authorization can be found here.

Summary

In summary, Informatica and Hortonworks expand the big data security space and ensure that all organizations are secure while implementing their big data projects.

About the Author

Sumeet Kumar Agrawal is a Principal Product Manager for the Big Data Edition product at Informatica. Based in the Bay Area, Sumeet has over 8 years of experience working on different Informatica technologies. He is responsible for defining Informatica's big data product strategy and roadmap and for working with customers to define their big data platforms. His expertise includes the Hadoop ecosystem and security, as well as development-oriented technologies such as Java and web services. Sumeet is also responsible for evaluating Hadoop partner integration technologies for Informatica.



New in HDP 2.3: Enterprise Grade HDFS Data At Rest Encryption


Apache Hadoop has emerged as a critical data platform to deliver business insights hidden in big data. Because it is a relatively new technology, system administrators hold Hadoop to higher security standards. There are several reasons for this scrutiny:

  • The external ecosystem of data repositories and operational systems that feed Hadoop deployments is highly dynamic and can introduce new security threats on a regular basis.
  • Hadoop deployments contain large volumes of diverse data stored over long periods of time. Any breach of this enterprise-wide data can be catastrophic.
  • Hadoop enables users across multiple business units to access, refine, explore and enrich data using different methods, thereby raising the risk of a potential breach.

Security Pillars in Hortonworks Data Platform (HDP)

HDP is the only Hadoop platform offering comprehensive security and centralized administration of security policies across the entire stack. At Hortonworks we take a holistic view of enterprise security requirements and ensure that Hadoop can not only define but also apply a comprehensive policy. HDP leverages Apache Ranger for centralized security administration, authorization and auditing; Kerberos and Apache Knox for authentication and perimeter security; and native and partner solutions for encrypting data over the wire and at rest.


Data-at-Rest Encryption – State of the Union

In addition to authentication and access control, data protection adds a robust layer of security, by making data unreadable in transit over the network or at rest on a disk.

Compliance regulations, such as HIPAA and PCI, stipulate that encryption be used to protect sensitive patient information and credit card data. Federal agencies and enterprises in compliance-driven industries, such as healthcare, financial services and telecom, leverage data-at-rest encryption as a core part of their data protection strategy. Encryption helps protect sensitive data in case of an external breach or unauthorized access by privileged users.

There are several encryption methods, varying in degrees of protection. Disk or OS-level encryption is the most basic, protecting against stolen disks. Application-level encryption, on the other hand, provides a higher level of granularity and prevents rogue admin access; however, it adds a layer of complexity to the architecture.

Traditional Hadoop users have been using disk encryption methods such as dm-crypt as their choice for data protection. Although OS level encryption is transparent to Hadoop, it adds a performance overhead and does not prevent admin users from accessing sensitive data. Hadoop users are now looking to identify and encrypt only sensitive data, a requirement that involves delivering finer grain encryption at the data level.

Certifying HDFS Encryption

The HDFS community worked together to build and introduce transparent data encryption in HDFS. The goal is to encrypt specific HDFS files by writing them to HDFS directories known as encryption zones (EZ). The solution is transparent to applications leveraging the HDFS file system, such as Apache Hive and Apache HBase; in other words, no major code change is required for existing applications already running on top of HDFS. One big advantage of encryption in HDFS is that even privileged users, such as the "hdfs" superuser, can be blocked from viewing encrypted data.
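As a minimal sketch of how an encryption zone is set up (the key name and path are illustrative, and the commands assume a KMS is already configured and that you have HDFS superuser privileges):

# create an encryption key in the configured KMS
hadoop key create patientKey
# create an empty directory and mark it as an encryption zone protected by that key
hdfs dfs -mkdir -p /secure/patient_records
hdfs crypto -createZone -keyName patientKey -path /secure/patient_records
# list existing encryption zones to confirm
hdfs crypto -listZones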

As with any other Hadoop security initiative, we have adopted a phased approach to introducing this feature to customers running HDFS in production environments. After the technical preview announcement earlier this year, the Hortonworks team worked with a select group of customers to gather use cases and perform extensive testing against those use cases. We have also devoted significant development effort to building secure key storage in Ranger, by leveraging the open source Hadoop KMS. Ranger now provides centralized policy administration, key management and auditing for HDFS encryption.

We believe that HDFS encryption, backed by Ranger KMS, is now enterprise ready for specific use cases. We will introduce support for these use cases as part of the HDP 2.3 release.

HDFS encryption in HDP – Components and Scope

hdfs_sec_2

The HDFS encryption solution consists of three components (more details on the Apache website here):

  • HDFS encryption/decryption enforcement: HDFS client-level encryption and decryption for files within an encryption zone
  • Key Provider API: the API used by the HDFS client to interact with the KMS and retrieve keys
  • Ranger KMS: the open source Hadoop KMS is a proxy that retrieves keys for a client. Working with the community, we have enhanced the Ranger GUI to store keys securely in a database and to centralize policy administration and auditing (please refer to the screenshots below).

[Screenshots: Ranger KMS key management and policy administration]

We have extensively tested HDFS data-at-rest encryption across the HDP stack and will provide a detailed set of best practices for using it across various use cases as part of the HDP 2.3 release.

We are also working with key encryption partners so that they can integrate their own enterprise ready KMS offerings with HDFS encryption. This offers a broader choice to customers looking to encrypt their data in Hadoop.

Summary

In summary, to encrypt sensitive data, protect against privileged access and go beyond OS-level encryption, enterprises can now use HDFS transparent encryption. Both HDFS encryption and Ranger KMS are open source, enterprise-ready, and satisfy compliance-sensitive requirements, facilitating Hadoop adoption among compliance-conscious enterprises.


Announcing Apache Ranger 0.5.0


As YARN drives Hadoop's emergence as a business-critical data platform, the enterprise requires more stringent data security capabilities. Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster, providing a platform for centralized security policy administration across the core enterprise security requirements of authorization, audit and data protection.

On June 10th, the community announced the release of Apache Ranger 0.5.0. With this release, the community took major steps to extend security coverage for the Hadoop platform and deepen its existing security capabilities. Apache Ranger 0.5.0 addresses over 194 JIRA issues and delivers many new features, fixes and enhancements. Among these improvements, the following features are notable:

  • Centralized administration, authorization and auditing for Solr, Kafka and YARN
  • Apache Ranger key management store (KMS)
  • Hooks for dynamic policy conditions
  • Metadata protection in Hive
  • Support for querying audit data stored in HDFS using Solr
  • Optimization of auditing at source
  • Pluggable architecture for Apache Ranger (Ranger Stacks)

This blog provides an overview of the new features and how they integrate with other Hadoop services, as well as a preview of the focus areas the community has planned for upcoming releases.

Centralized Administration, Authorization and Auditing for Solr, Kafka and YARN

Administrators can now use Apache Ranger's centralized platform to manage access policies for Solr (collection level), Kafka (topic level) and YARN (capacity scheduler queues). These centralized authorization and auditing capabilities add to what was previously available for HDFS, HBase, Hive, Knox and Storm. As a precursor to this release, the Hortonworks security team worked closely with the community to build authentication support (Kerberos) and authorization APIs in Apache Solr and Apache Kafka.

Administrators can now apply security policies to protect Kafka topics and ensure that only authorized users are able to publish to or consume from a topic. Similarly, Ranger can be used to control query access at the Solr collection level, ensuring sensitive data in Apache Solr is secured in production environments. Apache Ranger's integration with the YARN ResourceManager enables administrators to control which applications can submit to a queue and prevent rogue applications from using YARN.

Apache Ranger Key Management Store (KMS)

In this release, HDP takes a major step forward in meeting enterprises' requirements for security and compliance by introducing transparent data encryption for HDFS files, combined with a Ranger-embedded open source Hadoop KMS. Ranger now gives security administrators the ability to manage keys and authorization policies for the KMS.

This encryption feature in HDFS, combined with KMS access policies maintained by Ranger, prevents rogue Linux or Hadoop administrators from accessing data and supports segregation of duties for both data access and encryption. You can find more details on TDE in this blog.

Hooks for dynamic policy conditions

As enterprises' Hadoop deployments mature, there is a need to move from static role-based access control to access based on dynamic rules. An example would be to grant access based on time of day (9am to 5pm), geography (access only if logged in from a particular location) or even data values.

In Apache Ranger 0.5.0, the community took the first step toward a true ABAC (attribute-based access control) model by introducing hooks to manage dynamic policies, thereby providing a framework for users to control access based on dynamic rules. Users can now specify their own conditions and rules (similar to a UDF) as part of service definitions, and these conditions can vary by service (HDFS, Hive, etc.). In the future, based on community feedback, Apache Ranger might include some of these conditions out of the box.

Metadata Protection in Hive

Apache Ranger 0.5.0 provides the ability to protect metadata listings in Hive based on underlying permissions. This functionality is especially relevant for multi-tenant environments where users should not be able to view other tenants' metadata (tables, columns).

The following commands related to Hive metadata will now provide relevant information only based on user privileges.

  • Show Databases
  • Show Tables
  • Describe table
  • Show Columns

Support Queries for Audit Data Using Solr

Currently, the Apache Ranger UI provides the ability to run interactive queries against audit data stored in an RDBMS. In this release, we are introducing support for storing and querying audit data in Solr. This removes the dependency on a database for audit and gives users visibility into the audit data in Solr through dashboards built on the Banana UI. We recommend that users enable audit writing to both Solr and HDFS, and purge data in Solr at regular intervals.

Optimization of Auditing at Source

Auditing all events or jobs in Hadoop generates a high volume of audit data. Apache Ranger 0.5.0 provides the ability to summarize audit data at the source for a given time period, by user, resource accessed and action, thereby reducing audit data volume and noise and lessening the impact on the underlying storage for improved performance.

Pluggable Architecture for Apache Ranger (Ranger Stacks)

As part of this release, the Ranger community worked extensively to revamp the Apache Ranger architecture. As a result, Apache Ranger 0.5.0 now provides a pluggable architecture for policy administration and enforcement. Using a "single pane of glass," end users can configure and manage security across all components of their Hadoop stack and extend it to their entire big data environment.

Apache Ranger 0.5.0 enables customers and partners to easily add a new "service" to support a new component or data engine. Each service definition is expressed as configurable JSON.

Users can create a custom service as a plug-in for any data store, and build and manage services centrally for their big data and BI applications.

Preview of Features to Come

The Apache Ranger release would not have been possible without contributions from the dedicated community members who have done a great job understanding the needs of the user community and delivering them. Based on demand from the user community, we will continue to focus our efforts in three primary areas:

  • Global data classification ("tag"-based) security policies
  • Expanding encryption support to HBase and Hive
  • Ease of installation and use, through better Apache Ambari integration

Read the Apache Ranger 0.5.0 Release Notes


Best practices in HDFS authorization with Apache Ranger


HDFS is a core part of any Hadoop deployment, and in order to ensure that data is protected in the Hadoop platform, security needs to be baked into the HDFS layer. HDFS is protected using Kerberos authentication, with authorization through POSIX-style permissions, HDFS ACLs, or Apache Ranger.

Apache Ranger (http://hortonworks.com/hadoop/ranger/) is a centralized security administration solution for Hadoop that enables administrators to create and enforce security policies for HDFS and other Hadoop platform components.

How do Ranger policies work for HDFS?

In order to ensure security in HDP environments, we recommend that all of our customers implement Kerberos, Apache Knox and Apache Ranger.

Apache Ranger offers a federated authorization model for HDFS. The Ranger plugin for HDFS checks for Ranger policies, and if a policy exists, access is granted to the user. If a policy doesn't exist in Ranger, Ranger defaults to the native permissions model in HDFS (POSIX or HDFS ACLs). This federated model applies to the HDFS and YARN services in Ranger.


For other services, such as Hive or HBase, Ranger operates as the sole authorizer, which means only Ranger policies are in effect. The fallback model is configured using a property in Ambari → Ranger → HDFS config → Advanced ranger-hdfs-security.


The federated authorization model enables customers to safely implement Ranger in an existing cluster without affecting jobs that rely on POSIX permissions. We recommend enabling this option as the default model for all deployments.

Ranger's user interface makes it easy for administrators to find the permission (Ranger policy or native HDFS) that provided access to a user. Simply navigate to Ranger → Audit and look at the Access Enforcer column of the audit data. If the value in the Access Enforcer column is "Ranger-acl", a Ranger policy provided access to the user. If the value is "Hadoop-acl", access was provided by a native HDFS ACL or POSIX permission.


Best practices for HDFS authorization

Having a federated authorization model may create a challenge for security administrators looking to plan a security model for HDFS.

After Apache Ranger and Hadoop have been installed, we recommend that administrators implement the following steps:

  • Change the HDFS umask to 077
  • Identify directories that can be managed by Ranger policies
  • Identify directories that need to be managed by HDFS native permissions
  • Enable a Ranger policy to audit all records

Here are the steps again in detail.

  1. Change the HDFS umask to 077 from 022. This will prevent any new files or folders from being accessed by anyone other than the owner.

Administrators can change this property via Ambari.


The default umask value in HDFS is 022, which grants all users read permission on all HDFS folders and files. You can check this by running the following command on a recently installed cluster:

$ hdfs dfs -ls /apps

Found 3 items

drwxrwxrwx   - falcon hdfs          0 2015-11-30 08:02 /apps/falcon
drwxr-xr-x   - hdfs   hdfs          0 2015-11-30 07:56 /apps/hbase
drwxr-xr-x   - hdfs   hdfs          0 2015-11-30 08:01 /apps/hive
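The umask itself is exposed as the HDFS property fs.permissions.umask-mode; once it is changed to 077 (and HDFS restarted), a quick check along these lines should show that newly created files are readable only by their owner (the path is illustrative):

$ hdfs dfs -touchz /tmp/umask_check
$ hdfs dfs -ls /tmp/umask_check
# expect -rw------- for the new file once the 077 umask is in effect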

  2. Identify the directories that can be managed by Ranger policies

We recommend that permissions for application data folders (/apps/hive, /apps/hbase), as well as any custom data folders, be managed through Apache Ranger. The HDFS native permissions for these directories need to be restrictive. This can be done by changing permissions in HDFS using chmod.

Example:

$ hdfs dfs -chmod -R 000 /apps/hive

$ hdfs dfs -chown -R hdfs:hdfs /apps/hive

$ hdfs dfs -ls /apps/hive

Found 1 items

d---------   - hdfs hdfs          0 2015-11-30 08:01 /apps/hive/warehouse

Then navigate to the Ranger admin and give explicit permissions to users as needed. For example:

[Screenshot: Ranger HDFS policy granting explicit user access to /apps/hive]

Administrators should follow the same process for other data folders as well. You can validate whether your changes are in effect by doing the following:

  • Connect to HiveServer2 using beeline
  • Create a table
    • create table employee( id int, name String, ssn String);
  • Go to Ranger and check the HDFS access audit. The enforcer should be 'ranger-acl'.
  3. Identify directories which can be managed by HDFS permissions

It is recommended to let HDFS manage the permissions for the /tmp and /user folders. These are used by applications and jobs which create user-level directories.

Here, you should also set the initial permission for the folders under /user to "700", similar to the example below:

 

hdfs dfs -ls /user

Found 4 items

drwxrwx---   - ambari-qa hdfs          0 2015-11-30 07:56 /user/ambari-qa
drwxr-xr-x   - hcat      hdfs          0 2015-11-30 08:01 /user/hcat
drwxr-xr-x   - hive      hdfs          0 2015-11-30 08:01 /user/hive
drwxrwxr-x   - oozie     hdfs          0 2015-11-30 08:02 /user/oozie

 

$ hdfs dfs -chmod -R 700 /user/*

$ hdfs dfs -ls /user

Found 4 items

drwx------   - ambari-qa hdfs          0 2015-11-30 07:56 /user/ambari-qa
drwx------   - hcat      hdfs          0 2015-11-30 08:01 /user/hcat
drwx------   - hive      hdfs          0 2015-11-30 08:01 /user/hive
drwx------   - oozie     hdfs          0 2015-11-30 08:02 /user/oozie

  4. Ensure auditing for all HDFS data.

Auditing in Apache Ranger can be controlled as a policy. When Apache Ranger is installed through Ambari, a default policy is created for all files and directories in HDFS, with the auditing option enabled. This policy is also used by the Ambari smoke test user "ambari-qa" to verify the HDFS service through Ambari. If administrators disable this default policy, they will need to create a similar policy to enable auditing across all files and folders.


Summary

Securing HDFS files through permissions is a starting point for securing Hadoop. Ranger provides a centralized interface for managing security policies for HDFS. Security administrators are recommended to use a combination of HDFS native permissions and Ranger policies to provide comprehensive coverage for all potential use cases. Using the best practices outlined in this blog, administrators can simplify access control policies for administrative and user directories and files in HDFS.


