Databricks User Guide#

Spark Fine-grained Access Control (FGAC)#

Enable View-level Access Control#

  1. Edit the Spark Config of your existing Privacera-enabled Databricks cluster.

  2. Add the following property:

    spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable true
    
  3. Save and restart the Databricks cluster. (A quick way to confirm the setting after restart is sketched below.)
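
After the restart, you can optionally confirm that the property took effect by running a SET command from a Spark SQL cell; if the cluster picked it up, the value is echoed back as true:

    SET spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable;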

Apply View-level Access Control#

To run CREATE VIEW with the Spark plug-in, you need the DATA_ADMIN permission: the source table on which you are going to create the view requires DATA_ADMIN access in the Ranger policy.

Use Case

  • Let’s take a use case where we have an 'employee_db' database with two tables inside it, populated with the data below.

    -- Requires CREATE privilege on the database (enabled by default)
    create database if not exists employee_db;
    

  • Create two tables.

    -- Requires CREATE privilege at the table level
    
    create table if not exists employee_db.employee_data(id int,userid string,country string);
    create table if not exists employee_db.country_region(country string,region string);
    

  • Insert test data.

    -- Requires UPDATE privilege at the table level
    
    insert into employee_db.country_region values ('US','NA'), ('CA','NA'), ('UK','UK'), ('DE','EU'), ('FR','EU'); 
    insert into employee_db.employee_data values (1,'james','US'),(2,'john','US'), (3,'mark','UK'), (4,'sally-sales','UK'),(5,'sally','DE'), (6,'emily','DE');
    

    -- Requires SELECT privilege at the column level
    select * from employee_db.country_region;

    -- Requires SELECT privilege at the column level
    select * from employee_db.employee_data;
    
  • Now try to create a view on top of the two tables created above. Without the required policy, you will get an error like the following:

    create view employee_db.employee_region(userid, region) as select e.userid, cr.region from employee_db.employee_data e, employee_db.country_region cr where e.country = cr.country;
    
    Error: Error while compiling statement: 
    FAILED: HiveAccessControlException 
    Permission denied: user [emily] does not have [DATA_ADMIN] privilege on [employee_db/employee_data] (state=42000,code=40000)
    

  • Create a Ranger policy that grants the DATA_ADMIN permission on the source tables (employee_db.employee_data and employee_db.country_region), as required by the error above; a sketch of such a policy is shown after the note below.

    With the policy in place, execute the same CREATE VIEW statement again; it will now succeed.

    Note

    Granting the DATA_ADMIN privilege on a resource implicitly grants the SELECT privilege on the same resource.
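
    The policy itself is created in the Ranger/Privacera portal. A rough sketch of its fields follows (field names are from the Ranger policy UI; the user and resource values come from this use case and should be adjusted to your deployment):

      Database:          employee_db
      Table:             employee_data, country_region
      Column:            *
      Allow Conditions:
        Select User:     emily
        Permissions:     DATA_ADMIN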

Alter View#

-- Requires ALTER permission on the view
ALTER VIEW employee_db.employee_region AS select e.userid, cr.region from employee_db.employee_data e, employee_db.country_region cr where e.country = cr.country;

Rename View#

-- Requires ALTER permission on the view
ALTER VIEW employee_db.employee_region RENAME TO employee_db.employee_region_renamed;

Drop View#

-- Requires DROP permission on the view
DROP VIEW employee_db.employee_region_renamed;

Row Level Filter#

With a row-level filter policy applied to the view in Ranger, querying the view returns only the rows the policy allows. Recreate the view and query it:

create view if not exists employee_db.employee_region(userid, region) as select e.userid, cr.region from employee_db.employee_data e, employee_db.country_region cr where e.country = cr.country;

select * from employee_db.employee_region;
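
For illustration, assume a hypothetical row-filter policy on the view that limits the querying user to region = 'EU'. The plug-in then evaluates the query as if the filter predicate were appended:

-- with the assumed policy, this behaves like:
-- select * from employee_db.employee_region where region = 'EU';
select * from employee_db.employee_region;
-- returns only (sally, EU) and (emily, EU) from the test data above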

Column Masking#

With a column-masking policy applied to a column of the view in Ranger, the same query returns masked values for that column:

select * from employee_db.employee_region;
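
For example, assuming a hypothetical Hash-type masking policy on the view's userid column for the querying user:

select * from employee_db.employee_region;
-- with the assumed policy, userid is returned hashed (e.g. hash('james') instead of 'james');
-- the region column is returned unchanged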

Whitelisting for Py4J Security Manager#

Certain Python methods are blacklisted on Databricks clusters to enhance cluster security. When you try to access such a method, you might receive the following error:

Error

py4j.security.Py4JSecurityException: … is not whitelisted

If you still want to access such Python classes or methods, you can add them to a whitelist file. To whitelist classes or methods, do the following:

  1. Create a file containing a list of all the packages, class constructors, or methods that should be whitelisted.

    1. For whitelisting a complete Java package (including all of its classes), add the package name followed by .*

      org.apache.spark.api.python.*
      
    2. For whitelisting the constructors of a given class, add the fully qualified class name.

      org.apache.spark.api.python.PythonRDD
      
    3. For whitelisting specific methods of a given class, add the fully qualified class name followed by the method name.

      org.apache.spark.api.python.PythonRDD.runJobToPythonFile
      org.apache.spark.api.python.SerDeUtil.pythonToJava
      
  2. Once you have added all the required packages, classes, and methods, the file will contain a list of entries as shown below.

    org.apache.spark.sql.SparkSession.createRDDFromTrustedPath
    org.apache.spark.api.java.JavaRDD.rdd
    org.apache.spark.rdd.RDD.isBarrier
    org.apache.spark.api.python.*
    
  3. Upload the file to a DBFS location that can be referenced from the cluster's Spark Config section.

    Suppose the whitelist.txt file contains the classes/methods to be whitelisted. Run the following command to upload it to DBFS:

    dbfs cp whitelist.txt dbfs:/privacera/whitelist.txt
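
    You can optionally verify the upload with the Databricks CLI:

      dbfs ls dbfs:/privacera/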
    
  4. Add the following property to the Spark Config, referencing the DBFS file location.

    spark.hadoop.privacera.whitelist dbfs:/privacera/whitelist.txt
    
  5. Restart your cluster. (A quick way to test the whitelist is sketched below.)
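
To sanity-check the whitelist after the restart, call one of the whitelisted entry points from a Python notebook. This minimal sketch assumes the sample entries shown above (SparkSession.createRDDFromTrustedPath, JavaRDD.rdd, PythonRDD.*) are in your whitelist file; exactly which JVM calls are gated can vary by Databricks Runtime version:

# Python notebook cell on the Privacera-enabled cluster
rdd = spark.sparkContext.parallelize([1, 2, 3])  # on Databricks, parallelize goes through
                                                 # SparkSession.createRDDFromTrustedPath
print(rdd.collect())  # collect() invokes JavaRDD.rdd and PythonRDD internals via Py4J;
                      # without the whitelist entries it fails with Py4JSecurityException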


Last update: August 24, 2021