Version: Next

Dremio

Certified

Important Capabilities

| Capability | Notes |
| --- | --- |
| Data Profiling | Optionally enabled via configuration |
| Detect Deleted Entities | Optionally enabled via stateful_ingestion.remove_stale_metadata |
| Domains | Supported via the domain config field |
| Platform Instance | Enabled by default |
| Table-Level Lineage | Enabled by default |

This plugin extracts the following:

  • Metadata for databases, schemas, views and tables
  • Column types associated with each table
  • Table, row, and column statistics via optional SQL profiling
  • Lineage information for views and datasets

CLI based Ingestion

Install the Plugin

The dremio source works out of the box with acryl-datahub.

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: dremio
  config:
    # Coordinates
    hostname: localhost
    port: 9047
    tls: true

    # Credentials
    authentication_method: password
    username: user
    password: pass

    include_query_lineage: True

    source_mappings:
      - platform: s3
        platform_name: samples

    schema_pattern:
      allow:
        - ".*"

sink:
  # sink configs
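
The same recipe can also be expressed as a Python dictionary and run programmatically. The sketch below is illustrative only: the helper names `build_recipe` and `run_ingestion` are hypothetical, and the `datahub-rest` sink with a `server` of `http://localhost:8080` is an assumption, since the starter recipe leaves the sink unspecified.

```python
from typing import Any, Dict


def build_recipe(server: str = "http://localhost:8080") -> Dict[str, Any]:
    """Return the starter recipe as a dictionary (sink settings are assumed)."""
    return {
        "source": {
            "type": "dremio",
            "config": {
                # Coordinates
                "hostname": "localhost",
                "port": 9047,
                "tls": True,
                # Credentials
                "authentication_method": "password",
                "username": "user",
                "password": "pass",
                "include_query_lineage": True,
                "source_mappings": [
                    {"platform": "s3", "platform_name": "samples"},
                ],
                "schema_pattern": {"allow": [".*"]},
            },
        },
        # Assumption: a DataHub REST sink reachable at `server`.
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_ingestion(recipe: Dict[str, Any]) -> None:
    """Run the recipe; requires acryl-datahub and a reachable Dremio server."""
    # Imported lazily so the recipe can be built without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


recipe = build_recipe()
print(recipe["source"]["type"])
```

Calling `run_ingestion(recipe)` would start the ingestion; it is left uncalled here because it needs live Dremio and DataHub endpoints.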

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
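
To make the dotted notation concrete, the following sketch expands dotted field names into the nested structure a recipe would contain. The helper `expand_dotted` is hypothetical and written only for this illustration; it is not part of DataHub.

```python
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand keys such as 'schema_pattern.allow' into nested dictionaries."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate mappings on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


config = expand_dotted(
    {
        "hostname": "localhost",
        "schema_pattern.allow": [".*"],
        "schema_pattern.ignoreCase": True,
        "stateful_ingestion.enabled": True,
    }
)
print(config)
```

Here `schema_pattern.allow` and `schema_pattern.ignoreCase` end up as sibling keys under a single `schema_pattern` mapping, which is how they would be indented in the YAML recipe.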

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| authentication_method | string | Authentication method: 'password' or 'PAT' (Personal Access Token) | password |
| disable_certificate_verification | boolean | Disable TLS certificate verification | False |
| dremio_cloud_region | string | Dremio Cloud region ('US' or 'EMEA') | |
| hostname | string | Hostname or IP address of the Dremio server | |
| include_copy_lineage | boolean | Whether to include copy lineage | True |
| include_query_lineage | boolean | Whether to include query-based lineage information | False |
| include_table_rename_lineage | boolean | Whether to include table rename lineage | True |
| is_dremio_cloud | boolean | Whether this is a Dremio Cloud instance | False |
| max_workers | integer | Maximum number of worker threads for parallel processing | 20 |
| password | string | Dremio password or Personal Access Token | |
| path_to_certificates | string | Path to SSL certificates | /vercel/path0/metadata-ingestion/venv/lib/python3.... |
| platform_instance | string | Platform instance for the source | (empty) |
| port | integer | Port of the Dremio REST API | 9047 |
| tls | boolean | Whether the Dremio REST API port is encrypted | True |
| username | string | Dremio username | |
| env | string | Environment to use in namespace when constructing URNs | PROD |
| dataset_pattern | AllowDenyPattern | Regex patterns for datasets (tables and views) to filter | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| dataset_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching | True |
| dataset_pattern.allow | array | List of regex patterns to include in ingestion | ['.*'] |
| dataset_pattern.allow.string | string | | |
| dataset_pattern.deny | array | List of regex patterns to exclude from ingestion | [] |
| dataset_pattern.deny.string | string | | |
| schema_pattern | AllowDenyPattern | Regex patterns for schemas to filter | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| schema_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching | True |
| schema_pattern.allow | array | List of regex patterns to include in ingestion | ['.*'] |
| schema_pattern.allow.string | string | | |
| schema_pattern.deny | array | List of regex patterns to exclude from ingestion | [] |
| schema_pattern.deny.string | string | | |
| source_mappings | array | Mappings from Dremio sources to DataHub platforms and datasets | |
| source_mappings.DremioSourceMapping | DremioSourceMapping | | |
| source_mappings.DremioSourceMapping.platform | string | | |
| source_mappings.DremioSourceMapping.platform_name | string | | |
| source_mappings.DremioSourceMapping.databaseName | string | | |
| source_mappings.DremioSourceMapping.dremio_source_type | string | | |
| source_mappings.DremioSourceMapping.platform_instance | string | | |
| source_mappings.DremioSourceMapping.rootPath | string | | |
| source_mappings.DremioSourceMapping.env | string | | PROD |
| stateful_ingestion | StatefulIngestionConfig | Stateful Ingestion Config | |
| stateful_ingestion.enabled | boolean | Whether to enable stateful ingestion. Defaults to True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified; otherwise False | False |
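
The allow, deny, and ignoreCase fields of schema_pattern and dataset_pattern can be read as a simple filter. The sketch below is a stand-alone approximation, not DataHub's implementation: it assumes that deny patterns take precedence over allow patterns and that each regex is matched from the start of the name (`re.match`). The function `is_allowed` and the sample names are hypothetical.

```python
import re
from typing import List


def is_allowed(
    name: str,
    allow: List[str],
    deny: List[str],
    ignore_case: bool = True,
) -> bool:
    """Return True if `name` passes the allow/deny filter (assumed semantics)."""
    flags = re.IGNORECASE if ignore_case else 0
    # Assumption: a single matching deny pattern excludes the name.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    # Otherwise the name must match at least one allow pattern.
    return any(re.match(pattern, name, flags) for pattern in allow)


allow = [r"samples\..*"]
deny = [r".*tmp.*"]

results = {
    name: is_allowed(name, allow, deny)
    for name in ["Samples.sales", "samples.tmp_scratch", "finance.reports"]
}
print(results)
```

With these inputs only `Samples.sales` passes: `samples.tmp_scratch` is rejected by the deny pattern and `finance.reports` matches no allow pattern. Passing `ignore_case=False` would also reject `Samples.sales`, because the allow pattern is lowercase.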

Code Coordinates

  • Class Name: datahub.ingestion.source.dremio.dremio_source.DremioSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Dremio, feel free to ping us on our Slack.