Apache Atlas
Apache Atlas is an enterprise-scale open data management service which provides governance for Hadoop and the entire enterprise data ecosystem.
This tutorial describes how to set up Apache Atlas with YugabyteDB and run the quick start provided by the Atlas service.
Prerequisites
To use Apache Atlas, ensure that you have the following:
- YugabyteDB up and running. Download and install YugabyteDB by following the steps in Quick start.
- Apache Solr 5.5.1 installed. Solr is an open-source indexing platform that serves as an indexing backend to run Apache Atlas.
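Before continuing, it can help to confirm that YugabyteDB is accepting YCQL connections. The following is a minimal sketch, assuming YugabyteDB runs locally and YCQL listens on port 9042 (the port used in the Atlas configuration later in this tutorial). The helper name is_port_open is illustrative and not part of YugabyteDB or Atlas; it only tests TCP reachability, not authentication or schema.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: local YugabyteDB with YCQL on port 9042.
    host, port = "localhost", 9042
    status = "reachable" if is_port_open(host, port) else "not reachable"
    print(f"YCQL at {host}:{port} is {status}")
```

If the port is not reachable, start YugabyteDB before building and configuring Atlas.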
Build the Apache Atlas Project
To get the Apache Atlas server file, you need to build the Apache Atlas source using the following steps:
- Clone the source from GitHub to your local setup. Check out the latest stable release tag (for example, release-2.3.0) and follow the steps in the README to build the files.
- After the source files are packaged, the Atlas server tar file should be available in the distro/target folder.
- Extract the tar file using the following command:
    tar -xzvf apache-atlas-2.3.0-server.tar.gz
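If you built a different release, the tar file name will carry that version instead of 2.3.0. The following sketch locates the server tar file in the build output. It assumes the file name follows the apache-atlas-<version>-server.tar.gz pattern seen in the command above; the function name find_server_tarball is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_server_tarball(target_dir: str) -> Optional[Path]:
    """Return the Atlas server tar file in target_dir, or None if absent."""
    directory = Path(target_dir)
    if not directory.is_dir():
        return None
    # Assumption: the name follows apache-atlas-<version>-server.tar.gz.
    matches = sorted(directory.glob("apache-atlas-*-server.tar.gz"))
    # If several versions are present, pick the last one in sorted order.
    return matches[-1] if matches else None


if __name__ == "__main__":
    tarball = find_server_tarball("distro/target")
    print(tarball if tarball else "No server tar file found; check the build output.")
```

Run it from the root of the Atlas source tree and pass the printed path to the tar command.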
Run Apache Atlas
Perform the following steps to run the Atlas server:
- Start Solr in SolrCloud mode. Refer to Getting started with SolrCloud. When prompted for the number of nodes, enter 1 and choose the default options for the other questions.
- After SolrCloud is started, create the three collections that Atlas uses as indexes by running the following commands from the Solr home directory:
    bin/solr create -c vertex_index -shards 2 -replicationFactor 2
    bin/solr create -c edge_index -shards 2 -replicationFactor 2
    bin/solr create -c fulltext_index -shards 2 -replicationFactor 2
- From the unzipped Atlas home directory, modify the conf/atlas-application.properties file to use YugabyteDB YCQL as the graph backend as follows:
    # Graph storage
    atlas.graph.storage.backend=cql
    atlas.graph.storage.username=cassandra
    atlas.graph.storage.password=cassandra
    atlas.graph.storage.hostname=localhost
    atlas.graph.storage.cassandra.keyspace=JanusGraph
    atlas.graph.storage.clustername=cassandra
    atlas.graph.storage.port=9042
    atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.CassandraBasedAuditRepository
    # Comment out the following HBase-specific storage properties
    #Hbase
    #atlas.graph.storage.hostname=localhost
    #atlas.graph.storage.hbase.regions-per-server=1
- Change the atlas.graph.index.search.solr.zookeeper-url property in the conf/atlas-application.properties file to point to the ZooKeeper instance started by Solr. The default value is localhost:9983. An example Solr URL is as follows:
    #Solr
    #Solr cloud mode properties
    atlas.graph.index.search.solr.zookeeper-url=localhost:9983
- Start the Atlas server from the Atlas home directory as follows:
    bin/atlas_start.py
  You should see the following output:
    Starting Atlas server on host: localhost
    Starting Atlas server on port: 21000
    .........................
    Apache Atlas Server started!!!
- To ensure the server has started successfully, run the following command:
    # The default username and password for Atlas are both admin
    curl -u username:password http://localhost:21000/api/atlas/admin/version
  You should see the following output:
    {"Description":"Metadata Management and Data Governance Platform over Hadoop", "Revision":"4cd215e1e2a04acbcd8afe6af95f43c4979202f1","Version":"2.3.0","Name":"apache-atlas"}
- Run the quick start script using the following command:
    bin/quick_start.py
  When prompted for a username and password, enter admin for both. After the script completes, you should see the following output:
    Creating sample types:
    Created type [DB]
    Created type [Table]
    .
    Created type [Table_Columns]
    Created type [Table_StorageDesc]
    Creating sample entities:
    Created entity of type [DB], guid: f87936f1-d620-4f70-88c1-471f30e95c68
    .
    Created entity of type [LoadProcess], guid: c4c0e468-5af9-474f-acc1-088747ebf199
    Created entity of type [LoadProcessExecution], guid: a03dd462-437d-42ee-b30f-dd0b8f1e9413
    Created entity of type [LoadProcessExecution], guid: 2fe2e4c2-04ac-4c34-9a14-67436428949d
    Sample DSL Queries:
    query [from DB] returned [3] rows.
    query [DB] returned [3] rows.
    .
    query [from DataSet] returned [10] rows.
    query [from Process] returned [3] rows.
    Sample Lineage Info:
    loadSalesMonthly(LoadProcess) -> sales_fact_monthly_mv(Table)
    time_dim(Table) -> loadSalesDaily(LoadProcess)
    sales_fact_daily_mv(Table) -> loadSalesMonthly(LoadProcess)
    sales_fact(Table) -> loadSalesDaily(LoadProcess)
    loadSalesDaily(LoadProcess) -> sales_fact_daily_mv(Table)
    Sample data added to Apache Atlas Server.
- You can verify that the janusgraph and atlas_audit keyspaces were created using the ycqlsh shell as follows:
    ycqlsh> DESC KEYSPACES;
  You should see the following output:
    atlas_audit system_auth system_schema janusgraph system
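If the server fails to start or the keyspaces do not appear, a common cause is a mistyped entry in conf/atlas-application.properties. The following sketch checks a few of the settings from the steps above. It is a simplified key=value reader, not a full Java properties parser, and the names EXPECTED, parse_properties, and find_mismatches are illustrative rather than part of Atlas.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# Values taken from the configuration steps in this tutorial.
EXPECTED = {
    "atlas.graph.storage.backend": "cql",
    "atlas.graph.storage.hostname": "localhost",
    "atlas.graph.storage.port": "9042",
    "atlas.graph.index.search.solr.zookeeper-url": "localhost:9983",
}


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # A later occurrence of a key overrides an earlier one in this sketch.
        props[key.strip()] = value.strip()
    return props


def find_mismatches(
    props: Dict[str, str], expected: Dict[str, str] = EXPECTED
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (expected, actual)} for keys that are missing or differ."""
    return {
        key: (want, props.get(key))
        for key, want in expected.items()
        if props.get(key) != want
    }


if __name__ == "__main__":
    # Assumption: run from the Atlas home directory.
    path = Path("conf/atlas-application.properties")
    if not path.is_file():
        print(f"{path} not found; run this from the Atlas home directory.")
    else:
        problems = find_mismatches(parse_properties(path.read_text(encoding="utf-8")))
        for key, (want, got) in problems.items():
            print(f"{key}: expected {want!r}, found {got!r}")
        if not problems:
            print("Graph storage and Solr settings match the tutorial values.")
```

Adjust the EXPECTED values if your YugabyteDB or ZooKeeper instance runs on a different host or port.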
Clean up
You can stop the Atlas server using the following command:
bin/atlas_stop.py
You can stop SolrCloud using the following command:
bin/solr stop -all