First, make sure the environment is clean: if Atlas was installed before, remove any leftovers.
Make sure the server Atlas will be installed on has enough memory (at least 16 GB) and the necessary Hadoop roles:
HDFS client — retrieves and updates account membership in the user-group information (UGI) used by Hadoop; useful for debugging. HBase client — Atlas stores its Janus database in HBase and uses it for the initial import of HBase content, so it needs permanent access to two tables in the HBase service. Hive client — used for the initial import of Hive content.
Prepare the build environment
Maven 3.8.8 — must be 3.8 or newer; 3.6 cannot compile the project
Java 1.8.0_181 — keep it consistent with your CDH environment
Node.js v16.20.2
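A quick sanity check of the toolchain before building (a minimal sketch; output formats differ by distribution):
# Check the build toolchain versions (expected: Maven 3.8.x, JDK 1.8.0_181, Node v16.20.2)
mvn -v
java -version
node -v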
Download and extract the source code
The project website can be found here: Apache Atlas – Data Governance and Metadata framework for Hadoop
Find and download the Apache Atlas source release.
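For example, fetching and unpacking the 2.2.0 source release from the Apache archive (a sketch — the mirror URL and the extracted directory name are assumptions; use whichever link the download page gives you):
wget https://archive.apache.org/dist/atlas/2.2.0/apache-atlas-2.2.0-sources.tar.gz
tar -xzvf apache-atlas-2.2.0-sources.tar.gz
cd apache-atlas-sources-2.2.0    # the directory name may differ depending on the release archive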
Modify pom.xml
In the main pom (the top-level pom.xml of the project), add a Cloudera repository that contains the required Maven artifacts:
<repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>false</enabled>
    </snapshots>
</repository>
Then change the corresponding CDH component versions:
<hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
<hbase.version>2.1.0-cdh6.3.2</hbase.version>
<hive.version>2.1.1-cdh6.3.2</hive.version>
<kafka.scala.binary.version>2.11</kafka.scala.binary.version>
<kafka.version>2.2.1-cdh6.3.2</kafka.version>
<solr-test-framework.version>7.4.0-cdh6.3.2</solr-test-framework.version>
<lucene-solr.version>7.4.0</lucene-solr.version>
<solr.version>7.4.0-cdh6.3.2</solr.version>
<sqoop.version>1.4.7-cdh6.3.2</sqoop.version>
<zookeeper.version>3.4.5-cdh6.3.2</zookeeper.version>
Then change the versions of a few jars.
Change the version of the "atlas-buildtools" artifact from "1.0" to "0.8.1":
<dependency>
    <groupId>org.apache.atlas</groupId>
    <artifactId>atlas-buildtools</artifactId>
    <version>0.8.1</version>
</dependency>
Change jsr.version to 2.0.1:
<jsr.version>2.0.1</jsr.version>
Modify some of the sub-poms
In the project root:
grep -rn jsr311-api | grep pom.xml
addons/impala-bridge/pom.xml:332
addons/falcon-bridge/pom.xml:178
addons/hive-bridge/pom.xml:312
addons/hbase-bridge/pom.xml:345
addons/storm-bridge/pom.xml:360
addons/sqoop-bridge/pom.xml:250
In these pom files, change the jsr311-api artifact to javax.ws.rs-api; a batch edit is sketched below.
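One way to apply that substitution in bulk (a sketch — it only rewrites the artifactId, so review each pom afterwards, e.g. with git diff, and adjust the dependency version where needed):
for f in addons/impala-bridge/pom.xml addons/falcon-bridge/pom.xml addons/hive-bridge/pom.xml \
         addons/hbase-bridge/pom.xml addons/storm-bridge/pom.xml addons/sqoop-bridge/pom.xml; do
    # replace the jsr311-api artifactId with javax.ws.rs-api in place
    sed -i 's#<artifactId>jsr311-api</artifactId>#<artifactId>javax.ws.rs-api</artifactId>#g' "$f"
done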
Modify other files
In the file addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java, go to line 618,
comment out "String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;"
and add "String catalogName = null;":
public static String getDatabaseName(Database hiveDB) {
    String dbName      = hiveDB.getName().toLowerCase();
    //String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
    String catalogName = null;

    if (StringUtils.isNotEmpty(catalogName) && !StringUtils.equals(catalogName, DEFAULT_METASTORE_CATALOG)) {
        dbName = catalogName + SEP + dbName;
    }

    return dbName;
}
In the file addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java, go to line 83:
"this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;"
Comment it out and add "this.metastoreHandler = null;":
public AtlasHiveHookContext(HiveHook hook, HiveOperation hiveOperation, HookContext hiveContext, HiveHookObjectNamesCache knownObjects,
                            HiveMetastoreHook metastoreHook, ListenerEvent listenerEvent) throws Exception {
    this.hook           = hook;
    this.hiveOperation  = hiveOperation;
    this.hiveContext    = hiveContext;
    this.hive           = hiveContext != null ? Hive.get(hiveContext.getConf()) : null;
    this.knownObjects   = knownObjects;
    this.metastoreHook  = metastoreHook;
    this.metastoreEvent = listenerEvent;
    //this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
    this.metastoreHandler = null;

    init();
}
In the file addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java,
comment out line 293, which mentions "MATERIALIZED_VIEW":
private boolean isDdlOperation(AtlasEntity entity) {
    return entity != null && !context.isMetastoreHook()
        && (context.getHiveOperation().equals(HiveOperation.CREATETABLE_AS_SELECT)
            || context.getHiveOperation().equals(HiveOperation.CREATEVIEW)
            || context.getHiveOperation().equals(HiveOperation.ALTERVIEW_AS));
            //|| context.getHiveOperation().equals(HiveOperation.CREATE_MATERIALIZED_VIEW));
}
Note that you must add the closing ");" on the line above yourself, because the original closing characters are now inside the comment.
In the file addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java,
comment out lines 212 and 217, which mention "MATERIALIZED_VIEW".
Start the build. There are basically no pitfalls; if something goes wrong, retry a few times — packages sometimes fail to download because of network problems.
mvn clean -DskipTests package -Pdist -Drat.skip=true
The package ends up at distro/target/apache-atlas-2.2.0-bin.tar.gz.
Do not use the "server" package mentioned in the official documentation — it is missing the various hook files.
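To unpack the build output into the install location used in the rest of this guide (a sketch; /data is just the directory assumed here, and the top-level directory name may vary slightly by build):
mkdir -p /data
tar -xzvf distro/target/apache-atlas-2.2.0-bin.tar.gz -C /data/
ls /data/apache-atlas-2.2.0    # bin/, conf/, hook/ etc. should be present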
After extracting the package to the installation directory, begin the installation.
Prepare the CDH cluster services for the Atlas deployment:
Atlas uses HBase to store its Janus database. Solr is used to store and search the audit logs. Kafka is used as the messenger from the Atlas libraries (the hooks embedded in the Hadoop services) to Atlas itself.
1.1. Create the necessary tables in HBase
On the Atlas machine, or on any other machine with the "HBase Gateway" role installed, create the necessary tables:
TABLE1=apache_atlas_entity_audit
TABLE2=apache_atlas_janus
echo "create '${TABLE1}', 'dt'" | hbase shell
echo "create '${TABLE2}', 's'" | hbase shell
Check the created tables. On the Atlas machine, or on any other machine with the "HBase Gateway" role installed, run:
echo "list" | hbase shell
Standard output:
Took 0.0028 seconds
list
TABLE
apache_atlas_entity_audit
apache_atlas_janus
2 row(s)
Took 0.6872 seconds
[apache_atlas_entity_audit, apache_atlas_janus]
Add the HBase cluster configuration files under conf/hbase:
ln -s /etc/hbase/conf/ /data/apache-atlas-2.2.0/conf/hbase
Apache Kafka
Atlas uses Apache Kafka to receive messages about events that happen in the Hadoop services. The messages are sent by special Atlas libraries (hooks) embedded in some of the services. Currently, Atlas reads messages about events in HBase and Hive — creating and dropping tables, adding columns, and so on.
Add the necessary topics in Kafka. Apache Atlas needs three topics in Apache Kafka. Create them on a machine where Kafka is installed:
kafka-topics --zookeeper S0:2181,S1:2181,S2:2181,S3:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics --zookeeper S0:2181,S1:2181,S2:2181,S3:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics --zookeeper S0:2181,S1:2181,S2:2181,S3:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
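To confirm the three topics exist (a sketch, using the same ZooKeeper quorum as the create commands above):
kafka-topics --zookeeper S0:2181,S1:2181,S2:2181,S3:2181 --list | grep -E '_HOATLASOK|ATLAS_ENTITIES|ATLAS_HOOK'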
With Kerberos enabled this is a bit more troublesome; for details see this post:
Kerberos环境下 命令行连接kafka 和zk_启用kerberos后zk_Mumunu-的博客-CSDN博客
Configure a Sentry role for Atlas so that it can access the Kafka topics.
On a machine with the "Kafka Gateway" and "Sentry Gateway" roles, create the "kafka4atlas_role" role in Sentry:
KROLE=kafka4atlas_role
kafka-sentry -cr -r ${KROLE}
Assign the created role to the atlas group:
kafka-sentry -arg -r ${KROLE} -g atlas
Assign the consumer permissions:
TOPIC1=_HOATLASOK
TOPIC2=ATLAS_ENTITIES
TOPIC3=ATLAS_HOOK
kafka-sentry -gpr -r ${KROLE} -p "Host=*->CONSUMERGROUP=*->action=read"
kafka-sentry -gpr -r ${KROLE} -p "Host=*->CONSUMERGROUP=*->action=describe"
kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC1}->action=read"
kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC2}->action=read"
kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC3}->action=read"
kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC1}->action=describe"
kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC2}->action=describe"
kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC3}->action=describe"
Assign the producer permissions:
kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC1}->action=write"
kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC2}->action=write"
kafka-sentry -gpr -r ${KROLE} -p "HOST=*->TOPIC=${TOPIC3}->action=write"
Check the Sentry settings:
$ kafka-sentry -lr
....
solradm_role
kafka4atlas_role
Show the list of groups with their assigned roles:
$ kafka-sentry -lg
...
atlas = kafka4atlas_role
test2_solr_admins = solradm_role
Show the list of privileges:
$ kafka-sentry -lp -r kafka4atlas_role
...
HOST=*->TOPIC=_HOATLASOK->action=read
HOST=*->TOPIC=_HOATLASOK->action=describe
HOST=*->TOPIC=ATLAS_HOOK->action=read
HOST=*->TOPIC=ATLAS_ENTITIES->action=describe
HOST=*->TOPIC=ATLAS_HOOK->action=describe
HOST=*->CONSUMERGROUP=*->action=describe
HOST=*->TOPIC=_HOATLASOK->action=write
HOST=*->TOPIC=ATLAS_ENTITIES->action=write
HOST=*->TOPIC=ATLAS_HOOK->action=write
HOST=*->TOPIC=ATLAS_ENTITIES->action=read
HOST=*->CONSUMERGROUP=*->action=read
Integrate with CDH's Solr
① Copy the apache-atlas-2.2.0/conf/solr directory into the Solr installation directory (/opt/cloudera/parcels/CDH/lib/solr) and rename it to atlas-solr.
② Create the collections:
vi /etc/passwd
Change the solr user's shell from /sbin/nologin to /bin/bash, then:
su - solr
/opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c vertex_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas-solr -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c edge_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas-solr -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c fulltext_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas-solr -shards 3 -replicationFactor 2
③ Verify that the collections were created successfully: log in to the Solr web console at http://xxxx:8983 and check that they are there.
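You can also check the collections from the command line instead of the web console (a sketch; replace xxxx with a Solr host, and add authentication options if your Solr is secured):
curl "http://xxxx:8983/solr/admin/collections?action=LIST"
# the response should list vertex_index, edge_index and fulltext_index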
Create the relevant Kerberos principals and keytabs ahead of time.
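For example, creating the atlas service principal and keytab referenced later in atlas-application.properties (a sketch, assuming an MIT KDC with kadmin.local access and the TEST.COM realm / s1.hadoop.com host used throughout this guide):
kadmin.local -q "addprinc -randkey atlas/s1.hadoop.com@TEST.COM"
kadmin.local -q "xst -k /data/atlas.service.keytab atlas/s1.hadoop.com@TEST.COM"
chmod 400 /data/atlas.service.keytab    # and chown it to whatever account runs Atlas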
Modify atlas-application.properties
#########  Graph Database Configs  #########

# Graph Database
#Configures the graph database to use.  Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various storage backends.
#
atlas.graph.storage.backend=hbase
atlas.graph.storage.hbase.table=apache_atlas_janus

#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=S0:2181,S1:2181,S2:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000

#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing soft deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are soft deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are hard deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1

# Graph Search Index
atlas.graph.index.search.backend=solr

#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=S0:2181/solr,S1:2181/solr,S2:2181/solr
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: https://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.search.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Import Configs  #########
#atlas.import.temp.directory=/temp/import

#########  Notification Configs  #########
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=S0:2181,S1:2181,S2:2181
atlas.kafka.bootstrap.servers=S0:9092,S1:9092,S2:9092
atlas.kafka.zookeeper.session.timeout.ms=60000
atlas.kafka.zookeeper.connection.timeout.ms=60000
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config
atlas.authentication.method=kerberos
atlas.authentication.keytab=/data/hive.keytab
atlas.authentication.principal=hive@TEST.COM

atlas.authentication.method.kerberos=true
atlas.authentication.method.kerberos.principal=hive@TEST.COM
atlas.authentication.method.kerberos.keytab=/data/hive.keytab
atlas.authentication.method.kerberos.name.rules=RULE:[2:$1@$0](hive@TEST.COM)s/.*/hive/
atlas.authentication.method.kerberos.token.validity=3600

#atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>

######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>

#########  JAAS Configuration ########
atlas.jaas.KafkaClient.loginModuleName=com.sun.security.auth.module.Krb5LoginModule
atlas.jaas.KafkaClient.loginModuleControlFlag=required
atlas.jaas.KafkaClient.option.useKeyTab=true
atlas.jaas.KafkaClient.option.storeKey=true
atlas.jaas.KafkaClient.option.serviceName=kafka
atlas.jaas.KafkaClient.option.keyTab=/data/atlas.service.keytab
atlas.jaas.KafkaClient.option.principal=atlas/s1.hadoop.com@TEST.COM

atlas.jaas.Client.loginModuleName=com.sun.security.auth.module.Krb5LoginModule
atlas.jaas.Client.loginModuleControlFlag=required
atlas.jaas.Client.option.useKeyTab=true
atlas.jaas.Client.option.storeKey=true
atlas.jaas.Client.option.keyTab=/data/atlas.service.keytab
atlas.jaas.Client.option.principal=atlas/s1.hadoop.com@TEST.COM

#########  Server Properties  #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=S0:2181,S1:2181,S2:2181

#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>

######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

#########  Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=

#########  Performance Configs  #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000

#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=

############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=

#########  Compiled Query Cache Configuration  #########

# The size of the compiled query cache.  Older queries will be evicted from the cache
# when we reach the capacity.

#atlas.CompiledQueryCache.capacity=1000

# Allows notifications when items are evicted from the compiled query
# cache because it has become full.  A warning will be issued when
# the specified number of evictions have occurred.  If the eviction
# warning threshold <= 0, no eviction warnings will be issued.

#atlas.CompiledQueryCache.evictionWarningThrottle=0

#########  Full Text Search Configuration  #########

#Set to false to disable full text search.
#atlas.search.fulltext.enable=true

#########  Gremlin Search Configuration  #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false

########## Add http headers ###########
#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>

#########  UI Configuration ########
atlas.ui.default.version=v1
There are a lot of settings to change — check them carefully, because many of the defaults do not work as-is. The keytab can be newly created or reused; I was worried about permission issues, so I chose the hive account. The corresponding permissions should probably also be granted in HBase, but I have not tested whether that is actually required; a sketch is given below.
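If you do need to grant them, a minimal HBase grant would look like this (a sketch — grant to whichever principal Atlas actually uses to reach HBase, e.g. atlas from the JAAS Client section, or hive if you reuse that account throughout):
echo "grant 'atlas', 'RWXCA', 'apache_atlas_janus'" | hbase shell
echo "grant 'atlas', 'RWXCA', 'apache_atlas_entity_audit'" | hbase shell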
Modify atlas-env.sh
#!/usr/bin/env bash

# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
export JAVA_HOME=/usr/java/default
export HBASE_CONF_DIR=/etc/hbase/conf

# any additional java opts you want to set. This will apply to both client and server operations
#export ATLAS_OPTS=

# any additional java opts that you want to set for client only
#export ATLAS_CLIENT_OPTS=

# java heap size we want to set for the client. Default is 1024MB
#export ATLAS_CLIENT_HEAP=

# any additional opts you want to set for atlas service.
#export ATLAS_SERVER_OPTS=

# indicative values for large number of metadata entities (equal or more than 10,000s)
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps -Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/data/atlas2.2/conf/jaas.conf"

# java heap size we want to set for the atlas server. Default is 1024MB
#export ATLAS_SERVER_HEAP=

# indicative values for large number of metadata entities (equal or more than 10,000s) for JDK 8
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"

# What is considered as atlas home dir. Default is the base location of the installed software
export ATLAS_HOME_DIR=/opt/atlas2.2

# Where log files are stored. Default is logs directory under the base install location
#export ATLAS_LOG_DIR=

# Where pid files are stored. Default is logs directory under the base install location
#export ATLAS_PID_DIR=

# where the atlas titan db data is stored. Default is logs/data directory under the base install location
#export ATLAS_DATA_DIR=

# Where do you want to expand the war file. By default it is in /server/webapp dir under the base install dir.
#export ATLAS_EXPANDED_WEBAPP_DIR=

# indicates whether or not a local instance of HBase should be started for Atlas
export MANAGE_LOCAL_HBASE=false

# indicates whether or not a local instance of Solr should be started for Atlas
export MANAGE_LOCAL_SOLR=false

# indicates whether or not cassandra is the embedded backend for Atlas
export MANAGE_EMBEDDED_CASSANDRA=false

# indicates whether or not a local instance of Elasticsearch should be started for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false
A jaas.conf also needs to be added — this is the file referenced by -Djava.security.auth.login.config in atlas-env.sh:
Client {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="/data/atlas.service.keytab"
   storeKey=true
   principal="atlas/s1.hadoop.com@TEST.COM"
   debug=false;
};
Integrate with Hive
First add three configuration items to Hive in CDH:
Java Configuration Options for HiveServer2:
{{JAVA_GC_ARGS}} -Datlas.conf=/data/apache-atlas-2.2.0/conf/
HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
Name: hive.exec.post.hooks    Value: org.apache.atlas.hive.hook.HiveHook
HiveServer2 Environment Advanced Configuration Snippet (Safety Valve):
HIVE_AUX_JARS_PATH=/data/apache-atlas-2.2.0/hook/hive/
Then copy an atlas-application.properties into /etc/hive/conf. Note that this copy needs the following changes.
Change this to false:
atlas.authentication.method.kerberos=false
Add:
atlas.client.readTimeoutMSecs=90000
atlas.client.connectTimeoutMSecs=90000
The last two settings are the read and connect timeouts — the defaults are too short.
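One way to put the file in place (a sketch; if /etc/hive/conf is managed by Cloudera Manager, distribute the file to every HiveServer2 host instead of copying it only once):
cp /data/apache-atlas-2.2.0/conf/atlas-application.properties /etc/hive/conf/
vi /etc/hive/conf/atlas-application.properties    # apply the changes above to this copy only, not to the server's copy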
Then you can start Atlas:
bin/atlas-start.py
bin/atlas-stop.py
The startup takes quite a while — it includes index creation, data initialization and other work, and can run for several hours, so be patient. Meanwhile, follow the Atlas startup log until it stops scrolling, then use lsof or netstat to check whether port 21000 is being listened on. Once it is, open a browser at ip:21000 to reach the Atlas login page. Do not trust the "Apache Atlas Server started!!!" message or the Atlas process shown by jps: after a fixed amount of time the start script always reports success, even though port 21000 is not yet listening and the service is not usable. The service is truly ready only once port 21000 is listened on and the Atlas login page loads. Then you can start using it.
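A quick way to tell when the server is really up (a sketch; the log path assumes the install directory used above, and admin/admin are the default file-based credentials):
tail -f /data/apache-atlas-2.2.0/logs/application.log        # wait until the log stops scrolling
netstat -nltp | grep 21000                                    # or: lsof -i:21000
curl -s -o /dev/null -w "%{http_code}\n" -u admin:admin http://localhost:21000/api/atlas/admin/version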
Import Hive data
Remember to kinit first.
bin/import-hive.sh
You can also import a single database:
bin/import-hive.sh -d default
During the import you will be prompted for the Atlas username and password; enter admin for both. A success message is printed when it finishes; how long it takes depends on how much data already exists in Hive. After logging in, click the small icon in the upper-right corner to see the overall data statistics and list all Hive tables. Click any table to view its details — the table's properties, columns, lineage graph and so on are all clearly visible. You can also use the search bar on the left to filter for the items you want to find.
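The import can also be verified from the command line through the Atlas REST API (a sketch; admin/admin are the credentials used above and ip is a placeholder for the Atlas host):
# list a few of the imported hive_table entities via the v2 basic search API
curl -s -u admin:admin "http://ip:21000/api/atlas/v2/search/basic?typeName=hive_table&limit=5"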