Oozie Sqoop Action Configuration
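  • The Sqoop action runs a Sqoop job. The workflow waits until the Sqoop job on the current node has finished before moving on to the next node.
  • All element values of the Sqoop action can be set with EL expressions.
  • To run a Sqoop job, the sqoop action must be configured with job-tracker, name-node, and the Sqoop command; additional arguments and configuration may also be needed.
  • Like the Shell action, the Sqoop action can be configured to create or delete HDFS directories before the Sqoop job starts.
  • The Sqoop job's configuration can come from the file named by the job-xml element, from the inline configuration element, or from both. Inline values may use EL expressions and override the values from the external file. The inline configuration must not set the Hadoop properties mapred.job.tracker and fs.default.name.
  • As with MapReduce jobs, files and archives can be made available to the Sqoop job; see http://archive.cloudera.com/cdh/3/oozie/WorkflowFunctionalSpec.html#a3.2.2.1_Adding_Files_and_Archives_for_the_Job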
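
As a rough illustration of the files-and-archives point, the fragment below ships a Sqoop options file and an archive with the action. It is only a sketch: the file names, the conf/ and lib/ sub-paths, and the use of Sqoop's --options-file flag are assumptions for illustration and do not come from the original setup.

```xml
<!-- Sketch only: all file names and paths below are hypothetical. -->
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <arg>import</arg>
    <arg>--options-file</arg>
    <arg>import-opts.txt</arg>
    <!-- Relative paths are looked up inside the workflow application directory.
         The part after '#' is the symlink name in the task's working directory. -->
    <file>conf/import-opts.txt#import-opts.txt</file>
    <archive>lib/extra-tools.tar.gz#tools</archive>
</sqoop>
```

The options file is referenced by its symlink name, so the Sqoop arguments do not need to know where the file lives on HDFS.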

Sqoop Action Syntax
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="[NODE-NAME]">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>[JOB-TRACKER]</job-tracker>
            <name-node>[NAME-NODE]</name-node>
            <prepare>
               <delete path="[PATH]"/>
               ...
               <mkdir path="[PATH]"/>
               ...
            </prepare>
            <configuration>
                <property>
                    <name>[PROPERTY-NAME]</name>
                    <value>[PROPERTY-VALUE]</value>
                </property>
                ...
            </configuration>
            <command>[SQOOP-COMMAND]</command>
            <arg>[SQOOP-ARGUMENT]</arg>
            ...
            <file>[FILE-PATH]</file>
            ...
            <archive>[FILE-PATH]</archive>
            ...
        </sqoop>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>
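  • prepare: if present, lists the HDFS paths to create or delete before the Sqoop command runs. Paths must start with hdfs://HOST:PORT.
  • job-xml: if present, names a file used as configuration for the Sqoop job. From schema 0.3 on, multiple job-xml elements are allowed so that several job.xml files can be supplied.
  • configuration: passes properties to the Sqoop job.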
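
The three elements described above could be combined roughly as follows. This is a sketch under assumptions: the two job-xml file names, the staging directory, and the queueName variable are placeholders, and the import command is simplified to a plain table import.

```xml
<!-- Sketch only: file names, the staging path and the variables are assumptions. -->
<sqoop xmlns="uri:oozie:sqoop-action:0.3">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <prepare>
        <!-- Paths carry the hdfs://HOST:PORT prefix through the nameNode variable. -->
        <delete path="${nameNode}/user/xwj/test"/>
        <mkdir path="${nameNode}/user/xwj/staging"/>
    </prepare>
    <!-- More than one job-xml element requires schema 0.3 or later. -->
    <job-xml>conf/sqoop-defaults.xml</job-xml>
    <job-xml>conf/queue-settings.xml</job-xml>
    <configuration>
        <!-- Inline properties override the same keys loaded from the job-xml files. -->
        <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
    <command>import --connect jdbc:mysql://host:3306/oozie --username oozie --password oozie --table WF_JOBS --target-dir /user/xwj/test -m 1</command>
</sqoop>
```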
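Sqoop command
  • The Sqoop command can be specified either with the command element or with multiple arg elements.
  • With command, Oozie splits the command on spaces into separate arguments.
  • With arg, Oozie passes each arg value to Sqoop as a single argument.
  • An argument that contains spaces must therefore be specified with arg.
  • All of the elements above can be set with EL expressions.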
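
A short sketch of the arg form, assuming a hypothetical jdbcUrl variable and a filter value that contains spaces. Written with command, the filter would be split at each space; with arg it reaches Sqoop as one argument.

```xml
<!-- Sketch only: jdbcUrl and jobOutput are assumed to be defined in job.properties. -->
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <arg>import</arg>
    <arg>--connect</arg>
    <arg>${jdbcUrl}</arg>
    <arg>--table</arg>
    <arg>WF_JOBS</arg>
    <arg>--where</arg>
    <!-- One arg element, one argument: the spaces inside are preserved. -->
    <arg>user_name = 'oozie'</arg>
    <arg>--target-dir</arg>
    <arg>${jobOutput}</arg>
    <arg>-m</arg>
    <arg>1</arg>
</sqoop>
```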
Sqoop Action example 1: Oozie invokes Sqoop to sync MySQL data and sends a notification email on success
1. Create job.properties
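# nameNode is referenced by oozie.wf.application.path below; value taken from the name-node in workflow.xml
nameNode=hdfs://hadoop-node1.novalocal:8020
jobTracker=hadoop-node1.novalocal:8050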
queueName=default
examplesRoot=xwj_test
jobOutput=/user/xwj/test
oozie.wf.application.path=${nameNode}/user/oozie/${examplesRoot}/apps/shell/sqoop_email/workflow.xml
2. workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="email-wf">
    <start to="sqoop-node"/>
     
    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>hadoop-node1.novalocal:8050</job-tracker>
            <name-node>hdfs://hadoop-node1.novalocal:8020</name-node>
            <prepare>
                <delete path="${jobOutput}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <command>import --connect jdbc:mysql://host:3306/oozie --username oozie --password oozie --query 'select id,app_name,app_path,user_name from WF_JOBS where $CONDITIONS LIMIT 100' --target-dir /user/xwj/test --delete-target-dir --num-mappers 1 --fields-terminated-by 't'</command>
        </sqoop>
        <ok to="email-node"/>
        <error to="fail"/>
    </action>
     
     
    <action name="email-node">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>1719038657@qq.com</to>
            <cc>noway-you@qq.com</cc>
            <subject>Email notifications for ${wf:id()}</subject>
            <body>The wf ${wf:id()} successfully completed.</body>
        </email>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
   <end name='end' />
</workflow-app>
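
The workflow above hard-codes the job tracker and name node even though job.properties already carries such values. A possible variant of the sqoop-node action is sketched below; it assumes nameNode, jobTracker, queueName and jobOutput are all defined in job.properties, and it passes the free-form query through arg elements so that its spaces are not split. The import options are trimmed to the essentials and are not the author's exact command.

```xml
<!-- Sketch only: assumes nameNode, jobTracker, queueName and jobOutput come from job.properties. -->
<action name="sqoop-node">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}${jobOutput}"/>
        </prepare>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <arg>import</arg>
        <arg>--connect</arg>
        <arg>jdbc:mysql://host:3306/oozie</arg>
        <arg>--username</arg>
        <arg>oozie</arg>
        <arg>--password</arg>
        <arg>oozie</arg>
        <arg>--query</arg>
        <!-- No shell quoting is needed here: the whole element body is one argument. -->
        <arg>select id,app_name,app_path,user_name from WF_JOBS where $CONDITIONS LIMIT 100</arg>
        <arg>--target-dir</arg>
        <arg>${jobOutput}</arg>
        <arg>--num-mappers</arg>
        <arg>1</arg>
    </sqoop>
    <ok to="email-node"/>
    <error to="fail"/>
</action>
```

Keeping host names in job.properties means the same workflow.xml can be submitted against another cluster by changing only the properties file.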
3. First create the working directory on the local test node:
mkdir -p /opt/mydata/user/oozie/xwj_test/apps/shell/sqoop_email
4. Create the matching directory on HDFS:
hdfs dfs -mkdir -p /user/oozie/xwj_test/apps/shell/sqoop_email
5. Put the two files above into the new local directory and change into it:
cd /opt/mydata/user/oozie/xwj_test/apps/shell/sqoop_email
6. Upload the local files to the HDFS directory:
hdfs dfs -put ../sqoop_email/* /user/oozie/xwj_test/apps/shell/sqoop_email
7. Check that the files exist on HDFS:
hdfs dfs -ls -R /user/oozie/xwj_test/apps/shell/sqoop_email
8. Switch to the yarn user and submit the job:
su yarn
oozie job -oozie http://hadoop-node0.novalocal:11000/oozie -config /opt/mydata/user/oozie/xwj_test/apps/shell/sqoop_email/job.properties -run
The run fails with:
ACTION[0000002-180412152846094-oozie-root-W@sqoop-node] Launcher exception: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2241)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:238)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2147)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2239)
... 9 more
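Fix: add the following line to job.properties so that the job uses the Oozie system share lib:
oozie.use.system.libpath=true
Run the job again. This time it fails with a different error: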
2018-04-13 09:19:00,121 WARN SqoopActionExecutor:523 - SERVER[hadoop-node0.novalocal] USER[root] GROUP[-] TOKEN[] APP[email-wf] JOB[0000010-180412152846094-oozie-root-W] ACTION[0000010-180412152846094-oozie-root-W@sqoop-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
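9. Searching online (Baidu) shows that many people hit this error, but few solutions are offered, so the troubleshooting steps are recorded here.
9.1 Using the ID of the job started by Oozie, find the job's error details in the Oozie web UI.
9.2 Click the failed node to view its detailed execution log.
9.3 Drill down layer by layer until you reach the log that contains the actual error: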
java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'id,app_name,app_path,user_name from WF_JOBS where (1 = 0)' at line 1
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:536)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:513)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:115)
at com.mysql.cj.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:1983)
at com.mysql.cj.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1826)
at com.mysql.cj.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1923)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:777)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:253)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:337)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1853)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1653)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:179)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:240)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
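9.4 The log shows that the SQL passed to Sqoop for the MySQL import was malformed: the select keyword was missing. After fixing the query and resubmitting, the job finally ran successfully. (It was a careless mistake, but tracking it down revealed how to reach the detailed logs, which is an important milestone for a developer.)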