
Troubleshooting Hive Load Delays

Troubleshooting Hive load delays for the SRC-layer minute tables

1. Problem Discovery

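On the morning of 2019-04-27, Zhang xxx informed Zhang xxx that the load of the XXLL minute tables was failing. The platform support team took the lead in investigating the delay in loading the XXLL SRC-layer minute tables into Hive.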

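Investigation window: 2019-04-27, 13:30 to 20:30.
People involved: the platform support team, the platform operations team, and others.

2. Investigation

Hive error log

Number of occurrences of the exception: roughly 595

Exception message: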

2019-04-27 01:38:41,484 ERROR ZooKeeperHiveLockManager (SessionState.java:printError(920)) - Unable to acquire IMPLICIT, EXCLUSIVE lock lf_xl_src@src_c_sa_basic_normal after 100 attempts.
2019-04-27 01:38:41,506 ERROR ql.Driver (SessionState.java:printError(920)) - FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired. retry after some time
org.apache.hadoop.hive.ql.lockmgr.LockException: Locks on the underlying objects cannot be acquired. retry after some time
at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:164)
at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:988)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1224)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1053)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:209)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:161)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:372)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:307)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:704)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
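
The post does not say how the figure of roughly 595 occurrences was obtained. One way to get such a count is to scan the Hive log for the ZooKeeperHiveLockManager failure line and group the hits by lock target. The following is a minimal sketch in Python; the function name `count_lock_failures` is hypothetical, and it assumes that each log record sits on a single line (the wrapped lines in the excerpt above would need to be joined first).

```python
import re
from collections import Counter

# Matches the ZooKeeperHiveLockManager failure line quoted above, assuming
# each log record has been joined back onto a single line.
LOCK_FAILURE = re.compile(
    r"Unable to acquire (?P<kind>\w+), (?P<mode>\w+) lock "
    r"(?P<target>\S+) after (?P<attempts>\d+) attempts"
)


def count_lock_failures(lines):
    """Return a Counter mapping (lock target, lock mode) to failure count."""
    counts = Counter()
    for line in lines:
        match = LOCK_FAILURE.search(line)
        if match:
            counts[(match.group("target"), match.group("mode"))] += 1
    return counts


# The two log records from the excerpt above; only the first one matches.
sample = [
    "2019-04-27 01:38:41,484 ERROR ZooKeeperHiveLockManager "
    "(SessionState.java:printError(920)) - Unable to acquire IMPLICIT, "
    "EXCLUSIVE lock lf_xl_src@src_c_sa_basic_normal after 100 attempts.",
    "2019-04-27 01:38:41,506 ERROR ql.Driver "
    "(SessionState.java:printError(920)) - FAILED: Error in acquiring locks: "
    "Locks on the underlying objects cannot be acquired. retry after some time",
]

for (target, mode), n in count_lock_failures(sample).items():
    print(target, mode, n)  # lf_xl_src@src_c_sa_basic_normal EXCLUSIVE 1
```

Grouping by target makes it easy to see whether the failures are concentrated on one table, as they appear to be here.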

Hive locks

Viewing the locks:

Lock owner:

 

show locks src_c_sa_basic_normal extended;

 

> show locks src_c_sa_basic_normal extended;
OK
lf_xl_src@src_c_sa_basic_normal SHARED
LOCK_QUERYID:lf_xl_bp_20190427164141_acb414ff-d237-414b-a778-5ef916724c04
LOCK_TIME:1524818467680
LOCK_MODE:IMPLICIT
LOCK_QUERYSTRING:insert overwrite table lf_xl_dwd.dwd_d_sa_basic_normal_hour partition(month_id='201904',day_id='27',hour_id='15',prov_id='075',sa_type)
select msisdn ,
imsi ,
start_time
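
When many locks are listed, it can help to turn this output into structured records so that the holding query can be picked out quickly. Below is a rough Python sketch; `parse_show_locks` is a hypothetical helper, and it assumes the layout shown above: a header line of the form `db@table MODE`, followed by `LOCK_*:value` lines, with any further lines treated as a continuation of the previous value.

```python
LOCK_MODES = {"SHARED", "EXCLUSIVE"}


def parse_show_locks(text):
    """Parse `show locks <table> extended` output into a list of dicts."""
    locks = []
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "OK" or line.startswith(">"):
            continue  # skip blanks, the status line and the echoed prompt
        parts = line.split()
        if len(parts) == 2 and parts[1] in LOCK_MODES:
            # Header line: locked object followed by the lock type.
            locks.append({"object": parts[0], "type": parts[1]})
            last_key = None
        elif not locks:
            continue  # ignore anything before the first lock header
        elif line.startswith("LOCK_") and ":" in line:
            last_key, value = line.split(":", 1)
            locks[-1][last_key] = value
        elif last_key is not None:
            locks[-1][last_key] += " " + line  # continuation of a long value
    return locks


sample = """\
> show locks src_c_sa_basic_normal extended;
OK
lf_xl_src@src_c_sa_basic_normal SHARED
LOCK_QUERYID:lf_xl_bp_20190427164141_acb414ff-d237-414b-a778-5ef916724c04
LOCK_TIME:1524818467680
LOCK_MODE:IMPLICIT
LOCK_QUERYSTRING:insert overwrite table lf_xl_dwd.dwd_d_sa_basic_normal_hour
select msisdn ,
"""

for lock in parse_show_locks(sample):
    print(lock["object"], lock["type"], lock["LOCK_QUERYID"])
```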

 

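Hive currently has two main lock types: SHARED (shared lock, S) and EXCLUSIVE (exclusive lock, X). Several S locks can be held on the same object at the same time, whereas an X lock is incompatible with any other lock on that object, whether S or X.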
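
The compatibility matrix itself did not survive in this copy of the post. As a stand-in, here is a small Python sketch of the usual shared/exclusive semantics; the names `COMPATIBLE` and `can_acquire` are illustrative only and are not part of Hive. It also shows why the request in the error log fails: an EXCLUSIVE lock is requested on `lf_xl_src@src_c_sa_basic_normal` while the `insert overwrite ... select` query listed above still holds a SHARED lock on it.

```python
# Compatibility of a requested lock with a lock already held on the same
# object, keyed as (held, requested). True means the two can coexist.
COMPATIBLE = {
    ("SHARED", "SHARED"): True,
    ("SHARED", "EXCLUSIVE"): False,
    ("EXCLUSIVE", "SHARED"): False,
    ("EXCLUSIVE", "EXCLUSIVE"): False,
}


def can_acquire(requested, held_locks):
    """Return True if `requested` is compatible with every held lock."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)


print(can_acquire("SHARED", ["SHARED"]))     # True: readers can coexist
print(can_acquire("EXCLUSIVE", ["SHARED"]))  # False: the case in this incident
print(can_acquire("EXCLUSIVE", []))          # True: nothing is held
```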

ZooKeeper logs

Hive failing to acquire the lock:

Number of log entries:

Znode structure:

 

Process stack

Process stack information:

"main" prio=10 tid=0x00007fdf7c012800 nid=0x772f waiting on condition [0x00007fdf85b23000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.lock(ZooKeeperHiveLockManager.java:268)
at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.lock(ZooKeeperHiveLockManager.java:182)
at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:161)
at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:988)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1224)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1053)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver
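
The stack shows the main thread sleeping inside `ZooKeeperHiveLockManager.lock`, which fits a retry loop: try to take the lock, sleep, and try again until the attempts run out ("after 100 attempts" in the error log). The Python sketch below is a simplified model of such a loop, not Hive's actual code. The attempt count of 100 comes from the log; the 60-second pause is an assumption based on what is believed to be the default of `hive.lock.sleep.between.retries`, so check the values of `hive.lock.numretries` and `hive.lock.sleep.between.retries` on the cluster in question.

```python
import time


def acquire_with_retries(try_lock, num_retries=100, sleep_seconds=60.0,
                         sleep=time.sleep):
    """Call try_lock() up to num_retries times, sleeping between attempts.

    Returns the number of attempts used on success, or None if every
    attempt failed. A simplified model, not Hive's implementation.
    """
    for attempt in range(1, num_retries + 1):
        if attempt > 1:
            sleep(sleep_seconds)  # back off before retrying
        if try_lock():
            return attempt
    return None


def worst_case_wait(num_retries=100, sleep_seconds=60.0):
    """Total sleep time in seconds when every attempt fails."""
    return (num_retries - 1) * sleep_seconds


# Simulate a lock that is never released. The fake sleep only records the
# requested delays, so the example finishes immediately.
slept = []
result = acquire_with_retries(lambda: False, sleep=slept.append)
print(result, sum(slept) / 60)  # None 99.0
print(worst_case_wait() / 60)   # 99.0
```

Under these assumed settings, a single blocked statement can wait well over an hour before it fails, which is far longer than a minute-level table can tolerate.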

3. Current Solution

Drop the load operation and use a cp operation instead, then check the files on HDFS and repair the Hive partitions accordingly.
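
The post gives no details of how the copy and the partition repair are carried out. One possible shape, sketched in Python below, is to copy the file into the partition directory with `hdfs dfs -cp` and then register the partition with `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` (running `MSCK REPAIR TABLE` on the table is another option). The helper name, the landing path, the warehouse path and the partition columns used here are all hypothetical; the sketch only builds the command strings and does not run them.

```python
import shlex


def build_cp_and_repair(src_file, table, table_location, partition):
    """Build shell and HiveQL commands for one landed file.

    `partition` is an ordered dict of partition column -> value.
    Returns (mkdir command, copy command, HiveQL statement).
    """
    subdir = "/".join(f"{col}={val}" for col, val in partition.items())
    target_dir = f"{table_location.rstrip('/')}/{subdir}"
    spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())

    # Create the partition directory, then copy the file into it.
    mkdir_cmd = f"hdfs dfs -mkdir -p {shlex.quote(target_dir)}"
    copy_cmd = f"hdfs dfs -cp {shlex.quote(src_file)} {shlex.quote(target_dir + '/')}"
    # Register the directory as a partition if the metastore lacks it.
    add_partition = (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{target_dir}';"
    )
    return mkdir_cmd, copy_cmd, add_partition


# Hypothetical paths and partition layout, for illustration only.
commands = build_cp_and_repair(
    src_file="/data/landing/part-0001",
    table="lf_xl_src.src_c_sa_basic_normal",
    table_location="/warehouse/lf_xl_src.db/src_c_sa_basic_normal",
    partition={"month_id": "201904", "day_id": "27"},
)
for command in commands:
    print(command)
```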

4. Other Issues

Kafka consumers being killed forcibly

Offsets being committed periodically, once per second


白眉大叔
