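After a server is forcibly rebooted, services such as Kafka may already have written their checkpoint files, but the checkpoint contents can be corrupted. Starting the Kafka service then fails with errors like the following: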

ERROR Error while reading checkpoint file /home/kafka-2.3.1/kafka-logs/recovery-point-offset-checkpoint (kafka.server.LogDirFailureChannel)

java.io.IOException: Malformed line in checkpoint file (/home/kafka-2.3.1/kafka-logs/recovery-point-offset-checkpoint): '

and:

[2022-12-27 14:49:16,226] ERROR Error while reading checkpoint file /home/kafka-2.3.1/kafka-logs/replication-offset-checkpoint (kafka.server.LogDirFailureChannel)

java.io.IOException: Malformed line in checkpoint file (/home/kafka-2.3.1/kafka-logs/replication-offset-checkpoint): '

at kafka.server.checkpoints.CheckpointFile.malformedLineException$1(CheckpointFile.scala:84)

at kafka.server.checkpoints.CheckpointFile.liftedTree2$1(CheckpointFile.scala:107)

at kafka.server.checkpoints.CheckpointFile.read(CheckpointFile.scala:86)

at kafka.server.checkpoints.OffsetCheckpointFile.read(OffsetCheckpointFile.scala:61)

at kafka.cluster.Partition$$anonfun$getOrCreateReplica$1.apply(Partition.scala:222)

at kafka.cluster.Partition$$anonfun$getOrCreateReplica$1.apply(Partition.scala:216)

at kafka.utils.Pool$$anon$2.apply(Pool.scala:61)

at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)

at kafka.utils.Pool.getAndMaybePut(Pool.scala:60)

at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:215)

at kafka.server.ReplicaManager$$anonfun$makeFollowers$3.apply(ReplicaManager.scala:1304)

at kafka.server.ReplicaManager$$anonfun$makeFollowers$3.apply(ReplicaManager.scala:1281)

at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)

at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)

at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)

at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)

at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)

at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:1281)

at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:1119)

at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:201)

at kafka.server.KafkaApis.handle(KafkaApis.scala:117)

at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)

at java.lang.Thread.run(Thread.java:748)

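To avoid losing data, do not simply delete the log or checkpoint files and restart just to get the service to come up. The fix for this particular problem is as follows: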

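Locate the checkpoint files named in the errors above (recovery-point-offset-checkpoint and replication-offset-checkpoint) and open them. You will find unknown symbols at the end of the file or at other positions.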

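The corrupted file ends with symbols such as @^ shown in blue (vim, for example, renders NUL bytes as a blue ^@), or contains other malformed symbols and data.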

Compare the file with the same file on another server and delete these symbols. The Kafka service can then be restarted.
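Finding the stray symbols by eye is easy to get wrong in a long file, so a small script can point at the suspicious lines first. The following is a minimal sketch in Python, not part of Kafka: the function name `find_bad_lines` and the two patterns are assumptions for illustration, based on the layout of the healthy files shown in this post (two numeric header lines, then `topic partition offset` entries). It only reports lines and never modifies the file.

```python
import re

# Assumed layout, taken from the healthy sample files in this post:
# lines 1 and 2 are plain numbers, every later line is "<topic> <partition> <offset>".
NUMBER_RE = re.compile(rb"\d+")
ENTRY_RE = re.compile(rb"[A-Za-z0-9._-]+ \d+ \d+")


def find_bad_lines(data: bytes):
    """Return (line_number, raw_bytes) for every line that does not look valid."""
    lines = data.split(b"\n")
    # A trailing newline produces one empty element at the end; ignore it.
    if lines and lines[-1] == b"":
        lines.pop()
    bad = []
    for number, line in enumerate(lines, start=1):
        pattern = NUMBER_RE if number <= 2 else ENTRY_RE
        if not pattern.fullmatch(line):
            bad.append((number, line))
    return bad
```

To use it, read the file as bytes, for example `find_bad_lines(open("/home/kafka-2.3.1/kafka-logs/recovery-point-offset-checkpoint", "rb").read())`, and inspect the reported line numbers in an editor. Working on bytes rather than text is deliberate: NUL bytes and other garbage may not decode cleanly. Make a backup copy of the file before editing it.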

A healthy replication-offset-checkpoint file looks like this:

0

6

test 0 81812

testReceiver-session 0 108

app0-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition 0 0

test 1 393337

simpleTest 0 48649

shareplex 0 10

A healthy recovery-point-offset-checkpoint file looks like this:

0

33

dept 1 0

__consumer_offsets 30 241

__consumer_offsets 21 0

qkgj 2 0

bar 5 0

__consumer_offsets 27 3

__consumer_offsets 9 33

bar 1 0

__consumer_offsets 33 189

zyqw 2 0

Partitions-3-test 1 86580

test 0 81812

bar 6 0

__consumer_offsets 36 9

__consumer_offsets 42 1138648

bar 0 0

__consumer_offsets 3 9355

__consumer_offsets 18 4011

__consumer_offsets 15 86

__consumer_offsets 24 4282

testReceiver-session 0 108

app0-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition 0 0

__consumer_offsets 48 354426

test 1 393337

bar 7 0

__consumer_offsets 6 42071

bar 4 0

__consumer_offsets 0 37343

__consumer_offsets 39 142047

__consumer_offsets 12 97

__consumer_offsets 45 148239

simpleTest 0 48649

share 0 10
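Both sample files appear to share one layout: a version number on the first line (0 here), the number of entries on the second line (6 and 33, matching the entries listed), then one `topic partition offset` entry per line. After cleaning a file by hand it is worth checking that the entry count still matches, since removing a damaged entry line without adjusting the count would leave the file inconsistent. The sketch below is a hypothetical helper, not Kafka's own parser; `parse_checkpoint` is a name chosen for illustration, and it assumes the blank lines in the listings above are only a rendering artifact.

```python
from typing import Dict, Tuple


def parse_checkpoint(text: str) -> Dict[Tuple[str, int], int]:
    """Parse checkpoint text; raise ValueError on any structural problem."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("missing version or entry-count line")
    version = int(lines[0])   # first line: format version (0 in the samples)
    expected = int(lines[1])  # second line: number of entries that follow
    if version != 0:
        raise ValueError(f"unexpected version {version}")
    entries: Dict[Tuple[str, int], int] = {}
    for number, line in enumerate(lines[2:], start=3):
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"malformed line {number}: {line!r}")
        topic, partition, offset = fields
        entries[(topic, int(partition))] = int(offset)
    if len(entries) != expected:
        raise ValueError(f"expected {expected} entries, found {len(entries)}")
    return entries
```

If the function returns without raising, the cleaned file at least matches the structure of the healthy samples. It says nothing about whether the offsets themselves are right, which is why comparing against another server's copy remains useful.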
