
Dealing with data loss in Kafka Connect

I understand that Kafka Connect can be deployed in cluster (distributed) mode, and that workers move data between a data source and a Kafka topic. What I want to know is: if a worker fails while moving data from the data source to the Kafka topic, will there be data loss? And if so, how can we get the data back through the connector, or will Kafka Connect deal with it automatically?

question from: https://stackoverflow.com/questions/65846981/dealing-with-data-loss-in-kafka-connect


1 Answer


This depends on the source and whether it supports offset tracking.

For example, lines in a file, rows in a database with a primary key or timestamp column, or an idempotent API call can be re-read repeatedly from the same starting position (although, in each case, the underlying data also needs to be immutable for this to work consistently).
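As a minimal sketch of that idea (the `events` table, its `id` column, and the connection URL are all made up for illustration): a monotonically increasing key lets the same query be replayed from any remembered position.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ResumableRead {
    public static void main(String[] args) throws SQLException {
        long lastSeen = 0L; // position saved from the previous run
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/db");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, payload FROM events WHERE id > ? ORDER BY id")) {
            ps.setLong(1, lastSeen);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    lastSeen = rs.getLong("id"); // advance the resume position
                    System.out.println(rs.getString("payload"));
                }
            }
        }
        // Persisting lastSeen lets a restarted reader pick up exactly here.
    }
}
```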

The Kafka Connect SourceTask API has a mechanism for committing tracked "offsets" (different from Kafka topic offsets): each SourceRecord carries a source partition and offset that Connect persists, and a restarted task can read the last committed offset back and resume from there. Because offsets are committed only after records have been sent to Kafka, a worker failure generally means some records get re-read and re-sent (at-least-once delivery) rather than lost, as long as the source is re-readable as described above.
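Here is a minimal sketch of what that looks like in a custom source task; the partition/offset keys, the table name, and the topic name are illustrative assumptions, not from any real connector.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class TimestampSourceTask extends SourceTask {
    // Identifies this source; Connect keys its stored offsets by this map.
    private static final Map<String, String> PARTITION =
            Collections.singletonMap("table", "orders"); // hypothetical table

    private long lastTimestamp = 0L; // resume point ("offset" in Connect terms)

    @Override
    public void start(Map<String, String> props) {
        // On (re)start, ask Connect for the last committed offset, if any.
        Map<String, Object> offset = context.offsetStorageReader().offset(PARTITION);
        if (offset != null && offset.get("timestamp") != null) {
            lastTimestamp = (Long) offset.get("timestamp");
        }
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        Thread.sleep(1000); // throttle the sketch
        // Hypothetical fetch: the next row with timestamp > lastTimestamp.
        long rowTimestamp = lastTimestamp + 1; // placeholder for a real row
        lastTimestamp = rowTimestamp;
        // Each record carries the offset Connect should commit for it.
        Map<String, Long> sourceOffset =
                Collections.singletonMap("timestamp", rowTimestamp);
        SourceRecord record = new SourceRecord(
                PARTITION, sourceOffset,
                "orders-topic", // hypothetical target topic
                Schema.STRING_SCHEMA, "row@" + rowTimestamp);
        return Collections.singletonList(record);
    }

    @Override
    public void stop() { }

    @Override
    public String version() { return "0.1"; }
}
```

Note that offsets are flushed on an interval (the worker's offset.flush.interval.ms), which is why the failure mode here is duplicates rather than loss; Kafka 3.3+ also offers opt-in exactly-once semantics for source connectors (KIP-618).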

