KSQL server failed to start after upgrade

July 11, 2019

kafka, KSQL, upgrade

I just upgraded a small Confluent Kafka cluster at a customer site. Almost everything went smoothly by following the upgrade instructions at https://docs.confluent.io/current/installation/upgrade.html (our servers run CentOS). At the end, however, the KSQL server wouldn’t start and failed with this error message:

[2019-07-11 16:23:31,233] ERROR Failed to start KSQL (io.confluent.ksql.rest.server.KsqlServerMain:53)
java.lang.IllegalStateException: Invalid replcation factor on topic _confluent-ksql-default__command_topic: 1
        at io.confluent.ksql.rest.util.KsqlInternalTopicUtils.ensureTopic(KsqlInternalTopicUtils.java:91)
        at io.confluent.ksql.rest.server.KsqlRestApplication.initialize(KsqlRestApplication.java:237)
        at io.confluent.ksql.rest.server.KsqlRestApplication.startKsql(KsqlRestApplication.java:200)
        at io.confluent.ksql.rest.server.KsqlRestApplication.start(KsqlRestApplication.java:187)
        at io.confluent.ksql.rest.server.KsqlServerMain.tryStartApp(KsqlServerMain.java:65)
        at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:51)

I checked the replication factor with:

kafka-topics --zookeeper localhost:2181 --topic _confluent-ksql-default__command_topic --describe

and it confirmed the replication factor of 1 reported in the stack trace. Since this small cluster has three nodes and we use replication factor 3, I figured 3 was the best guess for the correct value.
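If you want to script this check, here is a minimal sketch that pulls the replication factor out of the describe output. The function name replication_factor is my own, and the sample line piped in at the end is illustrative only, not captured from a real cluster; the pattern accepts the value with or without a space after the colon.

```shell
# Read `kafka-topics --describe` output on stdin and print the first
# replication factor found. Accepts "ReplicationFactor:1" and "ReplicationFactor: 1".
# (replication_factor is a hypothetical helper name, not part of Kafka.)
replication_factor() {
  sed -n 's/.*ReplicationFactor: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Against a live cluster you would pipe the describe command into it:
# kafka-topics --zookeeper localhost:2181 --topic _confluent-ksql-default__command_topic --describe | replication_factor

# Illustrative input only (assumed summary-line layout):
printf 'Topic:demo\tPartitionCount:1\tReplicationFactor:1\tConfigs:\n' | replication_factor
```

If the function prints nothing, the summary line probably looks different in your Kafka version, so check the raw output by hand.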

You can change the replication factor using the instructions here: https://kafka.apache.org/documentation/#basic_ops_increase_replication_factor

So I created a JSON file named increase-replication-factor.json with this content:

{"version":1,
"partitions":[{"topic":"_confluent-ksql-default__command_topic","partition":0,"replicas":[0,2,3]}]}
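If you would rather generate the file than type it by hand, something like the following sketch should work. The helper name reassignment_json is made up for this example, and it assumes a single-partition topic (partition 0), which is what the JSON above describes; replace the broker IDs with the ones in your own cluster.

```shell
# Print a reassignment plan for partition 0 of a topic.
# $1 = topic name, $2 = comma-separated broker ids that should hold a replica.
# (reassignment_json is a hypothetical helper, not a Kafka tool.)
reassignment_json() {
  printf '{"version":1,\n"partitions":[{"topic":"%s","partition":0,"replicas":[%s]}]}\n' "$1" "$2"
}

# Write the plan for the KSQL command topic, using the broker ids from this post.
reassignment_json "_confluent-ksql-default__command_topic" "0,2,3" > increase-replication-factor.json

# Show the result for a quick visual check.
cat increase-replication-factor.json
```

The first broker ID in the list becomes the preferred leader for the partition, so the order is worth a moment's thought on larger clusters.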

The numbers at the end (0, 2, 3) are the broker IDs. (For a reason I can’t remember, we don’t have a broker 1.) Now execute this command:

kafka-reassign-partitions --zookeeper localhost:2181 --execute --reassignment-json-file increase-replication-factor.json

You can verify success with:

kafka-reassign-partitions --zookeeper localhost:2181 --verify --reassignment-json-file increase-replication-factor.json

After this, the KSQL server started without problems.