A Redis cluster topology pitfall in Spring Boot

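While testing master/replica failover of Redis nodes today, I ran into a large number of connection timeouts. Normally, a Redis client that connects to any available node can fetch the topology of the whole cluster, and it refreshes that topology quickly whenever the cluster nodes change, which keeps the client highly available.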

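The test cluster has 4 masters and 4 replicas. On inspection it turned out that ops had migrated the 4 replica nodes, so half of the addresses the application was using were old ones. That alone does not explain the timeouts, though (as noted above, one successful connection is enough to fetch the topology of the whole cluster).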

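Debugging showed that Spring Boot 2 uses Lettuce as its default Redis client, and that topology refresh is turned off in the default configuration. As mentioned above, the timeouts appeared during a master/replica failover: all the replicas the client knew about were gone, and the masters had become replicas. To guarantee consistency, Lettuce by default reads and writes only on master nodes (see here for how to enable reads from replicas). Because the topology was never refreshed, reads and writes sent to a "master" (which was in fact already a replica) were redirected to a node that did not exist in the client's current topology. Lettuce cannot use nodes outside the cluster it knows about, so the commands timed out.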

public class ClusterTopologyRefreshOptions {

    public static final boolean DEFAULT_PERIODIC_REFRESH_ENABLED = false; // periodic refresh is off
    public static final long DEFAULT_REFRESH_PERIOD = 60;
    public static final TimeUnit DEFAULT_REFRESH_PERIOD_UNIT = TimeUnit.SECONDS;
    public static final Duration DEFAULT_REFRESH_PERIOD_DURATION = Duration.ofSeconds(DEFAULT_REFRESH_PERIOD);
    public static final boolean DEFAULT_DYNAMIC_REFRESH_SOURCES = true;
    public static final Set<RefreshTrigger> DEFAULT_ADAPTIVE_REFRESH_TRIGGERS = Collections.emptySet(); // adaptive refresh is off
    public static final long DEFAULT_ADAPTIVE_REFRESH_TIMEOUT = 30;
    public static final TimeUnit DEFAULT_ADAPTIVE_REFRESH_TIMEOUT_UNIT = TimeUnit.SECONDS;
    public static final Duration DEFAULT_ADAPTIVE_REFRESH_TIMEOUT_DURATION = Duration
            .ofSeconds(DEFAULT_ADAPTIVE_REFRESH_TIMEOUT);
    public static final int DEFAULT_REFRESH_TRIGGERS_RECONNECT_ATTEMPTS = 5;
    public static final boolean DEFAULT_CLOSE_STALE_CONNECTIONS = true;

    private final boolean periodicRefreshEnabled;
    private final Duration refreshPeriod;
    private final boolean closeStaleConnections;
    private final boolean dynamicRefreshSources;
    private final Set<RefreshTrigger> adaptiveRefreshTriggers;
    private final Duration adaptiveRefreshTimeout;
    private final int refreshTriggersReconnectAttempts;
}

I looked around on GitHub and found that the author of Lettuce had already opened an issue:

https://github.com/spring-projects/spring-boot/issues/15630

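The community plans to add a corresponding configuration option later. Until then, the problem can be solved by configuring the client manually with the following code: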

import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import org.springframework.boot.autoconfigure.data.redis.LettuceClientConfigurationBuilderCustomizer;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;

@Configuration
public class RedisClusterCustomizer implements LettuceClientConfigurationBuilderCustomizer {

    /**
     * Enable dynamic refresh of the Redis cluster topology.
     *
     * @param clientConfigurationBuilder builder for the Lettuce client configuration
     */
    @Override
    public void customize(LettuceClientConfiguration.LettuceClientConfigurationBuilder clientConfigurationBuilder) {
        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enableAllAdaptiveRefreshTriggers() // refresh when cluster events are observed
                .enablePeriodicRefresh(true)        // refresh on a timer as well
                .dynamicRefreshSources(true)        // use discovered nodes as refresh sources
                .build();

        clientConfigurationBuilder.clientOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build());
    }

}

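The code enables both periodic topology refresh and adaptive refresh. Adaptive refresh is triggered when any of the following events is received: MOVED_REDIRECT, ASK_REDIRECT, PERSISTENT_RECONNECTS, UNKNOWN_NODE.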
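
To make the failure mode concrete, here is a small toy model of what happened in the test. It is only a sketch and not how Lettuce is implemented: FakeCluster, ToyClient, simulate and all the addresses are hypothetical names made up for illustration. The client caches a node list that still contains the old replica address; after the failover, the redirect points to an address outside that cached list. Without a refresh the command is reported as a timeout, and with a refresh on an unknown node it succeeds.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of a cluster client with a cached topology. Not Lettuce's implementation. */
final class StaleTopologyDemo {

    /** Hypothetical stand-in for the real cluster: it knows its nodes and the current master. */
    static final class FakeCluster {
        final Set<String> nodes = new HashSet<>();
        String master;

        FakeCluster(String master, String replica) {
            this.master = master;
            nodes.add(master);
            nodes.add(replica);
        }

        /** Promote the replica; the old master stays in the cluster as a replica. */
        void failoverTo(String newMaster) {
            this.master = newMaster;
        }
    }

    /** Hypothetical client that caches the node list and the master address. */
    static final class ToyClient {
        private final FakeCluster cluster;
        private final boolean adaptiveRefresh;
        private final Set<String> knownNodes;
        private String cachedMaster;

        ToyClient(FakeCluster cluster, Set<String> staleView, String cachedMaster, boolean adaptiveRefresh) {
            this.cluster = cluster;
            this.knownNodes = new HashSet<>(staleView);
            this.cachedMaster = cachedMaster;
            this.adaptiveRefresh = adaptiveRefresh;
        }

        String get() {
            if (cachedMaster.equals(cluster.master)) {
                return "OK from " + cachedMaster;
            }
            // The node we asked is no longer the master, so it answers with a redirect.
            String redirectTarget = cluster.master;
            if (!knownNodes.contains(redirectTarget)) {
                if (!adaptiveRefresh) {
                    return "TIMEOUT"; // redirect target is outside the cached topology
                }
                // Unknown node seen: reload the node list from the cluster.
                knownNodes.clear();
                knownNodes.addAll(cluster.nodes);
            }
            cachedMaster = redirectTarget;
            return "OK from " + cachedMaster;
        }
    }

    /** Replays the scenario: replica migrated, client keeps the old address, then failover. */
    static String simulate(boolean adaptiveRefresh) {
        String master = "10.0.0.1:6379";
        String oldReplica = "10.0.0.2:6379"; // address the application still has
        String newReplica = "10.0.1.2:6379"; // address after the migration
        FakeCluster cluster = new FakeCluster(master, newReplica);
        Set<String> staleView = new HashSet<>(Arrays.asList(master, oldReplica));
        ToyClient client = new ToyClient(cluster, staleView, master, adaptiveRefresh);
        cluster.failoverTo(newReplica);
        return client.get();
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // TIMEOUT
        System.out.println(simulate(true));  // OK from 10.0.1.2:6379
    }
}
```

The real client is of course far more involved (slots, asynchronous reconnects, rate limiting of refreshes), but the shape of the problem is the same: a redirect is only useful if its target is part of the topology the client currently holds.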