也谈Paxos

Paxos算法用来解决什么问题?我理解的用抽象一点的语言解释,paxos算法的过程就是多方统一认知,达成共识,形成决议的过程;用算法本身的形式化运行过程解释,就是多个节点如何选定某个确定的值(一致性);分布式系统中就可以做到保障高可用性(容忍少数失败或故障)的同时,达成一个一致决议,就可以应用于这些场景:多副本一致性、数据库日志强一致同步(节点日志的顺序确定)、选主(谁成为主)、成员管理或配置变更等。

做系统设计很多时候会尽量避免单点,系统整体能够容错,如单机机械故障、机器或进程重启、网络故障等,尤其针对有状态服务,这时常常会祭出服务分布式化的概念,谈到分布式问题又不可避免要讲CAP理论,在可用性A和一致性C之间作折中,而Paxos算法是分布式一致性协议的基石,它本质上也是trade-offs,需要权衡系统中参与节点数、网络消息的延迟、参与者的活动水平(如消息数量、可容忍的错误类型等)等。Paxos原始论文(by Leslie Lamport)1998年就已发表,囿于其描述和证明晦涩难懂,本篇根据好懂的多的论文Paxos Made Simple来阐释算法过程。

先看几个论文中的基本概念:
系统中的三个重要角色:Proposer、Acceptor、Learner,其中核心算法主要涉及前面两者,一个节点可以担任多个角色,多个节点就如何选定一个确定的值进行协商达成决议。
由Proposer(s)发起提议,Acceptors参与表决,形成多数派应答后,Acceptors执行由Proposer提交的决议。
在非拜占庭(Non-Byzantine)通信的环境下,所有节点都可以发起提议,也都可以发起表决,也都遵守承诺规则,不能说谎。
为容易形成多数派应答,要求分布式系统中有2F+1个节点。

关于Proposer提议编号的唯一性及递增,可将编号设定为高位为时间戳,低位为节点IP。

再来看论文中的两个非常重要的Request:
Prepare Request :
A proposer chooses a new proposal number n and sends a request to each member of some set of acceptors, asking it to respond with:
(a) A promise never again to accept a proposal numbered less than n, and
(b) The proposal with the highest number less than n that it has accepted, if any. I will call such a request a prepare request with number n.
Accept Request :
A proposer issues a proposal by sending, to some set of acceptors, a request that the proposal be accepted. (This need not be the same set of acceptors that responded to the initial requests.) Let’s call this an accept request.

Paxos原始算法实际是一个两阶段提交的算法,但不是一个同步阻塞型的2PC协议。

Phase 1
(a) A proposer selects a proposal number n and sends a prepare request with number n to a majority of acceptors.
(b) If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest-numbered proposal (if any) that it has accepted.

(b)承诺不再应答Proposal编号小于等于(<=)当前请求的Prepare Request。
Acceptor需要将应答过的最大的提议编号(Prepare Request)记录到磁盘上。

Phase 2
(a) If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.
(b) If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request having a number greater than n.

(b)承诺不再应答Proposal编号小于(<)当前请求的Accept Request。
Acceptor需要将已接受的最大提议(包括编号和对应value)记录到磁盘上。

用形式化的伪代码来看两阶段中的Acceptor方,假定Proposer方发起的提议编号为proposalID,决议值为proposalValue,Acceptor方记录的已经应答的最大提议编号为maxResponsedPID,已接受的最大提议编号为acceptedPID,对应值为acceptedValue,则:

P1b:
    if (proposalID > maxResponsedPID) {
        maxResponsedPID = proposalID;
        ack(acceptedPID, acceptedValue);
    }
    else {
        reject;
    }
P2b:
    if (proposalID >= maxResponsedPID) {
        acceptedPID = proposalId;
        acceptedValue = proposalValue;
        ack(ok);
    }
    else {
        reject;
    }

上述的Paxos两阶段过程也被称为Basic Paxos。
由上面的伪代码也可以看出,Basic Paxos存在着活锁的问题,即可能存在两个Proposer交替发起提议和提交决议,但都accept失败。

Refer:
1. Paxos Made Simple
2. https://en.wikipedia.org/wiki/Paxos_(computer_science)
3. Paxos理论介绍(1): 朴素Paxos算法理论推导与证明
4. PaxosRaft 分布式一致性算法原理剖析及其在实战中的应用
5. [Paxos三部曲之一] 使用Basic-Paxos协议的日志同步与恢复
6. 微信自研生产级paxos类库PhxPaxos实现原理介绍
7. Implementing Replicated Logs with Paxos

发表评论

电子邮件地址不会被公开。 必填项已用*标注