也谈Paxos

Paxos算法用来解决什么问题？我理解的用抽象一点的语言解释，paxos算法的过程就是多方统一认知，达成共识，形成决议的过程；用算法本身的形式化运行过程解释，就是多个节点如何选定某个确定的值（一致性）；分布式系统中就可以做到保障高可用性（容忍少数失败或故障）的同时，达成一个一致决议，就可以应用于这些场景：多副本一致性、数据库日志强一致同步（节点日志的顺序确定）、选主（谁成为主）、成员管理或配置变更等。

做系统设计很多时候会尽量避免单点，系统整体能够容错，如单机机械故障、机器或进程重启、网络故障等，尤其针对有状态服务，这时常常会祭出服务分布式化的概念，谈到分布式问题又不可避免要讲CAP理论，在可用性A和一致性C之间作折中，而Paxos算法是分布式一致性协议的基石，它本质上也是trade-offs，需要权衡系统中参与节点数、网络消息的延迟、参与者的活动水平（如消息数量、可容忍的错误类型等）等。Paxos原始论文（by Leslie Lamport）1998年就已发表，囿于其描述和证明晦涩难懂，本篇根据好懂的多的论文Paxos Made Simple来阐释算法过程。

先看几个论文中的基本概念：
系统中的三个重要角色：Proposer、Acceptor、Learner，其中核心算法主要涉及前面两者，一个节点可以担任多个角色，多个节点就如何选定一个确定的值进行协商达成决议。
由Proposer(s)发起提议，Acceptors参与表决，形成多数派应答后，Acceptors执行由Proposer提交的决议。
在非拜占庭（Non-Byzantine）通信的环境下，所有节点都可以发起提议，也都可以发起表决，也都遵守承诺规则，不能说谎。
为容易形成多数派应答，要求分布式系统中有2F+1个节点。

关于Proposer提议编号的唯一性及递增，可将编号设定为高位为时间戳，低位为节点IP。

再来看论文中的两个非常重要的Request：
Prepare Request :
A proposer chooses a new proposal number n and sends a request to each member of some set of acceptors, asking it to respond with:
(a) A promise never again to accept a proposal numbered less than n, and
(b) The proposal with the highest number less than n that it has accepted, if any. I will call such a request a prepare request with number n.
Accept Request :
A proposer issues a proposal by sending, to some set of acceptors, a request that the proposal be accepted. (This need not be the same set of acceptors that responded to the initial requests.) Let’s call this an accept request.

Paxos原始算法实际是一个两阶段提交的算法，但不是一个同步阻塞型的2PC协议。

Phase 1
(a) A proposer selects a proposal number n and sends a prepare request with number n to a majority of acceptors.
(b) If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest-numbered proposal (if any) that it has accepted.

(b)承诺不再应答Proposal编号小于等于(<=)当前请求的Prepare Request。
Acceptor需要将应答过的最大的提议编号（Prepare Request）记录到磁盘上。

Phase 2
(a) If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.
(b) If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request having a number greater than n.

(b)承诺不再应答Proposal编号小于(<)当前请求的Accept Request。
Acceptor需要将已接受的最大提议（包括编号和对应value）记录到磁盘上。

用形式化的伪代码来看两阶段中的Acceptor方，假定Proposer方发起的提议编号为proposalID，决议值为proposalValue，Acceptor方记录的已经应答的最大提议编号为maxResponsedPID，已接受的最大提议编号为acceptedPID，对应值为acceptedValue，则：

P1b:
    if (proposalID > maxResponsedPID) {
        maxResponsedPID = proposalID;
        ack(acceptedPID, acceptedValue);
    }
    else {
        reject;
    }
P2b:
    if (proposalID >= maxResponsedPID) {
        acceptedPID = proposalId;
        acceptedValue = proposalValue;
        ack(ok);
    }
    else {
        reject;
    }

上述的Paxos两阶段过程也被称为Basic Paxos。
由上面的伪代码也可以看出，Basic Paxos存在着活锁的问题，即可能存在两个Proposer交替发起提议和提交决议，但都accept失败。

Refer：
1. Paxos Made Simple
2. https://en.wikipedia.org/wiki/Paxos_(computer_science)
3. Paxos理论介绍(1): 朴素Paxos算法理论推导与证明
4. PaxosRaft 分布式一致性算法原理剖析及其在实战中的应用
5. [Paxos三部曲之一] 使用Basic-Paxos协议的日志同步与恢复
6. 微信自研生产级paxos类库PhxPaxos实现原理介绍
7. Implementing Replicated Logs with Paxos

PandaDemo

The New Beginning

发表回复取消回复

发表回复 取消回复

发表回复取消回复