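PostgreSQL Source Code Walkthrough (116) - MVCC#1 (Getting a Snapshot #1)

Published: 2020-08-07 01:49:35  Source: ITPUB Blog  Views: 258  Author: husthxd  Category: Relational Databases

This section covers the main logic PostgreSQL uses to obtain a transaction snapshot; the function that implements it is GetTransactionSnapshot.

I. Data Structures

Global/static variables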


/*
 * Currently registered Snapshots.  Ordered in a heap by xmin, so that we can
 * quickly find the one with lowest xmin, to advance our MyPgXact->xmin.
 */
static int xmin_cmp(const pairingheap_node *a, const pairingheap_node *b,
 void *arg);
static pairingheap RegisteredSnapshots = {&xmin_cmp, NULL, NULL};
/* first GetTransactionSnapshot call in a transaction? */
bool        FirstSnapshotSet = false;
/*
 * Remember the serializable transaction snapshot, if any.  We cannot trust
 * FirstSnapshotSet in combination with IsolationUsesXactSnapshot(), because
 * GUC may be reset before us, changing the value of IsolationUsesXactSnapshot.
 */
static Snapshot FirstXactSnapshot = NULL;
/*
 * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
 * mode, and to the latest one taken in a read-committed transaction.
 * SecondarySnapshot is a snapshot that's always up-to-date as of the current
 * instant, even in transaction-snapshot mode.  It should only be used for
 * special-purpose code (say, RI checking.)  CatalogSnapshot points to an
 * MVCC snapshot intended to be used for catalog scans; we must invalidate it
 * whenever a system catalog change occurs.
 *
 * These SnapshotData structs are static to simplify memory allocation
 * (see the hack in GetSnapshotData to avoid repeated malloc/free).
 */
static SnapshotData CurrentSnapshotData = {HeapTupleSatisfiesMVCC};
static SnapshotData SecondarySnapshotData = {HeapTupleSatisfiesMVCC};
SnapshotData CatalogSnapshotData = {HeapTupleSatisfiesMVCC};
/* Pointers to valid snapshots */
static Snapshot CurrentSnapshot = NULL;
static Snapshot SecondarySnapshot = NULL;
static Snapshot CatalogSnapshot = NULL;
static Snapshot HistoricSnapshot = NULL;
/*
 * These are updated by GetSnapshotData.  We initialize them this way
 * for the convenience of TransactionIdIsInProgress: even in bootstrap
 * mode, we don't want it to say that BootstrapTransactionId is in progress.
 *
 * RecentGlobalXmin and RecentGlobalDataXmin are initialized to
 * InvalidTransactionId, to ensure that no one tries to use a stale
 * value. Readers should ensure that it has been set to something else
 * before using it.
 */
TransactionId TransactionXmin = FirstNormalTransactionId;
TransactionId RecentXmin = FirstNormalTransactionId;
TransactionId RecentGlobalXmin = InvalidTransactionId;
TransactionId RecentGlobalDataXmin = InvalidTransactionId;
/* (table, ctid) => (cmin, cmax) mapping during timetravel */
static HTAB *tuplecid_data = NULL;

MyPgXact
Information about the current transaction.


/*
 * Flags for PGXACT->vacuumFlags
 *
 * Note: If you modify these flags, you need to modify PROCARRAY_XXX flags
 * in src/include/storage/procarray.h.
 *
 * PROC_RESERVED may later be assigned for use in vacuumFlags, but its value is
 * used for PROCARRAY_SLOTS_XMIN in procarray.h, so GetOldestXmin won't be able
 * to match and ignore processes with this flag set.
 */
#define     PROC_IS_AUTOVACUUM  0x01    /* is it an autovac worker? */
#define     PROC_IN_VACUUM      0x02    /* currently running lazy vacuum */
#define     PROC_IN_ANALYZE     0x04    /* currently running analyze */
#define     PROC_VACUUM_FOR_WRAPAROUND  0x08    /* set by autovac only */
#define     PROC_IN_LOGICAL_DECODING    0x10    /* currently doing logical
                                                 * decoding outside xact */
#define     PROC_RESERVED               0x20    /* reserved for procarray */
/* flags reset at EOXact */
#define     PROC_VACUUM_STATE_MASK \
    (PROC_IN_VACUUM | PROC_IN_ANALYZE | PROC_VACUUM_FOR_WRAPAROUND)
/*
 * Prior to PostgreSQL 9.2, the fields below were stored as part of the
 * PGPROC.  However, benchmarking revealed that packing these particular
 * members into a separate array as tightly as possible sped up GetSnapshotData
 * considerably on systems with many CPU cores, by reducing the number of
 * cache lines needing to be fetched.  Thus, think very carefully before adding
 * anything else here.
 */
typedef struct PGXACT
{
    // Top-level transaction ID (not a subtransaction ID).
    // As an optimization, read-only transactions are not assigned an XID (xid = 0).
    TransactionId xid;          /* id of top-level transaction currently being
                                 * executed by this proc, if running and XID
                                 * is assigned; else InvalidTransactionId */
    TransactionId xmin;         /* minimal running XID as it was when we were
                                 * starting our xact, excluding LAZY VACUUM:
                                 * vacuum must not remove tuples deleted by
                                 * xid >= xmin ! */
    uint8       vacuumFlags;    /* vacuum-related flags, see above */
    bool        overflowed;
    bool        delayChkpt;     /* true if this proc delays checkpoint start;
                                 * previously called InCommit */
    uint8       nxids;
} PGXACT;
extern PGDLLIMPORT struct PGXACT *MyPgXact;
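
The flag bits above are combined and cleared with ordinary bit operations. The following is a minimal sketch, not PostgreSQL code: the `toy_` helpers are hypothetical names introduced for illustration, and only the `#define` values are taken from the listing. It shows how PROC_VACUUM_STATE_MASK clears the per-transaction vacuum bits at end of transaction while leaving PROC_IS_AUTOVACUUM alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the listing above. */
#define PROC_IS_AUTOVACUUM          0x01
#define PROC_IN_VACUUM              0x02
#define PROC_IN_ANALYZE             0x04
#define PROC_VACUUM_FOR_WRAPAROUND  0x08
#define PROC_VACUUM_STATE_MASK \
    (PROC_IN_VACUUM | PROC_IN_ANALYZE | PROC_VACUUM_FOR_WRAPAROUND)

/* Hypothetical helper: is this backend currently running lazy vacuum? */
static bool
toy_in_lazy_vacuum(uint8_t flags)
{
    return (flags & PROC_IN_VACUUM) != 0;
}

/* Hypothetical helper: clear the bits that are reset at end of transaction. */
static uint8_t
toy_reset_at_eoxact(uint8_t flags)
{
    uint8_t result = (uint8_t) (flags & ~PROC_VACUUM_STATE_MASK);

    /* Postcondition: none of the masked bits survive. */
    assert((result & PROC_VACUUM_STATE_MASK) == 0);
    return result;
}
```

For example, an autovacuum worker doing an anti-wraparound vacuum would carry `PROC_IS_AUTOVACUUM | PROC_IN_VACUUM | PROC_VACUUM_FOR_WRAPAROUND`; after `toy_reset_at_eoxact` only `PROC_IS_AUTOVACUUM` remains.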

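Snapshot
A pointer to a SnapshotData struct, which can represent every possible kind of snapshot.
There are several different kinds of snapshots:
1. Normal MVCC snapshots
2. MVCC snapshots taken during recovery (in Hot-Standby mode)
3. Historic MVCC snapshots used during logical decoding
4. Snapshots passed to HeapTupleSatisfiesDirty()
5. Snapshots passed to HeapTupleSatisfiesNonVacuumable()
6. Snapshots used for SatisfiesAny, Toast, and Self, where no members are accessed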


// Pointer to a SnapshotData struct
typedef struct SnapshotData *Snapshot;
// The invalid (null) snapshot
#define InvalidSnapshot     ((Snapshot) NULL)
/*
 * We use SnapshotData structures to represent both "regular" (MVCC)
 * snapshots and "special" snapshots that have non-MVCC semantics.
 * The specific semantics of a snapshot are encoded by the "satisfies"
 * function.
 */
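// Tuple visibility test function
typedef bool (*SnapshotSatisfiesFunc) (HeapTuple htup,
                                       Snapshot snapshot, Buffer buffer);
// Commonly seen HeapTupleSatisfies* routines:
// HeapTupleSatisfiesMVCC: is the tuple visible to the given MVCC snapshot?
// HeapTupleSatisfiesUpdate: can the tuple be updated (detects concurrent updates of the same tuple)? Takes a CommandId, not a snapshot
// HeapTupleSatisfiesDirty: visibility that also counts uncommitted (in-progress) changes
// HeapTupleSatisfiesSelf: visibility including the current transaction's own changes
// HeapTupleSatisfiesToast: visibility test for tuples in TOAST tables
// HeapTupleSatisfiesVacuum: can VACUUM remove the tuple? Takes an OldestXmin, not a snapshot
// HeapTupleSatisfiesAny: every tuple is visible
// HeapTupleSatisfiesHistoricMVCC: visibility of catalog tuples during logical decoding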
/*
 * Struct representing all kind of possible snapshots.
 * 
 * There are several different kinds of snapshots:
 * * Normal MVCC snapshots
 * * MVCC snapshots taken during recovery (in Hot-Standby mode)
 * * Historic MVCC snapshots used during logical decoding
 * * snapshots passed to HeapTupleSatisfiesDirty()
 * * snapshots passed to HeapTupleSatisfiesNonVacuumable()
 * * snapshots used for SatisfiesAny, Toast, Self where no members are
 *   accessed.
 *
 * TODO: It's probably a good idea to split this struct using a NodeTag
 * similar to how parser and executor nodes are handled, with one type for
 * each different kind of snapshot to avoid overloading the meaning of
 * individual fields.
 */
typedef struct SnapshotData
{
    SnapshotSatisfiesFunc satisfies;    /* tuple test function */
    /*
     * The remaining fields are used only for MVCC snapshots, and are normally
     * just zeroes in special snapshots.  (But xmin and xmax are used
     * specially by HeapTupleSatisfiesDirty, and xmin is used specially by
     * HeapTupleSatisfiesNonVacuumable.)
     *
     * An MVCC snapshot can never see the effects of XIDs >= xmax. It can see
     * the effects of all older XIDs except those listed in the snapshot. xmin
     * is stored as an optimization to avoid needing to search the XID arrays
     * for most tuples.
     */
    // XIDs below xmin had all finished when the snapshot was taken
    TransactionId xmin;         /* all XID < xmin are visible to me */
    // XIDs in [xmax, ∞) are invisible
    TransactionId xmax;         /* all XID >= xmax are invisible to me */
    /*
     * For normal MVCC snapshot this contains the all xact IDs that are in
     * progress, unless the snapshot was taken during recovery in which case
     * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
     * it contains *committed* transactions between xmin and xmax.
     *
     * note: all ids in xip[] satisfy xmin <= xip[i] < xmax
     */
    TransactionId *xip;
    uint32      xcnt;           /* # of xact ids in xip[] */
    /*
     * For non-historic MVCC snapshots, this contains subxact IDs that are in
     * progress (and other transactions that are in progress if taken during
     * recovery). For historic snapshot it contains *all* xids assigned to the
     * replayed transaction, including the toplevel xid.
     *
     * note: all ids in subxip[] are >= xmin, but we don't bother filtering
     * out any that are >= xmax
     */
    TransactionId *subxip;
    int32       subxcnt;        /* # of xact ids in subxip[] */
    bool        suboverflowed;  /* has the subxip array overflowed? */
    bool        takenDuringRecovery;    /* recovery-shaped snapshot? */
    bool        copied;         /* false if it's a static snapshot */
    CommandId   curcid;         /* in my xact, CID < curcid are visible */
    /*
     * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC
     * snapshots.
     */
    uint32      speculativeToken;
    /*
     * Book-keeping information, used by the snapshot manager
     */
    uint32      active_count;   /* refcount on ActiveSnapshot stack */
    uint32      regd_count;     /* refcount on RegisteredSnapshots */
    pairingheap_node ph_node;   /* link in the RegisteredSnapshots heap */
    TimestampTz whenTaken;      /* timestamp when snapshot was taken */
    XLogRecPtr  lsn;            /* position in the WAL stream when taken */
} SnapshotData;
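
To make the roles of xmin, xmax, and xip concrete, here is a minimal sketch of the range-then-list check that these fields enable. It is a simplified model, not PostgreSQL's implementation: `ToySnapshot` and `toy_xid_in_snapshot` are hypothetical names, XIDs are compared as plain integers (no wraparound handling), and subtransactions and recovery-time snapshots are ignored. A tuple's effects are visible only if its XID is *not* "in the snapshot" and the transaction actually committed; the commit check is outside this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

/* Toy stand-in for the xmin/xmax/xip/xcnt fields of SnapshotData. */
typedef struct ToySnapshot
{
    ToyXid        xmin;   /* every XID < xmin had already finished */
    ToyXid        xmax;   /* every XID >= xmax had not started yet */
    const ToyXid *xip;    /* XIDs in progress, xmin <= xip[i] < xmax */
    uint32_t      xcnt;   /* number of entries in xip[] */
} ToySnapshot;

/*
 * Returns true if xid was still running, or had not yet started, when the
 * snapshot was taken, so its effects must be treated as invisible.
 */
static bool
toy_xid_in_snapshot(ToyXid xid, const ToySnapshot *snap)
{
    assert(snap != NULL);

    if (xid < snap->xmin)
        return false;           /* finished before the snapshot: skip the array */
    if (xid >= snap->xmax)
        return true;            /* started after the snapshot */

    /* Only XIDs in [xmin, xmax) require searching the in-progress list. */
    for (uint32_t i = 0; i < snap->xcnt; i++)
    {
        if (snap->xip[i] == xid)
            return true;
    }
    return false;
}
```

With a snapshot of xmin = 2342, xmax = 2350 and a hypothetical in-progress list {2342, 2345}, XID 2341 and XID 2344 are not in the snapshot, while 2342, 2345, and 2350 are. The two range tests are why xmin is described as an optimization: most XIDs are resolved without touching xip at all.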

II. Source Code Walkthrough

GetTransactionSnapshot obtains the appropriate snapshot for a new query within a transaction.


/*
 * GetTransactionSnapshot
 *      Get the appropriate snapshot for a new query in a transaction.
 *
 * Note that the return value may point at static storage that will be modified
 * by future calls and by CommandCounterIncrement().  Callers should call
 * RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
 * used very long.
 */
Snapshot
GetTransactionSnapshot(void)
{
    /*
     * Return historic snapshot if doing logical decoding. We'll never need a
     * non-historic transaction snapshot in this (sub-)transaction, so there's
     * no need to be careful to set one up for later calls to
     * GetTransactionSnapshot().
     */
    if (HistoricSnapshotActive())
    {
        Assert(!FirstSnapshotSet);
        return HistoricSnapshot;
    }
    /* First call in transaction? */
    if (!FirstSnapshotSet)
    {
        /*
         * Don't allow catalog snapshot to be older than xact snapshot.  Must
         * do this first to allow the empty-heap Assert to succeed.
         */
        InvalidateCatalogSnapshot();
        Assert(pairingheap_is_empty(&RegisteredSnapshots));
        Assert(FirstXactSnapshot == NULL);
        if (IsInParallelMode())
            elog(ERROR,
                 "cannot take query snapshot during a parallel operation");
        /*
         * In transaction-snapshot mode, the first snapshot must live until
         * end of xact regardless of what the caller does with it, so we must
         * make a copy of it rather than returning CurrentSnapshotData
         * directly.  Furthermore, if we're running in serializable mode,
         * predicate.c needs to wrap the snapshot fetch in its own processing.
         */
        if (IsolationUsesXactSnapshot())
        {
            // transaction-snapshot mode (REPEATABLE READ or SERIALIZABLE)
            /* First, create the snapshot in CurrentSnapshotData */
            if (IsolationIsSerializable())
                // isolation level is SERIALIZABLE
                CurrentSnapshot = GetSerializableTransactionSnapshot(&CurrentSnapshotData);
            else
                // the other transaction-snapshot level (REPEATABLE READ)
                CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
            /* Make a saved copy */
            CurrentSnapshot = CopySnapshot(CurrentSnapshot);
            FirstXactSnapshot = CurrentSnapshot;
            /* Mark it as "registered" in FirstXactSnapshot */
            FirstXactSnapshot->regd_count++;
            pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
        }
        else
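            // not transaction-snapshot mode (READ COMMITTED): take the snapshot directly
            CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
        // record that the first snapshot of this transaction has been taken
        FirstSnapshotSet = true;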
        return CurrentSnapshot;
    }
    // transaction-snapshot mode: keep returning the snapshot taken by the first call
    if (IsolationUsesXactSnapshot())
        return CurrentSnapshot;
    /* Don't allow catalog snapshot to be older than xact snapshot. */
    InvalidateCatalogSnapshot();
    // READ COMMITTED: take a fresh snapshot for each new query
    CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
    return CurrentSnapshot;
}
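
The control flow of the function can be condensed into a small toy model. This is a sketch under stated assumptions, not the real code: every `toy_` name is hypothetical, a snapshot is reduced to a single counter value, and the historic-snapshot branch, catalog snapshot invalidation, the parallel-mode check, the serializable wrapper, and registration in the heap are all left out. What it keeps is the part that matters for isolation levels: in transaction-snapshot mode the first snapshot is copied and reused, while in read-committed mode each call overwrites the same static struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A snapshot reduced to one field: the toy clock value when it was taken. */
typedef struct ToySnap
{
    int taken_at;
} ToySnap;

static ToySnap  toy_current_data;       /* role of CurrentSnapshotData */
static ToySnap  toy_first_copy;         /* role of the CopySnapshot() result */
static ToySnap *toy_current = NULL;     /* role of CurrentSnapshot */
static bool     toy_first_set = false;  /* role of FirstSnapshotSet */
static int      toy_clock = 0;          /* advances on every fresh snapshot */

/* Stand-in for GetSnapshotData: overwrite the caller's struct in place. */
static ToySnap *
toy_take(ToySnap *into)
{
    assert(into != NULL);
    into->taken_at = ++toy_clock;
    return into;
}

/* Stand-in for transaction start: forget any earlier snapshot. */
static void
toy_begin(void)
{
    toy_current = NULL;
    toy_first_set = false;
}

static ToySnap *
toy_get_transaction_snapshot(bool uses_xact_snapshot)
{
    if (!toy_first_set)
    {
        toy_current = toy_take(&toy_current_data);
        if (uses_xact_snapshot)
        {
            /* Keep a private copy that later calls cannot overwrite. */
            toy_first_copy = *toy_current;
            toy_current = &toy_first_copy;
        }
        toy_first_set = true;
        return toy_current;
    }
    if (uses_xact_snapshot)
        return toy_current;             /* reuse the first snapshot */

    /* Read-committed: refresh the static struct for each new query. */
    toy_current = toy_take(&toy_current_data);
    return toy_current;
}
```

In the read-committed case two consecutive calls return the same pointer but with different contents, which is the hazard the function's header comment warns about: a caller that needs the snapshot to stay stable has to call RegisterSnapshot or PushActiveSnapshot on it.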

III. Tracing

Running a simple query is enough to trigger the snapshot-acquisition logic.


16:35:08 (xdb@[local]:5432)testdb=# begin;
BEGIN
16:35:13 (xdb@[local]:5432)testdb=#* select 1;

Start gdb and set a breakpoint


(gdb) b GetTransactionSnapshot
Breakpoint 1 at 0xa9492e: file snapmgr.c, line 312.
(gdb) c
Continuing.
Breakpoint 1, GetTransactionSnapshot () at snapmgr.c:312
312     if (HistoricSnapshotActive())
(gdb)

If logical decoding is in progress, the historic snapshot is returned (not the case here).


(gdb) n
319     if (!FirstSnapshotSet)
(gdb)

Is this the first call in the transaction? Yes, so execution enters the corresponding branch.


319     if (!FirstSnapshotSet)
(gdb) n
325         InvalidateCatalogSnapshot();
(gdb) 
327         Assert(pairingheap_is_empty(&RegisteredSnapshots));
(gdb) 
328         Assert(FirstXactSnapshot == NULL);
(gdb) n
330         if (IsInParallelMode())
(gdb)

Not in transaction-snapshot mode, so GetSnapshotData is called directly


(gdb) 
341         if (IsolationUsesXactSnapshot())
(gdb) 
356             CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
(gdb) p CurrentSnapshotData
$1 = {satisfies = 0xa9310d <HeapTupleSatisfiesMVCC>, xmin = 2342, xmax = 2350, xip = 0x14bee40, xcnt = 2, 
  subxip = 0x1514fa0, subxcnt = 0, suboverflowed = false, takenDuringRecovery = false, copied = false, curcid = 0, 
  speculativeToken = 0, active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, 
    prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
(gdb)

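The call succeeds; inspect CurrentSnapshot.
Note: the process that was running transaction 2342 had been killed. The values printed before the call (xmin = 2342, xcnt = 2) are left over from an earlier snapshot, because line 356 had not yet executed at that point; the fresh snapshot has xmin = xmax = 2350 and xcnt = 0.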


(gdb) n
358
(gdb) p CurrentSnapshot
$2 = (Snapshot) 0xf9be60 <CurrentSnapshotData>
(gdb) p *CurrentSnapshot
$3 = {satisfies = 0xa9310d <HeapTupleSatisfiesMVCC>, xmin = 2350, xmax = 2350, xip = 0x14bee40, xcnt = 0, 
  subxip = 0x1514fa0, subxcnt = 0, suboverflowed = false, takenDuringRecovery = false, copied = false, curcid = 0, 
  speculativeToken = 0, active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, 
    prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
(gdb)

The function returns and control goes back to the caller


(gdb) n
359         return CurrentSnapshot;
(gdb) 
371 }
(gdb) 
exec_simple_query (query_string=0x149aec8 "select 1;") at postgres.c:1059
1059                snapshot_set = true;
(gdb)

Inspect the global variable MyPgXact


(gdb) p MyPgXact
$7 = (struct PGXACT *) 0x7f47103c01f4
(gdb) p *MyPgXact
$8 = {xid = 0, xmin = 2350, vacuumFlags = 0 '\000', overflowed = false, delayChkpt = false, nxids = 0 '\000'}
(gdb)

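Notes:
1. xid = 0 means that no transaction ID has been assigned. As an optimization, PostgreSQL assigns an XID only when the transaction modifies data.
2. The txid_current() function assigns an XID; txid_current_if_assigned() does not.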
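
The difference between the two functions in note 2 comes down to lazy allocation. The sketch below models that idea only; `ToyXact`, `toy_next_xid`, and both helpers are hypothetical names, the starting counter value is arbitrary, and the toy returns 0 where the SQL function txid_current_if_assigned() would return NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_XID 0u

typedef struct ToyXact
{
    uint32_t xid;   /* TOY_INVALID_XID until the transaction needs one */
} ToyXact;

static uint32_t toy_next_xid = 2350;    /* hypothetical XID counter */

/* Like txid_current_if_assigned(): report the XID, never allocate one. */
static uint32_t
toy_xid_if_assigned(const ToyXact *xact)
{
    assert(xact != NULL);
    return xact->xid;
}

/* Like txid_current(): allocate on first use, then keep returning it. */
static uint32_t
toy_xid_assign(ToyXact *xact)
{
    assert(xact != NULL);
    if (xact->xid == TOY_INVALID_XID)
        xact->xid = toy_next_xid++;
    return xact->xid;
}
```

A transaction that only ever calls `toy_xid_if_assigned` keeps xid = 0 and never consumes a value from the counter, which matches the xid = 0 seen in the MyPgXact output for the `select 1;` session.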

DONE!

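Open questions:
1. When is the information in the global variable CurrentSnapshotData initialized or changed?
2. How GetSnapshotData is implemented (covered in the next section).

IV. References

PG Source Code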
