溫馨提示×

溫馨提示×

您好，登錄后才能下訂單哦！

密碼登錄×

忘記密碼？

登錄注冊×

獲取短信驗證碼

其他方式登錄

點擊登錄注冊即表示同意《億速云用戶服務條款》

用戶登錄×

賬戶密碼登錄

請使用微信掃描上方二維碼

使用幫助

請求超時！

請點擊重新獲取二維碼

PHP7中字符串處理邏輯的優化方法

發布時間：2020-11-02 11:05:17 來源：億速云閱讀：229 作者：小新欄目：編程語言

這篇文章主要介紹了PHP7中字符串處理邏輯的優化方法，具有一定借鑒價值，需要的朋友可以參考下。希望大家閱讀完這篇文章后大有收獲。下面讓小編帶著大家一起了解一下。

一、通過代碼對比 PHP 5 和 PHP 7 中字符串處理的區別

?? 先看如下示例代碼：

$a = 'foo';
$b = 'bar';

$c = "I like $a and $b";

?? 在 PHP 5.6 中執行代碼，得到的 opcode 輸出如下圖： PHP7中字符串處理邏輯的優化方法

?? 具體的執行過程：

首先為 I like 申請內存
然后為 I like foo 申請內存，然后將 I like 和 foo 復制到新申請的內存空間并返回
繼續為 I like foo and 申請內存，然后將 I like foo 和 and 復制到新申請的內存空間并返回
最后為 I like foo and bar 申請內存，然后將 I like foo and 和 bar 復制到新申請的內存并返回
釋放字符串處理過程中申請的內存空間

?? 在 PHP 7 中執行代碼，得到的 opcode 輸出如下入： PHP7中字符串處理邏輯的優化方法

?? PHP 7 中對字符串的處理過程相對簡單，首先創建一個堆棧 stack，然后將要連接的字符串片段存入?？臻g，最后只需要分配一次內存空間，然后將結果從棧中移動到所分配的內存空間中。相較于 PHP 5，PHP 7 在處理過程中避免了反復申請內存的過程。而 PHP 7 在字符串處理方面性能的提升得益于數據結構 rope 的使用。

二、 Rope 數據結構介紹

⒈ Rope 介紹

?? Rope 是一棵二叉樹，其中每個葉子節點存儲的是字符串的子串，每個非葉子節點存儲的是位于該節點左側的每個葉子節點所存儲的子串中所包含的字符總數（方便了根據索引查找指定字符）。 PHP7中字符串處理邏輯的優化方法

⒉ Rope 的優勢和不足

?? 優勢：

Rope 使得字符串的插入、刪除等操作更快速，相較于傳統額字符數組的方式，Rope 的時間復雜度為 $O (lo g N)$ ，字符數組的時間復雜度為 $O (N)$
以字符數組的方式對字符串進行操作時，首先需要進行數組的拷貝，這樣就需要占用額外的內存空間 $O (N)$ ，而 Rope 不需要事先進行拷貝
Rope 不需要像字符數組那樣需要大塊連續的內存
使用 Rope 數據結構，字符串操作的撤銷會非常方便
使用 Rope，無論所操作的字符串的 size 多大，性能都十分穩定

?? 不足：

需要額外的內存空間存儲非葉子節點，相較于字符數組增加了內存空間的消耗
由于二叉樹的結構相較于字符數組比較復雜，這樣在操作 size 較小的字符串時性能反而比較差
使用 Rope 操作字符串，增加了代碼的復雜度，同時也增加了出現 BUG 的風險

⒊ Rope 的代碼實現

?? Rope 的本質是一棵二叉樹，其基本的插入、刪除、搜索操作與二叉樹相同。這里只對字符串的聯接（concat）和拆分（split）進行介紹。

另外，為了盡量減少操作的時間復雜度，可以將 Rope 構造成一棵 AVL 樹，這樣在每次操作完成后實現自平衡。但這樣可能會增加一些撤銷操作的復雜度。例如，將兩棵高度不等的 Rope 進行 concat 操作后形成的新的 Rope，通過節點旋轉實現自平衡后可能會破壞原先兩棵樹的結構，這樣如果要撤銷之前的 concat 操作會變得非常復雜。

? concat

?? concat 操作相對簡單，只需要將兩棵 Rope 樹聯接成一棵新的 Rope 樹即可

? split

?? 對 Rope 樹進行拆分的時候，會遇到兩種情況：拆分的位置正好位于某個節點的末尾或者拆分的位置在某個葉子節點的中間。對于第二種情況，我們可以把相應的葉子節點進行拆分，從而轉化成第一種情況。而對于第一種情況，根據節點所處的位置又可以分為兩種情況：

PHP7中字符串處理邏輯的優化方法

PHP7中字符串處理邏輯的優化方法

class Node:
    def __init__(self, data):
        self.parent = None
        self.left = None
        self.right = None
        self.data = data
        self.weight = 0

    def __repr__(self):
        return str(self.__dict__)

    def __str__(self):
        return str(self.__dict__)


class Rope:
    # 每個葉子節點最多存儲 5 個字符，包括空字符在內
    LEAF_DATA_LEN = 5

    def __init__(self):
        self.root = None

    def create_rope(self, parent, data, left_index, right_index):
        """
        創建 rope 數據結構
        :param parent: 父節點（根節點的父節點為空）
        :param data: 用于創建 rope 數據結構的原始數據，這里只接受 list 和 str 兩種類型
        :param left_index: 起始位置索引
        :param right_index: 結束位置索引
        :return: Node
        """
        if isinstance(data, str):
            data = list(data)
        elif not isinstance(data, list):
            return

        if right_index - left_index > self.LEAF_DATA_LEN:
            node = Node("")

            node.parent = parent
            middle_index = (left_index + right_index) // 2
            node.weight = middle_index - left_index
            node.left = self.create_rope(node, data, left_index, middle_index)
            node.right = self.create_rope(node, data, middle_index, right_index)
        else:
            node = Node(data[left_index: right_index])

            node.parent = parent
            node.weight = right_index - left_index

        if node.parent is None:
            self.root = node

        return node

    @staticmethod
    def calc_weight(node):
        """
        計算節點 weight 值
        :param node:
        :return:
        """
        if node is None:
            return 0

        init_weight = node.weight
        while node.right is not None:
            node = node.right
            init_weight += node.weight

        return init_weight

    def concat_rope(self, data1, data2):
        """
        字符串連接
        :param data1:
        :param data2:
        :return:
        """
        r1 = Rope()
        r1.create_rope(None, data1, 0, len(data1))
        r2 = Rope()
        r2.create_rope(None, data2, 0, len(data2))

        node = Node("")
        node.left = r1.root
        node.right = r2.root
        r1.root.parent = node
        r2.root.parent = node

        node.weight = self.calc_weight(r1)

        self.root = node

    def split_rope(self, data, index):
        """
        字符串拆分
        :param data: 要拆分的字符串
        :param index: 拆分的位置（字符串索引從 0 開始計算）
        :return: Rope
        """
        if index < 0 or index > len(data) - 1:
            return

        node = self.create_rope(None, data, 0, len(data))
        original_index = index

        if index == self.root.weight - 1:
            # 在根節點拆分
            rope_left = node.left
            rope_left.parent = None
            rope_right = node.right
            rope_right.parent = None
            return rope_left, rope_right
        elif index < self.root.weight - 1:
            while index < node.weight - 1 and node.data == "":
                node = node.left
        else:
            while index > node.weight - 1 and node.data == "":
                index -= node.weight
                node = node.right

        if node.data != "":
            # index 落在了最左側和最右側的兩個葉子節點
            if original_index < self.root.weight - 1:
                # index 落在了最左側的葉子節點
                rope_left = self.create_rope(None, node.data[0:index + 1], 0, index + 1)
                rope_right = self.root
                # 更新 rope_right 的 weight
                node.data = node.data[index + 1:]
                while node is not None:
                    node.weight -= (index + 1)
                    node = node.parent
            else:
                # index 落在了最右側的葉子節點
                rope_left = self.root
                rope_right = self.create_rope(None, node.data[index + 1:], 0, len(node.data[index + 1:]))
                node.data = node.data[0:index + 1]
        elif index == node.weight - 1:
            # index 正好落在了節點的末尾
            if original_index < self.root.weight:
                # index 落在了最左側分支中的非葉子節點的末尾
                weight_sub = node.weight
                rope_left = node.left
                rope_left.parent = None
                node.left = None
                rope_right = self.root
                # 更新節點 weight
                while node is not None:
                    node.weight -= weight_sub
                    node = node.parent
            else:
                # index 落在了最右側分支中的非葉子節點的末尾
                rope_left = self.root
                rope_right = node.right
                rope_right.parent = None
                node.right = None
        else:
            stack = []
            if original_index < self.root.weight:
                # index 落在了左子樹中的節點
                index -= node.weight
                rope_left = node
                rope_right = self.root
                node.parent.left = None
                node.parent = None
                node = node.right
            else:
                # index 落在了右子樹中的節點
                rope_left = self.root
                stack.append(node.right)
                rope_right = None
                node.right = None
                node = node.left
            while node.data == "" and index >= 0:
                if index < node.weight - 1:
                    stack.append(node.right)
                    node.right = None
                    node = node.left
                elif index > node.weight - 1:
                    node = node.right
                    index -= node.weight
                else:
                    stack.append(node.right)
                    node.right = None
                    break
            if node.data != "":
                # 需要拆分葉子節點
                new_node = Node(node.data[index + 1:])
                new_node.weight = node.weight - index - 1
                stack.append(new_node)
                node.data = node.data[0:index + 1]
            # 更新節點的 weight 信息
            while node is not None:
                if node.data != "":
                    node.weight = len(node.data)
                else:
                    node.weight = self.calc_weight(node.left)
                node = node.parent
            # 組裝 rope_right并更新節點的 weight 值
            left_node = None
            while len(stack) > 0:
                root_node = Node("")
                if left_node is None:
                    left_node = stack.pop()
                    root_node = left_node
                else:
                    root_node.left = left_node
                    left_node.parent = root_node
                    right_node = stack.pop()
                    root_node.right = right_node
                    right_node.parent = root_node
                    root_node.weight = self.calc_weight(root_node.left)
                    left_node = root_node

            if rope_right is None:
                # index > self.root.weight - 1
                rope_right = root_node
            else:
                # index < self.root.weight - 1
                tmp = rope_right
                while tmp.left is not None:
                    tmp = tmp.left
                tmp.left = root_node
                root_node.parent = tmp
                while tmp.parent is not None:
                    tmp.weight = self.calc_weight(tmp.left)
                    tmp = tmp.parent
                rope_right = tmp
                rope_right.weight = self.calc_weight(rope_right.left)

        return rope_left, rope_right


rope = Rope()
data = "php is a script language"
index = 18
left, right = rope.split_rope(data, index)
print(left)
print(right)

三、 PHP 5 和 PHP 7 底層字符串處理邏輯比較

⒈ PHP 5 中字符串處理邏輯以及存在的問題

? 處理邏輯

PHP 5 中的字符串并沒有固定的數據結構，而是采用 C 中以 NULL 結尾的字符數組的方式來處理
為了支持二進制字符串（字符串中包含 NULL 字符），還需要額外記錄字符串的長度

? 存在的問題

?? PHP 5 中 zval 中定義的字符串的長度為 int（有符號整型）類型，這就導致即使是在 64 為機器上字符串的長度也不能超過 $2^{31}$ （LP64 數據模型中，int 永遠只有 32 位）。

// zval 中對字符串的定義，長度為 int 類型
typedef union _zvalue_value {
		long lval;
		double dval;
		struct {
			char *val;     /* C string buffer, NULL terminated */
			int len;      /* String length : num of ASCII chars */
		} str;            /* string structure */
		HashTable *ht;
		zend_object_value obj;
		zend_ast *ast;
} zvalue_value;
	
// 類實例結構體中對字符串的定義，此時字符串的長度為 zend_uint 類型，相較于 int 類型，字符串長度提升了一倍
struct _zend_class_entry {
	char type;
	const char *name;		/* C string buffer, NULL terminated */
	zend_uint name_length;		/* String length : num of ASCII chars */
	struct _zend_class_entry *parent;
	int refcount;
	zend_uint ce_flags;

	/*……*/
}	

// hashtable 的 key 中對字符串的定義，長度為 uint 類型
typedef struct _zend_hash_key {
		const char *arKey;  /* C string buffer, NULL terminated */
		uint nKeyLength;    /* String length : num of ASCII chars */
		ulong h;
} zend_hash_key;

?? PHP 5 中很多地方都支持二進制字符串，由于字符串沒有一個固定的結構體，這就導致很多地方都有對字符串的定義。同時，由于各處對字符串長度的定義不同，導致各處支持的字符串長度也不同。

?? 在較老版本的 PHP 中，由于沒有引入 interned string，同一個字符串如果在多處被使用，為了互不影響，就需要復制多份出來，這就造成了對內存空間大量消耗。

static PHP_FUNCTION(session_id)
	{
		char *name = NULL;
		int name_len, argc = ZEND_NUM_ARGS();

		if (zend_parse_parameters(argc TSRMLS_CC, "|s", &name, &name_len) == FAILURE) {
		    return;
		}
	
		/* …… */

		if (name) {
		    if (PS(id)) {
		        efree(PS(id));
		    }
		    PS(id) = estrndup(name, name_len);
		}
	}

?? 以 PHP 函數 session_id 為例，該函數最終將 name 完全復制一份然后存入 PS(id)。這樣，后續代碼對 name 進行的任何操作都不會影響 PS(id) 中存儲的值。

? interned string

?? 從 PHP 5.4 起，為了解決上述字符串復制導致內存消耗大的問題，PHP 引入了 interned string 。另外，由于 interned string 需要共享內存空間，所以在線程安全的 PHP 版本中并不被支持。

?? 所謂 interned string，即一個字符串在一個進程中只存儲一次。當一個 php-fpm 進程啟動時，會申請一塊 size 為 1MB 的 buffer，持久化的字符串（字符串常量、函數名、變量名、類名、方法名……）會被存儲到該緩沖并添加到一個 hashtable 中。由于 PHP 處理的 request 之間是相互獨立的，所以一個 request 處理完成后，其間在內存中產生的數據都會被銷毀。為了在處理 request 的過程中也能使用 interned string，在 request 到來時 PHP 會記錄當前 interned string buffer 的 top 位置，在該 request 請求的處理完成后，top 位置之后消耗的空間又會被恢復。

PHP7中字符串處理邏輯的優化方法

interned string 的初始化，只會發生在 PHP 啟動以及 PHP 腳本的編譯階段
interned string 是只讀的，既不能修改，也不能銷毀
由于 interned string buffer 只有 1MB 的空間，當 1MB 的空間存滿后，后續的字符串處理還會回到引入 interned string 之前的狀態

?? interned string 的優勢

一個字符串只會存儲一次，節省了內存空間
interned string 的 hash 只會計算一次，然后被多次使用
interned string 的比較最終轉換成了指針地址的比較

?? 由于需要操作 hashtable，interned string 的初始化和創建會比較復雜，也正因為如此，并不是所有的字符串都適合存入 interned string。

⒉ PHP 7 中對字符串處理進行的優化

struct _zend_string {
	zend_refcounted_h gc;
	zend_ulong        h;                /* hash value */
	size_t            len;
	char              val[1];
};

typedef struct _zend_refcounted_h {
	uint32_t         refcount;			/* reference counter 32-bit */
	union {
		uint32_t type_info;
	} u;
} zend_refcounted_h;

?? PHP 7 中字符串有了固定的結構 zend_string

?? PHP 7 中字符串的長度類型為 size_t，字符串的長度不再局限于 PHP 5 中的 $2^{31}$ ，尤其在 64 位的機器上，字符串的長度取決于平臺支持的最大長度。

??PHP 7 中字符串內容的存儲不再像之前使用 char * ，而是使用了 struct hack ，使得字符串內容和 zend_string 一起存儲在一塊連續的內存空間。同時，zend_string 還內嵌了字符串的 hash 值，這樣，對于一個指定的字符串只需要進行一次 hash 計算。

??PHP 7 中字符串引入了引用計數，這樣，只要字符串還在被使用，就不用擔心會被銷毀。

static PHP_FUNCTION(session_id)
{
	zend_string *name = NULL;
	int argc = ZEND_NUM_ARGS();

	/* …… */

	if (name) {
	    if (PS(id)) {
	        zend_string_release(PS(id));
	    }
	    PS(id) = zend_string_copy(name);
	}
}

static zend_always_inline zend_string *zend_string_copy(zend_string *s)
{
	if (!ZSTR_IS_INTERNED(s)) {
	    GC_REFCOUNT(s)++;
	}
	return s;
}

static zend_always_inline void zend_string_release(zend_string *s)
{
	if (!ZSTR_IS_INTERNED(s)) {
	    if (--GC_REFCOUNT(s) == 0) {
	        pefree(s, GC_FLAGS(s) & IS_STR_PERSISTENT);
	    }
	}
}

??仍以函數 session_id 為例，設置 session_id 時不再需要將 name 完全復制，而只是將 name 的引用計數加 1。在刪除字符串時也是同樣，將字符串的引用計數減 1，只有引用計數為 0 時才會真正銷毀字符串。

??PHP 7 中的 interned string 不再需要單獨申請 interned string buffer 來存儲，而是在 zend_string 的 gc 中將 type_info 標記為 IS_STR_INTERNED。這樣，在銷毀字符串時，zend_string_release 會檢查字符串是否為 interned string，如果是則不會進行任何操作。

感謝你能夠認真閱讀完這篇文章，希望小編分享PHP7中字符串處理邏輯的優化方法內容對大家有幫助，同時也希望大家多多支持億速云，關注億速云行業資訊頻道，遇到問題就找億速云，詳細的解決方法等著你來學習!

向AI問一下細節

推薦閱讀：

免責聲明：本站發布的內容（圖片、視頻和文字）以原創、轉載和分享為主，文章觀點不代表本網站立場，如果涉及侵權請聯系站長郵箱：is@yisu.com進行舉報，并提供相關證據，一經查實，將立刻刪除涉嫌侵權內容。

上一篇新聞：
java枚舉類型enum的使用方法
下一篇新聞：
if語句的使用案例

猜你喜歡

AI
助
手

產品服務

地區劃分

專題活動

幫助支持

關于我們

售后咨詢

7*24小時在線電話：400-100-2938

7*24小時在線 QQ：800811969

關注億速云

億速云公眾號

手機網站二維碼

亚洲午夜精品一区二区_中文无码日韩欧免_久久香蕉精品视频_欧美主播一区二区三区美女