来源 | OSCHINA 社区
作者 | 京东云开发者-京东物流 纪卓志
原文链接:https://my.oschina.net/u/4090830/blog/6241452
目前市面上充斥着大量关于跳跃表结构与 Redis 的源码解析,但是经过长期观察后发现大都只是在停留在代码的表面,而没有系统性地介绍跳跃表的由来以及各种常量的由来。作为一种概率数据结构,理解各种常量的由来可以更好地进行变化并应用到高性能功能开发中。本文没有重复地以对现有优秀实现进行代码分析,而是通过对跳跃表进行了系统性地介绍与形式化分析,并给出了在特定场景下的跳跃表扩展方式,方便读者更好地理解跳跃表数据结构。
跳跃表 [1,2,3] 是一种用于在大多数应用程序中取代平衡树的概率数据结构。跳跃表拥有与平衡树相同的期望时间上界,并且更简单、更快、是用更少的空间。在查找与列表的线性操作上,比平衡树更快,并且更简单。
概率平衡也可以被用在基于树的数据结构 [4] 上,例如树堆(Treap)。与平衡二叉树相同,跳跃表也实现了以下两种操作
这几种操作在平衡树中也可以实现,但是在跳跃表中实现起来更简单而且非常的快,并且通常情况下很难在平衡树中直接实现(树的线索化可以实现与链表相同的效果,但是这使得实现变得更加复杂 [6])
最简单的支持查找的数据结构可能就是链表。Figure.1 是一个简单的链表。在链表中执行一次查找的时间正比于必须考查的节点个数,这个个数最多是 N。
Figure.1 Linked List
Figure.2 表示一个链表,在该链表中,每个一个节点就有一个附加的指针指向它在表中的前两个位置上的节点。正因为这个前向指针,在最坏情况下最多考查⌈N/2⌉+1 个节点。
Figure.2 Linked List with fingers to the 2nd forward elements
Figure.3 将这种想法扩展,每个序数是 4 的倍数的节点都有一个指针指向下一个序数为 4 的倍数的节点。只有⌈N/4⌉+2 个节点被考查。
Figure.3 Linked List with fingers to the 4th forward elements
这种跳跃幅度的一般情况如 Figure.4 所示。每个 2i 节点就有一个指针指向下一个 2i 节点,前向指针的间隔最大为 N/2。可以证明总的指针最大不会超过 2N(见空间复杂度分析),但现在在一次查找中最多考查⌈logN⌉个节点。这意味着一次查找中总的时间消耗为 O (logN),也就是说在这种数据结构中的查找基本等同于二分查找(binary search)。
Figure.4 Linked List with fingers to the 2ith forward elements
在这种数据结构中,每个元素都由一个节点表示。每个节点都有一个高度(height)或级别(level),表示节点所拥有的前向指针数量。每个节点的第 i 个前向指针指向下一个级别为 i 或更高的节点。
在前面描述的数据结构中,每个节点的级别都是与元素数量有关的,当插入或删除时需要对数据结构进行调整来满足这样的约束,这是很呆板且低效的。为此,可以_将每个 2i 节点就有一个指针指向下一个 2i 节点_的限制去掉,当新元素插入时为每个新节点分配一个随机的级别而不用考虑数据结构的元素数量。
Figure.5 Skip List
到此为止,已经得到了所有让链表支持快速查找的充要条件,而这种形式的数据结构就是跳跃表。接下来将会使用更正规的方式来定义跳跃表
稍后将会提到,节点的查找过程是在头节点从最高级别的指针开始,沿着这个级别一直走,直到找到大于正在寻找的节点的下一个节点(或者是 NULL),在此过程中除了头节点外并没有使用到每个节点的级别,因此每个节点无需存储节点的级别。
在跳跃表中,级别为 1 的前向指针与原始的链表结构中 next 指针的作用完全相同,因此跳跃表支持所有链表支持的算法。
对应到高级语言中的结构定义如下所示(后续所有代码示例都将使用 C 语言描述)
#define SKIP_LIST_KEY_TYPE int
#define SKIP_LIST_VALUE_TYPE int
#define SKIP_LIST_MAX_LEVEL 32
#define SKIP_LIST_P 0.5
struct Node {
SKIP_LIST_KEY_TYPE key;
SKIP_LIST_VALUE_TYPE value;
struct Node *forwards[]; // flexible array member
};
struct SkipList {
struct Node *head;
int level;
};
struct Node *CreateNode(int level) {
struct Node *node;
assert(level > 0);
node = malloc(sizeof(struct Node) + sizeof(struct Node *) * level);
return node;
}
struct SkipList *CreateSkipList() {
struct SkipList *list;
struct Node *head;
int i;
list = malloc(sizeof(struct SkipList));
head = CreateNode(SKIP_LIST_MAX_LEVEL);
for (i = 0; i < SKIP_LIST_MAX_LEVEL; i++) {
head->forwards[i] = NULL;
}
list->head = head;
list->level = 1;
return list;
}
struct Node *SkipListSearch(struct SkipList *list, SKIP_LIST_KEY_TYPE target) {
struct Node *current;
int i;
current = list->head;
for (i = list->level - 1; i >= 0; i--) {
while (current->forwards[i] && current->forwards[i]->key < target) {
current = current->forwards[i];
}
}
current = current->forwards[0];
if (current->key == target) {
return current;
} else {
return NULL;
}
}
struct Node *SkipListInsert(struct SkipList *list, SKIP_LIST_KEY_TYPE key, SKIP_LIST_VALUE_TYPE value) {
struct Node *update[SKIP_LIST_MAX_LEVEL];
struct Node *current;
int i;
int level;
current = list->head;
for (i = list->level - 1; i >= 0; i--) {
while (current->forwards[i] && current->forwards[i]->key < target) {
current = current->forwards[i];
}
update[i] = current;
}
current = current->forwards[0];
if (current->key == target) {
current->value = value;
return current;
}
level = SkipListRandomLevel();
if (level > list->level) {
for (i = list->level; i < level; i++) {
update[i] = list->header;
}
}
current = CreateNode(level);
current->key = key;
current->value = value;
for (i = 0; i < level; i++) {
current->forwards[i] = update[i]->forwards[i];
update[i]->forwards[i] = current;
}
return current;
}
struct Node *SkipListDelete(struct SkipList *list, SKIP_LIST_KEY_TYPE key) {
struct Node *update[SKIP_LIST_MAX_LEVEL];
struct Node *current;
int i;
current = list->head;
for (i = list->level - 1; i >= 0; i--) {
while (current->forwards[i] && current->forwards[i]->key < key) {
current = current->forwards[i];
}
update[i] = current;
}
current = current->forwards[0];
if (current && current->key == key) {
for (i = 0; i < list->level; i++) {
if (update[i]->forwards[i] == current) {
update[i]->forwards[i] = current->forwards[i];
} else {
break;
}
}
while (list->level > 1 && list->head->forwards[list->level - 1] == NULL) {
list->level--;
}
}
return current;
}
int SkipListRandomLevel() {
int level;
level = 1;
while (random() < RAND_MAX * SKIP_LIST_P && level <= SKIP_LIST_MAX_LEVEL) {
level++;
}
return level;
}
以下表格为按上面算法执行 232 次后,所生成的随机级别的分布情况。
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|
| 2147540777 | 1073690199 | 536842769 | 268443025 | 134218607 | 67116853 | 33563644 | 16774262 |
| 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| 8387857 | 4193114 | 2098160 | 1049903 | 523316 | 262056 | 131455 | 65943 |
| 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| 32611 | 16396 | 8227 | 4053 | 2046 | 1036 | 492 | 249 |
| 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 |
| 121 | 55 | 34 | 16 | 7 | 9 | 2 | 1 |
#define SKIP_LIST_KEY_TYPE int
#define SKIP_LIST_VALUE_TYPE int
#define SKIP_LIST_MAX_LEVEL 32
#define SKIP_LIST_P 0.5
struct Node; // forward definition
struct Forward {
struct Node *forward;
int span;
}
struct Node {
SKIP_LIST_KEY_TYPE key;
SKIP_LIST_VALUE_TYPE value;
struct Forward forwards[]; // flexible array member
};
struct SkipList {
struct Node *head;
int level;
int length;
};
struct Node *CreateNode(int level) {
struct Node *node;
assert(level > 0);
node = malloc(sizeof(struct Node) + sizeof(struct Forward) * level);
return node;
}
struct SkipList *CreateSkipList() {
struct SkipList *list;
struct Node *head;
int i;
list = malloc(sizeof(struct SkipList));
head = CreateNode(SKIP_LIST_MAX_LEVEL);
for (i = 0; i < SKIP_LIST_MAX_LEVEL; i++) {
head->forwards[i].forward = NULL;
head->forwards[i].span = 0;
}
list->head = head;
list->level = 1;
return list;
}
接下来需要修改插入和删除操作,来保证在跳跃表修改后跨度的数据完整性。
需要注意的是,在插入过程中需要使用 indices 记录在每个层级遍历到的最后一个元素的位置,这样通过做简单的减法操作就可以知道每个层级遍历到的最后一个元素到新插入节点的跨度。
struct Node *SkipListInsert(struct SkipList *list, SKIP_LIST_KEY_TYPE key, SKIP_LIST_VALUE_TYPE value) {
struct Node *update[SKIP_LIST_MAX_LEVEL];
struct Node *current;
int indices[SKIP_LIST_MAX_LEVEL];
int i;
int level;
current = list->head;
for (i = list->level - 1; i >= 0; i--) {
if (i == list->level - 1) {
indices[i] = 0;
} else {
indices[i] = indices[i + 1];
}
while (current->forwards[i].forward && current->forwards[i].forward->key < target) {
indices[i] += current->forwards[i].span;
current = current->forwards[i].forward;
}
update[i] = current;
}
current = current->forwards[0].forward;
if (current->key == target) {
current->value = value;
return current;
}
level = SkipListRandomLevel();
if (level > list->level) {
for (i = list->level; i < level; i++) {
indices[i] = 0;
update[i] = list->header;
update[i]->forwards[i].span = list->length;
}
}
current = CreateNode(level);
current->key = key;
current->value = value;
for (i = 0; i < level; i++) {
current->forwards[i].forward = update[i]->forwards[i].forward;
update[i]->forwards[i].forward = current;
current->forwards[i].span = update[i]->forwards[i].span - (indices[0] - indices[i]);
update[i]->forwards[i].span = (indices[0] - indices[i]) + 1;
}
list.length++;
return current;
}
struct Node *SkipListDelete(struct SkipList *list, SKIP_LIST_KEY_TYPE key) {
struct Node *update[SKIP_LIST_MAX_LEVEL];
struct Node *current;
int i;
current = list->head;
for (i = list->level - 1; i >= 0; i--) {
while (current->forwards[i].forward && current->forwards[i].forward->key < key) {
current = current->forwards[i].forward;
}
update[i] = current;
}
current = current->forwards[0].forward;
if (current && current->key == key) {
for (i = 0; i < list->level; i++) {
if (update[i]->forwards[i].forward == current) {
update[i]->forwards[i].forward = current->forwards[i];
update[i]->forwards[i].span += current->forwards[i].span - 1;
} else {
break;
}
}
while (list->level > 1 && list->head->forwards[list->level - 1] == NULL) {
list->level--;
}
}
return current;
}
Pugh, W. (1989). A skip list cookbook. Tech. Rep. CS-TR-2286.1, Dept. of Computer Science, Univ. of Maryland, College Park, MD [July 1989]
Pugh, W. (1989). Skip lists: A probabilistic alternative to balanced trees. Lecture Notes in Computer Science, 437–449. https://doi.org/10.1007/3-540-51542-9_36
Weiss, M. A. (1996).Data Structures and Algorithm Analysis in C (2nd Edition)(2nd ed.). Pearson.
Aragon, Cecilia & Seidel, Raimund. (1989). Randomized Search Trees. 540-545. 10.1109/SFCS.1989.63531.
Wikipedia contributors. (2022b, November 22).Finger search. Wikipedia. https://en.wikipedia.org/wiki/Finger_search
Wikipedia contributors. (2022a, October 24).Threaded binary tree. Wikipedia. https://en.wikipedia.org/wiki/Threaded_binary_tree
Wikipedia contributors. (2023, January 4).Negative binomial distribution. Wikipedia. https://en.wikipedia.org/wiki/Negative_binomial_distribution
Redis contributors.Redis ordered set implementation. GitHub. https://github.com/redis/redis
确定性跳跃表
无锁并发安全跳跃表
往期推荐
这里有最新开源资讯、软件更新、技术干货等内容
点这里 ↓↓↓ 记得 关注✔ 标星⭐ 哦