记录每一次成长

哈希表是存储键值对元素的数据结构

Key（键） - 用于索引值的唯一整数
Value（值） - 与键相关联的数据

哈希表中的键和值

哈希（哈希函数）

在哈希表中，使用键来处理一个新的索引。并且，与该键对应的元素会存储在这个索引位置。这个过程就称为哈希（散列）

设k为一个键，h(x)为一个哈希函数。

在这里，h(k)会为我们提供一个新的索引，用于存储与键k相关联的元素。

欲知更多，访问哈希

哈希冲突

当哈希函数为多个键生成了相同的索引时，就会出现冲突（即该索引位置要存储哪个值）。这被称为哈希冲突。我们可以使用以下技术之一来解决哈希冲突。

链地址法解决冲突
开放地址法：线性探测 / 二次探测和双重哈希

1. 通过链地址法解决冲突

在链地址法中，如果哈希函数为多个元素生成了相同的索引，那么这些元素会通过一个双向链表存储在该相同的索引位置。如果j是多个元素对应的槽位，它会包含一个指向这些元素所构成链表表头的指针。如果该槽位不存在元素，那么j的值为 “空（NIL）”。

操作的伪代码

chainedHashSearch(T, k)
  return T[h(k)]
chainedHashInsert(T, x)
  T[h(x.key)] = x //insert at the head
chainedHashDelete(T, x)
  T[h(x.key)] = NIL

2. 开放地址法

与链地址法不同，开放地址法不会将多个元素存储在同一个槽位中。在这里，每个槽位要么被单个键值填充，要么保持为空（NIL）状态。

开放地址法中所使用的不同技术有：

i.线性探测

在线性探测中，通过检查下一个槽位来解决冲突。

h(k, i) = (h′(k) + i) mod m

其中

i={0,1,…}
h′(k)是一个新的哈希函数如果在h(k,0)处发生冲突，那么就检查h(k,1)。通过这种方式，i的值以线性方式递增。线性探测的问题在于会出现相邻槽位被填满形成聚集的情况。当插入一个新元素时，必须遍历整个聚集区域。这增加了在哈希表上执行操作所需的时间。

ii.二次探测

它的工作原理与线性探测类似，但通过使用以下关系，槽位之间的间距会增大（大于 1）

其中，

c1和c2是正的辅助常量，
i={0,1,…}

iii.双重哈希

如果在应用哈希函数h(k)后发生了冲突，那么就会计算另一个哈希函数来寻找下一个槽位。

h(k, i) = (h1(k) + ih2(k)) mod m

优秀的哈希函数

一个好的哈希函数可能无法完全避免冲突，但它能够减少冲突的数量。在这里，我们将研究不同的方法来找到一个好的哈希函数。

1. 除留余数法（除法散列法）

如果k是一个键，m是哈希表的大小，那么哈希函数h()的计算方式如下： h(k) = k mod m 例如，如果哈希表的大小m=10，且k=112，那么h(k) = 112 mod 10 = 2。m 的值不能是2的幂次方。这是因为2的幂次方的二进制形式是10、100、1000等等。当我们计算k mod m时，我们总是会得到低阶的p位。

if m = 22, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 01
if m = 23, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 001
if m = 24, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 0001
if m = 2p, then h(k) = p lower bits of m

2. 乘法散列法

3. 全域哈希

在全域哈希中，哈希函数是独立于键值随机选择的。

Python、Java 和 C/C++ 示例

PythonJavaCC++

Python

# Python program to demonstrate working of HashTable 

# Initialize the hash table with 10 empty lists (each index is a list to handle collisions)
hashTable = [[] for _ in range(10)]

def checkPrime(n):
    if n == 1 or n == 0:
        return 0

    for i in range(2, n // 2):
        if n % i == 0:
            return 0

    return 1


def getPrime(n):
    if n % 2 == 0:
        n = n + 1

    while not checkPrime(n):
        n += 2

    return n


def hashFunction(key):
    capacity = getPrime(10)
    return key % capacity


def insertData(key, data):
    index = hashFunction(key)
    # Check if the key already exists in the list to update it, otherwise append
    found = False
    for i, kv in enumerate(hashTable[index]):
        if kv[0] == key:
            hashTable[index][i] = (key, data)  # Update existing key-value pair
            found = True
            break
    if not found:
        hashTable[index].append((key, data))  # Add new key-value pair if not found


def removeData(key):
    index = hashFunction(key)
    # Remove the key-value pair from the list if it exists
    for i, kv in enumerate(hashTable[index]):
        if kv[0] == key:
            del hashTable[index][i]
            break


# Test the hash table
insertData(123, "apple")
insertData(432, "mango")
insertData(213, "banana")
insertData(654, "guava")
insertData(213, "orange")  # This should update the value for key 213

print(hashTable)

removeData(123)

print(hashTable)

Java

// Java program to demonstrate working of HashTable 

import java.util.*; 

class HashTable { 
  public static void main(String args[]) 
  {
  Hashtable<Integer, Integer> 
    ht = new Hashtable<Integer, Integer>(); 
  
  ht.put(123, 432); 
  ht.put(12, 2345);
  ht.put(15, 5643); 
  ht.put(3, 321);

  ht.remove(12);

  System.out.println(ht); 
  } 
}

// Implementing hash table in C

#include <stdio.h>
#include <stdlib.h>

struct set
{
  int key;
  int data;
};
struct set *array;
int capacity = 10;
int size = 0;

int hashFunction(int key)
{
  return (key % capacity);
}
int checkPrime(int n)
{
  int i;
  if (n == 1 || n == 0)
  {
  return 0;
  }
  for (i = 2; i < n / 2; i++)
  {
  if (n % i == 0)
  {
    return 0;
  }
  }
  return 1;
}
int getPrime(int n)
{
  if (n % 2 == 0)
  {
  n++;
  }
  while (!checkPrime(n))
  {
  n += 2;
  }
  return n;
}
void init_array()
{
  capacity = getPrime(capacity);
  array = (struct set *)malloc(capacity * sizeof(struct set));
  for (int i = 0; i < capacity; i++)
  {
  array[i].key = 0;
  array[i].data = 0;
  }
}

void insert(int key, int data)
{
  int index = hashFunction(key);
  if (array[index].data == 0)
  {
  array[index].key = key;
  array[index].data = data;
  size++;
  printf("\n Key (%d) has been inserted \n", key);
  }
  else if (array[index].key == key)
  {
  array[index].data = data;
  }
  else
  {
  printf("\n Collision occured  \n");
  }
}

void remove_element(int key)
{
  int index = hashFunction(key);
  if (array[index].data == 0)
  {
  printf("\n This key does not exist \n");
  }
  else
  {
  array[index].key = 0;
  array[index].data = 0;
  size--;
  printf("\n Key (%d) has been removed \n", key);
  }
}
void display()
{
  int i;
  for (i = 0; i < capacity; i++)
  {
  if (array[i].data == 0)
  {
    printf("\n array[%d]: / ", i);
  }
  else
  {
    printf("\n key: %d array[%d]: %d \t", array[i].key, i, array[i].data);
  }
  }
}

int size_of_hashtable()
{
  return size;
}

int main()
{
  int choice, key, data, n;
  int c = 0;
  init_array();

  do
  {
  printf("1.Insert item in the Hash Table"
     "\n2.Remove item from the Hash Table"
     "\n3.Check the size of Hash Table"
     "\n4.Display a Hash Table"
     "\n\n Please enter your choice: ");

  scanf("%d", &choice);
  switch (choice)
  {
  case 1:

    printf("Enter key -:\t");
    scanf("%d", &key);
    printf("Enter data -:\t");
    scanf("%d", &data);
    insert(key, data);

    break;

  case 2:

    printf("Enter the key to delete-:");
    scanf("%d", &key);
    remove_element(key);

    break;

  case 3:

    n = size_of_hashtable();
    printf("Size of Hash Table is-:%d\n", n);

    break;

  case 4:

    display();

    break;

  default:

    printf("Invalid Input\n");
  }

  printf("\nDo you want to continue (press 1 for yes): ");
  scanf("%d", &c);

  } while (c == 1);
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171

C++

// Implementing hash table in C++

#include <iostream>
#include <list>
using namespace std;

class HashTable
{
  int capacity;
  list<int> *table;

public:
  HashTable(int V);
  void insertItem(int key, int data);
  void deleteItem(int key);

  int checkPrime(int n)
  {
  int i;
  if (n == 1 || n == 0)
  {
    return 0;
  }
  for (i = 2; i < n / 2; i++)
  {
    if (n % i == 0)
    {
    return 0;
    }
  }
  return 1;
  }
  int getPrime(int n)
  {
  if (n % 2 == 0)
  {
    n++;
  }
  while (!checkPrime(n))
  {
    n += 2;
  }
  return n;
  }

  int hashFunction(int key)
  {
  return (key % capacity);
  }
  void displayHash();
};
HashTable::HashTable(int c)
{
  int size = getPrime(c);
  this->capacity = size;
  table = new list<int>[capacity];
}
void HashTable::insertItem(int key, int data)
{
  int index = hashFunction(key);
  table[index].push_back(data);
}

void HashTable::deleteItem(int key)
{
  int index = hashFunction(key);

  list<int>::iterator i;
  for (i = table[index].begin();
   i != table[index].end(); i++)
  {
  if (*i == key)
    break;
  }

  if (i != table[index].end())
  table[index].erase(i);
}

void HashTable::displayHash()
{
  for (int i = 0; i < capacity; i++)
  {
  cout << "table[" << i << "]";
  for (auto x : table[i])
    cout << " --> " << x;
  cout << endl;
  }
}

int main()
{
  int key[] = {231, 321, 212, 321, 433, 262};
  int data[] = {123, 432, 523, 43, 423, 111};
  int size = sizeof(key) / sizeof(key[0]);

  HashTable h(size);

  for (int i = 0; i < size; i++)
  h.insertItem(key[i], data[i]);

  h.deleteItem(12);
  h.displayHash();
}

哈希表的应用

哈希表在以下这些情况下会被使用：

需要在常量时间内进行查找和插入操作的场景。
密码学应用领域。
需要对数据进行索引的场景。

哈希（哈希函数） ​

哈希冲突 ​

1. 通过链地址法解决冲突 ​

2. 开放地址法 ​

i.线性探测 ​

ii.二次探测 ​

iii.双重哈希 ​

优秀的哈希函数 ​

1. 除留余数法（除法散列法） ​

2. 乘法散列法 ​

3. 全域哈希 ​

Python、Java 和 C/C++ 示例 ​

哈希表的应用 ​