当前位置：首页 > news >正文

[C++]unordered_map和unordered_set的模拟实现

news 2025/7/13 4:31:36

哈希散列

散列表

散列表是一种通过哈希函数将键映射到数组索引的数据结构，利用哈希函数将键值转化成索引，然后可以通过索引快速确定数据所在位置。

哈希函数

首先我们先来见见哈希函数，哈希函数就是为了将键值转化成索引。这里对字符串类型进行特化处理，为的是降低字符串转化索引的重复率以减少冲突的现象。

// 哈希函数采用除留余数法
template <class K>
struct HashFunc
{size_t operator()(const K &key){return (size_t)key;}
};// 哈希表中支持字符串的操作(特化，解决string类型的转换)
template <>
struct HashFunc<std::string>
{size_t operator()(const std::string &key){size_t hash = 0;for (auto e : key){hash *= 31;hash += e;}return hash;}
};

使用时直接构建Hash对象，用对象调用operator()即可。

template <class K, class V, class Hash = HashFunc<K>>class HashTable

哈希表

哈希表有很多种实现方式，这里以开放地址法（直接定址法）为例。

开放地址法实现

直接上例子，在一个vector容器中，其下标就是索引。我们依次插入12、9、1，这些数据经过哈希函数变成索引再寻找合适的位置插入（注：这三个数据处理后仍为12、9、1）。当前有8个位置，我们只需要拿着索引%容量就可以得到位置。

12%8=4于是12就放在下标为4的位置。9同理放在下标为1的位置，此时数据1进来发现位置被9占用了，1就向后找空位进行插入。查找时通过索引找到的数据不一致同样向后查询。

总结：数据经过哈希函数转换成索引，再用索引容量得到插入的下标，如果出现位置被占用（简称冲突）就向后依次查找，找到空位就插入。

哈希表代码框架

//描述当前位置的状态enum State{EXIST,//存在EMPTY,//空（一直没有存储过数据）DELETE//被删除（曾经有存储过数据）};template <class K, class V>struct HashData{std::pair<K, V> _kv;//键值对State _state = EMPTY;//状态};template <class K, class V, class Hash = HashFunc<K>>class HashTable{public://一个构造函数HashTable(){}//一个插入函数bool Insert(const std::pair<K, V> &kv){}//一个查找函数HashData<K, V> *Find(const K &key){}//一个删除函数bool Erase(const K &key){}private:std::vector<HashData<K, V>> _tables;size_t _n = 0; // 表中存储数据个数};
}

哈希表插入与扩容

哈希表不会等到存储满才扩容，一般空间占用率达到70%就需要扩容(要兼容冲突和空间利用率)。因为哈希散列最大优点是以极快的速度（O(1)）去查找数据，但当数据越来越多冲突也会越来越多，冲突过多的时候查找的效率就会下降，所以哈希本质上也是空间换时间。

bool Insert(const std::pair<K, V> &kv){if (Find(kv.first))return false;//扩容逻辑if (_n * 10 / _tables.size() >= 7){int newsize = _tables.size() * 2;HashTable<K, V, Hash> newHT;newHT._tables.resize(newsize);for (size_t i = 0; i < _tables.size(); i++){if (_tables[i]._state = EXIST){newHT.Insert(_tables[i]._kv);}}_tables.swap(newHT._tables);}//如果冲突，则向后查找Hash hs;size_t hashi = hs(kv.first) % _tables.size();while (_tables[hashi]._state == EXIST){++hashi;hashi %= _tables.size();}//找到合适位置插入并改变状态_tables[hashi]._kv = kv;_tables[hashi]._state = EXIST;++_n;return true;}

扩容逻辑：

为什么不在原哈希表上扩容？

因为一旦扩容，容量大小改变导致插入位置出现变化，数据需要重新寻找合适位置。（位置=索引%容量）

哈希表的查找

“删除”状态存在的意义：

HashData<K, V> *Find(const K &key){//拿到索引Hash hs;size_t hashi = hs(key) % _tables.size();//非空则继续查找while (_tables[hashi]._state != EMPTY){//找到了则返回if (_tables[hashi]._state == EXIST &&_tables[hashi]._kv.first == key){return &_tables[hashi];}++hashi;hashi %= _tables.size();}return nullptr;}

完整代码

#pragma once
#include <iostream>
#include <vector>// 哈希函数采用除留余数法
template <class K>
struct HashFunc
{size_t operator()(const K &key){return (size_t)key;}
};// 哈希表中支持字符串的操作(特化，解决string类型的转换)
template <>
struct HashFunc<std::string>
{size_t operator()(const std::string &key){size_t hash = 0;for (auto e : key){hash *= 31;hash += e;}return hash;}
};// 以下采用开放定址法，即线性探测解决冲突
namespace open_address
{//描述当前位置的状态enum State{EXIST,//存在EMPTY,//空（一直没有存储过数据）DELETE//被删除（曾经有存储过数据）};template <class K, class V>struct HashData{std::pair<K, V> _kv;State _state = EMPTY;};template <class K, class V, class Hash = HashFunc<K>>class HashTable{public:HashTable(){_tables.resize(10);}bool Insert(const std::pair<K, V> &kv){if (Find(kv.first))return false;if (_n * 10 / _tables.size() >= 7){int newsize = _tables.size() * 2;HashTable<K, V, Hash> newHT;newHT._tables.resize(newsize);for (size_t i = 0; i < _tables.size(); i++){if (_tables[i]._state = EXIST){newHT.Insert(_tables[i]._kv);}}_tables.swap(newHT._tables);}Hash hs;size_t hashi = hs(kv.first) % _tables.size();while (_tables[hashi]._state == EXIST){++hashi;hashi %= _tables.size();}_tables[hashi]._kv = kv;_tables[hashi]._state = EXIST;++_n;return true;}HashData<K, V> *Find(const K &key){Hash hs;size_t hashi = hs(key) % _tables.size();while (_tables[hashi]._state != EMPTY){if (_tables[hashi]._state == EXIST &&_tables[hashi]._kv.first == key){return &_tables[hashi];}++hashi;hashi %= _tables.size();}return nullptr;}bool Erase(const K &key){HashData<K, V> *ret = Find(key);if (ret){ret->_state = DELETE;--_n;return true;}else{return false;}}private:std::vector<HashData<K, V>> _tables;size_t _n = 0; // 表中存储数据个数};
}

哈希桶

哈希桶结构

与哈希表的开发地址法类似，但解决冲突的方法不同。同样的是数据经过哈希函数转换成索引，再用索引容量得到插入的下标，不同的是哈希桶以链表的形式存储，直接挂接到对应位置的最后面。

同样按照顺序依次插入12、9、1，在哈希桶的结构下如图：

哈希桶代码框架

namespace hash_bucket
{template <class T>struct HashNode{T _data;HashNode<T> *_next;HashNode(const T &data): _data(data), _next(nullptr){}};template <class K, class T, class KeyOfT, class Hash>class HashTable{typedef HashNode<T> Node;public:// 内部类template <class Ptr, class Ref>struct __HTIterator{typedef HashNode<T> Node;typedef __HTIterator Self;Node *_node;//指向数据节点const HashTable *_pht;//指向对应哈希桶（是谁的迭代器就指向那个哈希桶）__HTIterator(Node *node, const HashTable *pht): _node(node), _pht(pht){}Ref operator*(){}Ptr operator->(){}Self &operator++(){}bool operator!=(const Self &s){}};typedef __HTIterator<T *, T &> iterator;typedef __HTIterator<const T *, const T &> const_iterator;iterator begin(){}iterator end(){}const_iterator begin() const{}const_iterator end() const{}HashTable(){}~HashTable(){}std::pair<iterator, bool> Insert(const T &data){}iterator Find(const K &key){}bool Erase(const K &key){}size_t size() const{}bool empty() const{}private:std::vector<Node *> _tables; // 指针数组size_t _n;};
}

哈希桶的插入与扩容

首先利用查找函数确认要插入的值是否已存在。

扩容逻辑：

当数据达到一定量（看实际需求而定）就应该扩容，因为如果一条链表上节点过多会导致查找效率的下降，扩容可以有效地将节点分散到多个链表中，以提升查找效率。

为什么不像哈希表实现一样复用insert函数？因为节点可以复用，比起删除节点再创建节点，直接改变指向更加节省资源，所以这里采用改变指向而不是复用insert。

（1）初始状态：

（2）在新对象中的数组找到合适位置并进行头插：

（3）遍历整个链表重复头插操作，最后切断链表原首节点与原对象的联系：

（4）然后继续遍历数组，找到链表后重复1、2、3操作，全遍历完成后交换两个对象（与哈希表交换对象原理一致）。

iterator Find(const K &key){KeyOfT kot;Hash hs;size_t hashi = hs(key) % _tables.size();Node *cur = _tables[hashi];while (cur){if (kot(cur->_data) == key){return iterator(cur, this);}cur = cur->_next;}return iterator(nullptr, this);}std::pair<iterator, bool> Insert(const T &data){KeyOfT kot;//利用查找函数确认要插入数据是否存在，存在则直接返回位置iterator it = Find(kot(data));if (it != end())return std::make_pair(it, false);//扩容逻辑Hash hs;if (_n == _tables.size()){int newsize = _tables.size() * 2;HashTable<K, T, KeyOfT, Hash> newHashTable;newHashTable._tables.resize(newsize, nullptr);for (int i = 0; i < _tables.size(); i++){Node *cur = _tables[i];while (cur){Node *next = cur->_next;size_t hashi = hs(kot(cur->_data)) % newHashTable._tables.size();cur->_next = newHashTable._tables[hashi];newHashTable._tables[hashi] = cur;cur = next;}_tables[i] = nullptr;}_tables.swap(newHashTable._tables);}//找到合适位置，进行头插size_t hashi = hs(kot(data)) % _tables.size();Node *newnode = new Node(data);newnode->_next = _tables[hashi];_tables[hashi] = newnode;++_n;return std::make_pair(iterator(newnode, this), true);}

operator++逻辑

Self &operator++(){//若当前迭代器指向的节点下一个不为空继续向下遍历即可。if (_node->_next){_node = _node->_next;}//当前节点是当前链表最后一个节点，所以该去寻找下一个链表了else{KeyOfT kot;Hash hs;//通过当前节点数据的键值确定当前链表在数组中的位置size_t hashi = hs(kot(_node->_data)) % _pht->_tables.size();//先进行++，因为要从下一个位置开始查找是否还有链表++hashi;//遍历数组，退出的情况有两个：//（1）查找到下一个链表//（2）超出查找范围退出，即没找到for (; hashi < _pht->_tables.size(); hashi++){if (_pht->_tables[hashi])break;}//如果hashi为数组大小，则证明是因为原因（2）退出遍历//即没有找到下一个链表，也就是说已经遍历完成了所以元素，返回nullptrif (hashi == _pht->_tables.size()){_node = nullptr;}//反之则是找到了下一个链表，将链表首节点交给_node维护起来即可else{_node = _pht->_tables[hashi];}}return *this;}

完整代码

#pragma once
#include <iostream>
#include <vector>// 哈希函数采用除留余数法
template <class K>
struct HashFunc
{size_t operator()(const K &key){return (size_t)key;}
};// 哈希表中支持字符串的操作(特化，解决string类型的转换)
template <>
struct HashFunc<std::string>
{size_t operator()(const std::string &key){size_t hash = 0;for (auto e : key){hash *= 31;hash += e;}return hash;}
};namespace hash_bucket
{template <class T>struct HashNode{T _data;HashNode<T> *_next;HashNode(const T &data): _data(data), _next(nullptr){}};template <class K, class T, class KeyOfT, class Hash>class HashTable{typedef HashNode<T> Node;public:// 内部类template <class Ptr, class Ref>struct __HTIterator{typedef HashNode<T> Node;typedef __HTIterator Self;Node *_node;const HashTable *_pht;__HTIterator(Node *node, const HashTable *pht): _node(node), _pht(pht){}Ref operator*(){return _node->_data;}Ptr operator->(){return &_node->_data;}Self &operator++(){if (_node->_next){_node = _node->_next;}else{KeyOfT kot;Hash hs;size_t hashi = hs(kot(_node->_data)) % _pht->_tables.size();++hashi;for (; hashi < _pht->_tables.size(); hashi++){if (_pht->_tables[hashi])break;}if (hashi == _pht->_tables.size()){_node = nullptr;}else{_node = _pht->_tables[hashi];}}return *this;}bool operator!=(const Self &s){return _node != s._node;}};typedef __HTIterator<T *, T &> iterator;typedef __HTIterator<const T *, const T &> const_iterator;iterator begin(){for (int i = 0; i < _tables.size(); i++){if (_tables[i]){return iterator(_tables[i], this);}}return end();}iterator end(){return iterator(nullptr, this);}const_iterator begin() const{for (int i = 0; i < _tables.size(); i++){if (_tables[i]){return const_iterator(_tables[i], this);}}return end();}const_iterator end() const{return const_iterator(nullptr, this);}HashTable(){_tables.resize(10, nullptr);_n = 0;}~HashTable(){for (int i = 0; i < _tables.size(); i++){Node *cur = _tables[i];while (cur){Node *next = cur->_next;delete cur;cur = next;}_tables[i] = nullptr;}}std::pair<iterator, bool> Insert(const T &data){KeyOfT kot;iterator it = Find(kot(data));if (it != end())return std::make_pair(it, false);Hash hs;if (_n == _tables.size()){int newsize = _tables.size() * 2;HashTable<K, T, KeyOfT, Hash> newHashTable;newHashTable._tables.resize(newsize, nullptr);for (int i = 0; i < _tables.size(); i++){Node *cur = _tables[i];while (cur){Node *next = cur->_next;size_t hashi = hs(kot(cur->_data)) % newHashTable._tables.size();cur->_next = newHashTable._tables[hashi];newHashTable._tables[hashi] = cur;cur = next;}_tables[i] = nullptr;}_tables.swap(newHashTable._tables);}size_t hashi = hs(kot(data)) % _tables.size();Node *newnode = new Node(data);newnode->_next = _tables[hashi];_tables[hashi] = newnode;++_n;return std::make_pair(iterator(newnode, this), true);}iterator Find(const K &key){KeyOfT kot;Hash hs;size_t hashi = hs(key) % _tables.size();Node *cur = _tables[hashi];while (cur){if (kot(cur->_data) == key){return iterator(cur, this);}cur = cur->_next;}return iterator(nullptr, this);}bool Erase(const K &key){KeyOfT kot;Hash hs;size_t hashi = hs(key) % _tables.size();Node *prev = nullptr;Node *cur = _tables[hashi];while (cur){if (kot(cur->_data) == key){if (prev == nullptr){_tables[hashi] = cur->_next;}else{prev->_next = cur->_next;}delete cur;return true;}else{prev = cur;cur = cur->_next;}}return false;}size_t size() const{size_t cnt = 0;for (iterator i = begin(); i != end(); ++i){++cnt;}return cnt;}bool empty() const{return size() == 0;}private:std::vector<Node *> _tables; // 指针数组size_t _n;};
}

仿函数KeyOfT

这两个仿函数分别封装在unordered_map和unordered_set中，位于哈希桶的上层，用于提取键值。

为什么该仿函数在哈希桶的上层，因为哈希桶这个结构并不知道传入的类型是unordered_map还是unordered_set导致无法确定如何取得键值，但要使用哈希桶的结构知道自己要用什么类型自然也就知道如何取得键值所以该仿函数在其上层。

//unordered_map的仿函数
struct MapKeyOfT
{const K &operator()(const std::pair<K, V> &kv){return kv.first;}
};//unordered_set的仿函数
struct SetKeyOfT
{const K &operator()(const K &key){return key;}
};

unordered_map的模拟实现

除去仿函数外均为对哈希桶的接口调用，不过多赘述。

#include "HashTable.hpp"namespace Myunordered_map
{template <class K, class V, class Hash = HashFunc<K>>class unordered_map{struct MapKeyOfT{const K &operator()(const std::pair<K, V> &kv){return kv.first;}};public:typedef typename hash_bucket::HashTable<K, std::pair<const K, V>, MapKeyOfT, Hash>::iterator iterator;iterator begin(){return _ht.begin();}iterator end(){return _ht.end();}V &operator[](const K &key){//iterator指向一个pair<K,V>类型std::pair<iterator, bool> ret = insert(std::make_pair(key, V()));return ret.first->second;}std::pair<iterator, bool> insert(const std::pair<K, V> &kv){return _ht.Insert(kv);}iterator find(const std::pair<K, V> &kv){MapKeyOfT kot;return _ht.Find(kot(kv));}bool erase(const std::pair<K, V> &kv){MapKeyOfT kot;return _ht.Erase(kot(kv));}private:hash_bucket::HashTable<K, std::pair<const K, V>, MapKeyOfT, Hash> _ht;};
}

unordered_set的模拟实现

除去仿函数外均为对哈希桶的接口调用，不过多赘述。

#include "HashTable.hpp"namespace Myunordered_set
{template <class K, class Hash = HashFunc<K>>class unordered_set{struct SetKeyOfT{const K &operator()(const K &key){return key;}};public:typedef typename hash_bucket::HashTable<K, const K, SetKeyOfT, Hash>::iterator iterator;typedef typename hash_bucket::HashTable<K, const K, SetKeyOfT, Hash>::const_iterator const_iterator;iterator begin(){return _ht.begin();}iterator end(){return _ht.end();}const_iterator begin() const{return _ht.begin();}const_iterator end() const{return _ht.end();}std::pair<iterator, bool> insert(const K &key){return _ht.Insert(key);}iterator find(const K &key){return _ht.Find(key);}bool erase(const K &key){return _ht.Erase(key);}size_t size() const{return _ht.size();}bool empty() const{return _ht.empty();}private:hash_bucket::HashTable<K, const K, SetKeyOfT, Hash> _ht;};
}

查看全文

http://www.mrgr.cn/news/62999.html