在这种情况下是否可以制作最小的完美哈希函数?

我想创建一个哈希映射(或其他结构,如果你有任何建议)来存储键值对。 在创建地图的同时,所有键都将同时插入,但在运行时,我需要创建地图时,我不知道键是什么(任意长度的字符串)。

我正在解析一个像这样的查询字符串"x=100&name=bob&color=red&y=150" (但字符串可以有无限数量的变量,变量可以有任意长度的名称)。

我想解析它一次并创建一个哈希映射,最好是最小的并具有完美的哈希函数以满足线性存储要求。 一旦创建了地图,就不会修改或删除这些值,也不会再向地图添加任何键值对,因此整个地图实际上是一个常量。 我假设变量不会在字符串中出现两次(IE。 "x=1&x=2"无效)。

我在C编码,目前有一个我可以使用的函数,如get("x")将返回字符串"100" ,但它每次解析查询字符串需要O(n)时间。 我想在它第一次加载时解析它,因为它是一个非常大的查询字符串,每个值都会被读取几次。 即使我使用C ,我也不需要C中的代码作为答案。 伪代码或任何建议都很棒!

尝试使用GPL的gperf或Bob Jenkins在C中的公共域实现

程序:

  • 接收查询字符串并通过枚举键列表来识别完美散列函数的域

  • 将这些键和列表大小(范围将为1..size)提供给从上述参考实现派生的完美哈希生成函数

  • 使用生成的完美哈希函数来创建HashMap

  • 使用相同的完美哈希函数来处理HashMap中的get请求

编辑 Necrolis在下面的评论中指出,参考实现在C源代码中输出完美的散列函数,因此您需要修改它们以生成类似于VM的字节码。 您还可以使用嵌入式Scheme或Lua等解释性语言。

当创建完美哈希函数的开销在查找上分摊时,知道这是否值得在简单(非完美)HashMap上花费这将是有趣的。

另一种选择是Cuckoo散列 ,它也有O(1)查找

有一些非常好的哈希例程; 然而,certificate其中一个接近完美需要大量的输入知识。 看来您的输入不受限制,足以使这样的证据几乎不可能。

一般来说,完美(或接近完美)的例程对输入的每个位/字节都很敏感。 对于速度,组合操作通常是异或。 这种例程阻止两个相同字节相互抵消的方式是移位或旋转这些位。 然而,这种转移应该通过一个数字来完成,该数字是可以表示的最大数量的相对素数; 否则,输入中的模式可能会被先前的输入部分取消。 这减少了解决方案中的熵,增加了碰撞的机会。

典型的解决方案是

 Start with a number that is a prime (all primes are relative primes) while (more bytes to be considered) { take the next byte of input and multiply it by a second prime determine the number of bits that might be lost in a left shift, capture them in a buffer shift the bits in the hash "buffer" to the left. restore the high order bit(s) in the low position take the next byte of input and multiply it by a second prime mask the multiplied result into the buffer } 

这种例程的问题是已知的。 基本上输入没有变化,这使得输入的分散非理想。 也就是说,只要有足够的输入偏离初始的主要起始编号,这种技术就可以在输出的整个域中实现良好的输入位分散。 不幸的是,选择随机起始编号不是解决方案,因为不可能准确地重新计算散列。

在任何情况下,乘法中使用的素数不应超过乘法。 同样,如果要避免丢失初始输入的色散效应(并且结果仅围绕后一位/字节分组),则必须以低位替换高位的捕获。 素数选择会影响色散,有时需要进行调谐以获得良好的效果。

到目前为止,您应该能够轻松地看到近乎完美的哈希比一个不太接近完美的哈希需要更多的计算时间。 散列算法旨在解决冲突,并且大多数Java散列结构在占用阈值时resize(通常在70%范围内,但它是可调的)。 由于resize是内置的,只要您不编写可怕的哈希值,Java数据结构将继续重新调整您的碰撞机会。

可以加速散列的优化包括计算比特组,丢弃偶尔的字节,预先计算常用乘法数字的查找表(由输入索引)等。不要假设优化更快,取决于架构,机器细节和优化的“年龄”,有时优化的假设不再成立,应用优化实际上增加了计算散列的时间。

在你所描述的内容中,没有完美的哈希。 完美的哈希将是原始输入。 如果你保证你的数据只是某些东西(比如基于拉丁语的ASCII或只有某些键)那么你可以很好地散列,但是完美吗? 不,不可能。 您还必须创建链接列表或向量散列未命中机制。 系统中的任何变量(如您的情况下的输入计数)都将使完美的散列概念无效。

你想要什么违反了数学定律。

你可以在O(1)附近达到目标,但这里有一些悬而未决的问题。 问题是:

  1. 为什么它必须是线性存储?
  2. 表中的删除是否通用(您只指定在初始创建后不会添加键值对)?
  3. 与散列范围相比,表可能有多大?
  4. 与重复数据相比,插入的频率如何?
  5. 记忆是一个重要因素吗?

虽然不可能实现完美的哈希,但如果你只是简单地拥有一个简单的链表,其桶大小至少比你潜在的独特哈希的平均值有两个标准差,那么它就变得完全学术化了。 它是最小的内存(当然相对而言,取决于总的潜在大小),删除友好,并且只要问题3的回答类似于“远小”,将几乎是 O(1)查找时间。

以下内容应该让您入门,但我将决定使用哪种哈希算法…

 #include  #include  #include  // Dummy value type to test compile. Replace throughout #define YOUR_VALUE_T int // See below where the charmap is //#define HTABLE_USE_CHARMAP // Maintain a true linked list that's manageable and iterateable #define HTABLE_MAINTAIN_LIST // Count lookup misses and such #define HTABLE_KEEP_STATS // Fast deletion = faster deletion but more memory consumption //#define HTABLE_FAST_DELETES #ifdef HTABLE_USE_CHARMAP // This is used to quickly collapse the input from full 8-bit to the minimal character set of truely expected data. // The idea here is to boil down the data. This should only be done if you're confident enough to develop a custom // hashing algorithm for this particular known range const char hashing_charmap[256] = { // Each data point that is unused (such as control characters or high 8-bit characters) // should be 0, while each used character should be represented with a unique sequential value (1, 2, 3, etc) // I'm not going to build this for you because it's very custom to your needs. // A chunk might look look like... /* 0, 0, 0, 0, 17, 18, 19, 0, 0, 20, 21, */ }; #endif static inline uint32_t hash_str(register const char* s, const size_t len) { register uint32_t hash = 5381; // hash seed here. This could be different depending on the actual algorithm chosen register char symbol; // This could be unrolled because we known string length as well. for (register size_t i=0; i < len; i++) { #ifdef HTABLE_USE_CHARMAP if (!(symbol = hash_charmap[s[i]])) continue; #else // Actually s[i] could simply be used (which would be faster) if no mapping is needed. symbol = s[i]; #endif // True hash algorithm per-symbol operation here /* Keep in mind that certain algorithms are optimized for certain things. An example: Stock DJBX33A is very fast but effectively only represents the end of a long input. It's really meant for short inputs (like variable names) A MurmurHash or tuned FNV variant are likely to be a good picks since we've reduced symbol range and we are dealing with potential long inputs. It's also important to understand that the entire hash will likely not be used. Only the lower-end bits will be used (you'll see why in the actual functionality). If you're hashing algorithm is good though, this shouldn't matter because the distribution should be normal. I'll just use Jenkins one-at-a-time hash here (because it's easy) */ hash += symbol; hash += (hash << 10); hash ^= (hash >> 6); } // Finialize jenkins one-at-a-time hash += (hash << 3); hash ^= (hash >> 11); hash += (hash << 15); return hash; }; typedef struct _hash_entry { char* key; size_t key_len; uint32_t hash; // Whatever your value type is (likely a pointer to your own record or something) YOUR_VALUE_T value; // Internal linking maintains order. // If you don't need proper order maintentence, you don't need these #ifdef HTABLE_MAINTAIN_LIST struct _hash_entry* prev; struct _hash_entry* next; #endif #ifdef HTABLE_FAST_DELETES struct _hash_entry* bucket_prev; #endif // This is required for the occassional hash miss struct _hash_entry* bucket_next; } hash_entry_t; typedef struct _hash_table { // Counts size_t entry_count; uint32_t bucket_count; unsigned int growth_num; unsigned int growth_den; #ifdef HTABLE_KEEP_STATS // How many times we missed during lookup size_t misses; // (entry_count - used_buckets) tells you how many collisions there are (the lower the better) uint32_t used_buckets; #endif // Internal linking. Same conditions as in hash_entry_t so feel free to remove as necessary. #ifdef HTABLE_MAINTAIN_LIST hash_entry_t* first; hash_entry_t* last; #endif // Buckets, the soul of the hash table uint32_t hash_mask; hash_entry_t** buckets; } hash_table_t; // Creates a hash table // size_hint - Tells to table how many buckets it should initially allocate. // If you know (for example) that you'll have about 500 entries, set it // to 500 // growth_num and growth_den - This is the ratio of how many entries to how // many buckets that you want to guarantee. // It's in two integers to avoid floating point math for speed. // The logic after an insertion is... // if (entry_count == growth_num * (bucket_count / growth_den)) then // grow the bucket array // For example, when growth_num is 4 and growth_den is 5... // (entry_count == 4 * (bucket_count / 5)) // ...would be true when entry count is 80% of the bucket count // This can result in a greater than 1.0 ratio (such as 5/4 or something // like that) if you prefer. This would mean that there are less buckets // than there are entries, so collisions are guaranteed at that point, but // you would save on both memory and often a bucket expansion occurs (which // is costly during an insert operation). static hash_table_t* htable_create(const size_t size_hint, const unsigned int growth_num, const unsigned int growth_den); // Frees a hash table static void htable_free(hash_table_t* table); // Mostly used internally. You probably want htable_get(), htable_value(), or htable_exists() static hash_entry_t* htable_find_entry(hash_table_t* table, const char* key, size_t key_len, uint32_t* hash, size_t* true_len); // Get the pointer to a value stored in the table (or NULL on non-existant) static YOUR_VALUE_T* htable_value(const hash_table_t* table, const char* key, size_t key_len); // Get the value of an entry, or the default value if the entry doesn't exist static YOUR_VALUE_T htable_get(const hash_table_t* table, const char* key, size_t key_len, const YOUR_VALUE_T default_value); // Test for the existance of a value static int htable_exists(const hash_table_t* table, const char* key, size_t key_len); // Add a new entry (but don't update if it already exists). Returns NULL if it already exists static hash_entry_t* htable_add(hash_table_t* table, const char* key, size_t key_len, YOUR_VALUE_T value); // Update an entry OR add aa new entry it doesn't already exist static hash_entry_t* htable_set(hash_table_t* table, const char* key, size_t key_len, YOUR_VALUE_T value); // Update an entry but don't add aa new entry it doesn't already exist. Returns NULL if doesn't exist static hash_entry_t* htable_update(hash_table_t* table, const char* key, size_t key_len, YOUR_VALUE_T value); // Delete an entry. Returns 1 on success or 0 if the entry didn't exist static int htable_delete(hash_table_t* table, const char* key, size_t key_len); // Pack the table. // This is here because... // - If HTABLE_FAST_DELETES is set, and if you delete a bunch of entries, it's // possible that you can free up some memory by shrinking the bucket array. // You would have to call this manually to make that happen. // - If HTABLE_FAST_DELETES is NOT set however, this get's called automatically // on each delete, so the buckets are guaranteed to be packed. static void htable_pack(hash_table_t* table); /*********************************\ Implementation... \*********************************/ static hash_table_t* htable_create(const unsigned long size_hint, const unsigned int growth_num, const unsigned int growth_den) { hash_table_t* res = malloc(sizeof(hash_table_t)); if (!res) return NULL; res->entry_count = 0; #ifdef HTABLE_MAINTAIN_LIST res->first = NULL; res->last = NULL; #endif #ifdef HTABLE_KEEP_STATS res->misses = 0; res->used_buckets = 0; #endif if ((!growth_num) || (!growth_den)) { // Grow only when the entry count matches the bucket count res->growth_num = 1; res->growth_den = 1; } else { res->growth_num = growth_num; res->growth_den = growth_den; } /* For computational speed and simplicity we'll grow the bucket array exponentially. Not growing the buckets exponentially is possible but requires a different entry lookup mechanism (because hash & hash_mask would no longer work) and would likely involve the modulas operator which is very slow. If memory is uber important however, this might be a good solution. */ // We'll go ahead and assume it's a reasonably small table and only allocate 256 buckets. int bits = 8; if (size_hint) { unsigned long target = (size_hint * res->growth_den) / res->growth_num; // First check is to prevent overflow as it would be 0 when bits is 31 on a 32 bit system while ((1 << (bits + 1)) && ((1 << bits) < target)) bits++; } res->bucket_count = 1 << bits; res->hash_mask = (1 << bits) - 1; if ((res->buckets = (hash_entry_t**)calloc(res->bucket_count, sizeof(hash_entry_t*))) == NULL) { free(res); return NULL; } memset(res->buckets, 0, sizeof(hash_entry_t*) * res->bucket_count); return res; }; // Destroy a table static void htable_free(hash_table_t* table) { hash_entry_t* entry; hash_entry_t* next; #ifdef HTABLE_MAINTAIN_LIST entry = table->first; while (entry) { next = entry->next; free(entry->key); free(entry); entry = next; } #else for (uint32_t i=0; i < table->bucket_count; i++) { entry = table->buckets[i]; while (entry) { next = entry->bucket_next; free(entry->key); free(entry); entry = next; } } #endif free(table->buckets); free(table); } // Find an entry: (mostly used internally) // returns NULL when the entry isn't found static hash_entry_t* htable_find_entry(hash_table_t* table, const char* key, size_t key_len, uint32_t* hash, size_t* true_len) { if (!key_len) key_len = strlen(key); if (true_len != NULL) *true_len = key_len; uint32_t h = hash_str(key, key_len); if (hash != NULL) *hash = h; uint32_t bucket = h & table->hash_mask; // Best case is here is O(1) because table->buckets[bucket] would be the entry hash_entry_t* entry = table->buckets[bucket]; // ... but if we miss, then the time increases to as much as O(n) where n is the number of entries in // the particular bucket (good hash + good ratio management means that n would usually be only 1) while ((entry) && ((entry->hash != h) || (entry->key_len != key_len) || (memcmp(entry->key, key, key_len)))) { #ifdef HTABLE_KEEP_STATS table->misses++; #endif entry = entry->bucket_next; } return entry; } // Insertion of entry into bucket. Used internally static inline int _htable_bucket_insert(hash_entry_t** buckets, hash_entry_t* entry, const uint32_t hash_mask) { hash_entry_t* bentry; #ifdef HTABLE_FAST_DELETES entry->bucket_prev = NULL; #endif entry->bucket_next = NULL; uint32_t bidx = entry->hash & hash_mask; int res = 0; if ((bentry = buckets[bidx]) == NULL) { res = 1; buckets[bidx] = entry; } else { while (bentry->bucket_next) bentry = bentry->bucket_next; bentry->bucket_next = entry; #ifdef HTABLE_FAST_DELETES entry->bucket_prev = bentry; #endif } return res; } // Bucket array growing/shrinking. Used internally static void _htable_adjust_as_needed(hash_table_t* table) { int change = (((table->bucket_count << 1) != 0) && (table->entry_count >= table->growth_num * (table->bucket_count / table->growth_den))); if (!change) { if ((table->bucket_count > (1 << 8)) && (table->entry_count < table->growth_num * ((table->bucket_count >> 1) / table->growth_den))) { change = -1; } else { return; } } uint32_t new_bucket_count = (change < 0) ? table->bucket_count >> 1 : table->bucket_count << 1; uint32_t new_hash_mask = new_bucket_count - 1; hash_entry_t** new_buckets = (hash_entry_t**)calloc(new_bucket_count, sizeof(hash_entry_t*)); if (!new_buckets) return; memset(new_buckets, 0, new_bucket_count * sizeof(hash_entry_t*)); #ifdef HTABLE_KEEP_STATS table->used_buckets = 0; #endif hash_entry_t* entry; #ifdef HTABLE_MAINTAIN_LIST entry = table->first; while (entry) { int r = _htable_bucket_insert(new_buckets, entry, new_hash_mask); #ifdef HTABLE_KEEP_STATS table->used_buckets += r; #endif entry = entry->next; } #else hash_entry_t* next; for (uint32_t i=0; i < table->bucket_count; i++) { entry = table->buckets[i]; while (entry) { next = entry->bucket_next; int r = _htable_bucket_insert(new_buckets, entry, new_hash_mask); #ifdef HTABLE_KEEP_STATS table->used_buckets += r; #endif entry = next; } } #endif free(table->buckets); table->buckets = new_buckets; table->bucket_count = new_bucket_count; table->hash_mask = new_hash_mask; } // Get the pointer to the value of the entry or NULL if not in table static YOUR_VALUE_T* htable_value(const hash_table_t* table, const char* key, size_t key_len) { // un-const table so that find_entry can keep statistics hash_entry_t* entry = htable_find_entry((hash_table_t*)table, key, key_len, NULL, NULL); return (entry != NULL) ? &entry->value : NULL; } static YOUR_VALUE_T htable_get(const hash_table_t* table, const char* key, size_t key_len, const YOUR_VALUE_T default_value) { // un-const table so that find_entry can keep statistics hash_entry_t* entry = htable_find_entry((hash_table_t*)table, key, key_len, NULL, NULL); return (entry != NULL) ? entry->value : default_value; } static int htable_exists(const hash_table_t* table, const char* key, size_t key_len) { // un-const table so that find_entry can keep statistics return (htable_find_entry((hash_table_t*)table, key, key_len, NULL, NULL) != NULL); } // Add a new entry (but don't update if it already exists) // Returns NULL if the entry already exists (use set() if you want add or update logic) static hash_entry_t* htable_add(hash_table_t* table, const char* key, size_t key_len, YOUR_VALUE_T value) { uint32_t hash; hash_entry_t* res = htable_find_entry(table, key, key_len, &hash, &key_len); if (res != NULL) return NULL; if ((res = (hash_entry_t*)malloc(sizeof(hash_entry_t))) == NULL) return NULL; if ((res->key = (char*)malloc(key_len + 1)) == NULL) { free(res); return NULL; } memcpy(res->key, key, key_len + 1); res->key_len = key_len; res->hash = hash; res->value = value; #ifdef HTABLE_MAINTAIN_LIST res->prev = NULL; res->next = NULL; if (table->first == NULL) { table->first = res; table->last = res; } else { res->prev = table->last; table->last->next = res; table->last = res; } #endif int r = _htable_bucket_insert(table->buckets, res, table->hash_mask); #ifdef HTABLE_KEEP_STATS table->used_buckets += r; #endif table->entry_count++; _htable_adjust_as_needed(table); return res; } static hash_entry_t* htable_set(hash_table_t* table, const char* key, size_t key_len, YOUR_VALUE_T value) { uint32_t hash; hash_entry_t* res = htable_find_entry(table, key, key_len, &hash, &key_len); if (res != NULL) { res->value = value; return res; } if ((res = (hash_entry_t*)malloc(sizeof(hash_entry_t))) == NULL) return NULL; if ((res->key = (char*)malloc(key_len + 1)) == NULL) { free(res); return NULL; } memcpy(res->key, key, key_len + 1); res->key_len = key_len; res->hash = hash; res->value = value; #ifdef HTABLE_MAINTAIN_LIST res->prev = NULL; res->next = NULL; if (table->first == NULL) { table->first = res; table->last = res; } else { res->prev = table->last; table->last->next = res; table->last = res; } #endif int r = _htable_bucket_insert(table->buckets, res, table->hash_mask); #ifdef HTABLE_KEEP_STATS table->used_buckets += r; #endif table->entry_count++; _htable_adjust_as_needed(table); return res; } // Update an entry but don't add aa new entry it doesn't already exist. Returns NULL if doesn't exist static hash_entry_t* htable_update(hash_table_t* table, const char* key, size_t key_len, YOUR_VALUE_T value) { hash_entry_t* res = htable_find_entry(table, key, key_len, NULL, NULL); if (res == NULL) return NULL; res->value = value; return res; } // Delete an entry. Returns 1 on success or 0 if the entry didn't exist static int htable_delete(hash_table_t* table, const char* key, size_t key_len) { uint32_t hash; hash_entry_t* entry = htable_find_entry(table, key, key_len, &hash, &key_len); if (entry == NULL) return 0; #ifdef HTABLE_MAINTAIN_LIST if (entry == table->first) table->first = entry->next; if (entry == table->last) { table->last = entry->prev; } if (entry->prev != NULL) entry->prev->next = entry->next; if (entry->next != NULL) entry->next->prev = entry->prev; #endif uint32_t bucket = hash & table->hash_mask; hash_entry_t* bhead = table->buckets[bucket]; hash_entry_t* bprev = NULL; #ifdef HTABLE_FAST_DELETES bprev = entry->bucket_prev; #else if (bhead != entry) { bprev = bhead; while (bprev->bucket_next != entry) bprev = bprev->bucket_next; } #endif if (bprev != NULL) bprev->bucket_next = entry->bucket_next; #ifdef HTABLE_FAST_DELETES if (entry->bucket_next != NULL) entry->bucket_next->bucket_prev = entry->bucket_next; #endif if (bhead == entry) { table->buckets[bucket] = entry->bucket_next; #ifdef HTABLE_KEEP_STATS if (entry->bucket_next == NULL) table->used_buckets--; #endif } free(entry->key); free(entry); table->entry_count--; #ifndef HTABLE_FAST_DELETES htable_pack(table); #endif return 1; } static void htable_pack(hash_table_t* table) { _htable_adjust_as_needed(table); } 

用法示例(作为断言)和效率测试。 使用int作为数据值类型…

 hash_table_t* ht = htable_create(0, 0, 0); assert(ht != NULL); // Table was created successfully // Testing basic adding/updating/getting... assert(htable_add(ht, "hello-world", 0, 234) != NULL); // hello-world set to 234 assert(htable_add(ht, "goodbye-world", 0, 567) != NULL); // goobye-world set to 567 assert(ht->entry_count == 2); // Two entries exist (hello-world and goodbye-world) assert(htable_exists(ht, "hello-world", 0) == 1); // hello-world exists assert(htable_exists(ht, "goodbye-world", 0) == 1); // goodbye-world exists assert(htable_exists(ht, "unknown-world", 0) == 0); // unknown-world doesn't exist assert(*htable_value(ht, "hello-world", 0) == 234); // hello-world has a value of 234 assert(*htable_value(ht, "goodbye-world", 0) == 567); // goodbye-world has a value of 567 assert(htable_get(ht, "hello-world", 0, -1) == 234); // hello-world exists and has a value of 234 assert(htable_get(ht, "goodbye-world", 0, -1) == 567); // goobye-world exists and has a value of 567 assert(htable_get(ht, "unknown-world", 0, -1) == -1); // unknown-world does not exist so the default value of -1 is returned *htable_value(ht, "hello-world", 0) = -1; // hello-world's value is directly set via reference to -1 *htable_value(ht, "goodbye-world", 0) = -2; // goodbye-world's value is directly set via reference to -2 assert(*htable_value(ht, "hello-world", 0) == -1); // hello-world has a value of -1 assert(*htable_value(ht, "goodbye-world", 0) == -2); // goodbye-world has a value of -2 assert(htable_update(ht, "hello-world", 0, 1000) != NULL); // hello-world set to 1000 assert(htable_update(ht, "goodbye-world", 0, 2000) != NULL); // goodbye-world set to 2000 assert(htable_update(ht, "unknown-world", 0, 3000) == NULL); // unknown-world not set (it doesn't exist); assert(ht->entry_count == 2); // Two entries exist (hello-world and goodbye-world) assert(htable_set(ht, "hello-world", 0, 1111) != NULL); // hello-world set to 1111 assert(htable_set(ht, "goodbye-world", 0, 2222) != NULL); // goodbye-world set to 2222 assert(htable_set(ht, "unknown-world", 0, 3333) != NULL); // unknown-world added with a value of 3333 assert(ht->entry_count == 3); // Three entries exist (hello-world, goodbye-world, and unknown-world) printf("%s\n", "After all additions and changes:"); #ifdef HTABLE_MAINTAIN_LIST // A foreach iteration hash_entry_t* entry = ht->first; while (entry != NULL) { printf("\"%s\" = %i\n", entry->key, entry->value); entry = entry->next; } #endif #ifdef HTABLE_KEEP_STATS assert(ht->entry_count - ht->used_buckets == 0); // Means that no hash collisions occured assert(ht->misses == 0); // Means that each lookup was in O(1) time #endif // Testing basic deletion... assert(htable_delete(ht, "not-a-world", 0) == 0); // not-a-world not deleted (doesn't exist) assert(htable_delete(ht, "hello-world", 0) == 1); // hello-world deleted assert(htable_delete(ht, "hello-world", 0) == 0); // hello-world not deleted (doesn't exist) assert(htable_exists(ht, "hello-world", 0) == 0); // hello-world doesn't exit assert(htable_exists(ht, "goodbye-world", 0) == 1); // goobye-world still exists assert(htable_exists(ht, "unknown-world", 0) == 1); // unknown-world still exists assert(ht->entry_count == 2); // Two entries exists (goodbye-world and unknown-world) assert(htable_delete(ht, "unknown-world", 0) == 1); // unknown-world deleted assert(htable_exists(ht, "unknown-world", 0) == 0); // unknown-world doesn't exit assert(htable_exists(ht, "goodbye-world", 0) == 1); // goodbye-world still exists assert(ht->entry_count == 1); // One entry exists (goodbye-world) #ifdef HTABLE_MAINTAIN_LIST // A foreach iteration printf("%s\n", "After deletion:"); entry = ht->first; while (entry != NULL) { printf("\"%s\" = %i\n", entry->key, entry->value); entry = entry->next; } #endif #ifdef HTABLE_KEEP_STATS assert(ht->entry_count - ht->used_buckets == 0); // Means that no hash collisions occured assert(ht->misses == 0); // Means that each lookup was in O(1) time #endif htable_free(ht); 

另外,我使用100,000个随机生成的ASCII密钥进行了一些测试,长度在5到1000个字符之间,显示以下内容……

  • 使用默认参数随机输入后:
    • 参赛作品:100000
    • 铲斗:131072
    • 二手水桶:69790
    • 碰撞:30210
    • 遗漏:71394
    • 哈希/铲斗效率: 69.79%
  • 使用1/2增长率随机输入后:
    • 参赛作品:100000
    • 铲斗:262144
    • 二手水桶:83181
    • 碰撞:16819
    • 遗漏:35436
    • 哈希/铲斗效率: 83.18%
  • 使用2/1增长率随机输入后:
    • 参赛作品:100000
    • 铲斗:65536
    • 二手水桶:51368
    • 碰撞:48632
    • 遗漏:141607
    • 哈希/铲斗效率: 51.37%

正如您所看到的,它有可能表现得非常好。 效率为80%意味着大约80%的查找是O(1),大约16%的查找是O(2),大约3.2%的查找是O(3),大约0.8%的查找是O(4+)。 这意味着平均查找需要O(1.248)

同样,50%的效率意味着50%的查找是O(1),25%是O(2),12.5%是O(3),12.5%是O(4+)

您只需要针对已知因素选择(或编写)正确的哈希算法,并根据您的特定需求进行调整。

笔记:

  1. 那些断言/测试对我有用,但不能保证没有bug 。 它看起来确实相当稳定。 那里可能还有一两个漏洞。
  2. 如果您需要列表管理,您可以通过管理entry->preventry->next轻松添加诸如move()swap()sort()insert()等内容。
  3. 我无法包含测试代码,因为我似乎达到了答案大小限制。
  4. 散列函数或最终字符串比较都不包含在时间分析中。 如果不了解有关输入的所有统计数据,就无法进行分析。 这两个函数应该非常快,如果有关输入数据的更多信息,则可以完全排除字符串比较。

if you know the set of all possible variable names, then it would be possible to use to perfect hash the names to numbers

but each of the hash tables would end up having the same length an example is if X and y are the names then the map would always be of length 2

if perfect(str) turns 'x' and 'y' into 0 and 1; then the function get would be

 get(field, table) { return table[perfect(field)]; }