在C中删除数组中的重复项

这个问题有点复杂。这里的问题是摆脱重复，并将数组的唯一元素保存到具有原始序列的另一个数组中。

例如：

如果输入是bacadt

结果应该是：输入输入的确切状态的bacdt。

因此，对于排序数组，然后检查无法工作，因为我丢失了原始序列。我被建议使用索引数组，但我不知道该怎么做。那你有什么建议呢？

对于那些愿意回答这个问题的人，我想补充一些具体的信息。

char** finduni(char *words[100],int limit) { // //Methods here // }

是我的function。应该删除重复项并将其存储在不同数组中的数组是单词[100]。因此，这个过程将在此完成。我首先考虑将单词的所有元素放入另一个数组并对该数组进行排序，但在某些测试之后这不起作用。只是提醒解决者:)。

好吧，这是一个char类型的版本。请注意，它不会缩放。

 #include "stdio.h" #include "string.h" void removeDuplicates(unsigned char *string) { unsigned char allCharacters [256] = { 0 }; int lookAt; int writeTo = 0; for(lookAt = 0; lookAt < strlen(string); lookAt++) { if(allCharacters[ string[lookAt] ] == 0) { allCharacters[ string[lookAt] ] = 1; // mark it seen string[writeTo++] = string[lookAt]; // copy it } } string[writeTo] = '\0'; } int main() { char word[] = "abbbcdefbbbghasdddaiouasdf"; removeDuplicates(word); printf("Word is now [%s]\n", word); return 0; }

以下是输出：

 Word is now [abcdefghsiou]

这是你想要的吗？如果字母之间有空格，则可以修改方法，但如果使用int ， float ， double或char *作为类型，则此方法根本不会缩放。

编辑

我发布了，然后看到了你的澄清，它是一个char *数组。我会更新方法。

我希望这不是太多的代码。我改编了这个QuickSort算法，基本上为它添加了索引记忆。该算法为O（n log n），因为下面的3个步骤是加性的，并且这是其中2个的最坏情况复杂度。

对字符串数组进行排序，但每个交换也应该反映在索引数组中。在此阶段之后， originalIndices的第i个元素保存排序数组的第i个元素的原始索引。
通过将排序数组中的重复元素设置为NULL并将索引值设置为elements来删除它们，这是最高的elements 。
对原始索引数组进行排序，并确保每个交换都反映在字符串数组中。这给了我们原始的字符串数组，除了重复项在最后并且它们都是NULL 。
为了更好的衡量，我返回新的元素数。

码：

 #include "stdio.h" #include "string.h" #include "stdlib.h" void sortArrayAndSetCriteria(char **arr, int elements, int *originalIndices) { #define MAX_LEVELS 1000 char *piv; int beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R; int idx, cidx; for(idx = 0; idx < elements; idx++) originalIndices[idx] = idx; beg[0] = 0; end[0] = elements; while (i>=0) { L = beg[i]; R = end[i] - 1; if (L= 0 && L < R) R--; if (L < R) { arr[L] = arr[R]; originalIndices[L++] = originalIndices[R]; } while (strcmp(arr[L], piv) <= 0 && L < R) L++; if (L < R) { arr[R] = arr[L]; originalIndices[R--] = originalIndices[L]; } } arr[L] = piv; originalIndices[L] = cidx; beg[i + 1] = L + 1; end[i + 1] = end[i]; end[i++] = L; } else { i--; } } } int removeDuplicatesFromBoth(char **arr, int elements, int *originalIndices) { // now remove duplicates int i = 1, newLimit = 1; char *curr = arr[0]; while (i < elements) { if(strcmp(curr, arr[i]) == 0) { arr[i] = NULL; // free this if it was malloc'd originalIndices[i] = elements; // place it at the end } else { curr = arr[i]; newLimit++; } i++; } return newLimit; } void sortArrayBasedOnCriteria(char **arr, int elements, int *originalIndices) { #define MAX_LEVELS 1000 int piv; int beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R; int idx; char *cidx; beg[0] = 0; end[0] = elements; while (i>=0) { L = beg[i]; R = end[i] - 1; if (L= piv && L < R) R--; if (L < R) { arr[L] = arr[R]; originalIndices[L++] = originalIndices[R]; } while (originalIndices[L] <= piv && L < R) L++; if (L < R) { arr[R] = arr[L]; originalIndices[R--] = originalIndices[L]; } } arr[L] = cidx; originalIndices[L] = piv; beg[i + 1] = L + 1; end[i + 1] = end[i]; end[i++] = L; } else { i--; } } } int removeDuplicateStrings(char *words[], int limit) { int *indices = (int *)malloc(limit * sizeof(int)); int newLimit; sortArrayAndSetCriteria(words, limit, indices); newLimit = removeDuplicatesFromBoth(words, limit, indices); sortArrayBasedOnCriteria(words, limit, indices); free(indices); return newLimit; } int main() { char *words[] = { "abc", "def", "bad", "hello", "captain", "def", "abc", "goodbye" }; int newLimit = removeDuplicateStrings(words, 8); int i = 0; for(i = 0; i < newLimit; i++) printf(" Word @ %d = %s\n", i, words[i]); return 0; }

遍历数组中的项 – O(n)运算
对于每个项目，将其添加到另一个排序数组
在将其添加到已排序的数组之前，请检查该条目是否已存在 – O(log n)操作

最后， O(n log n)操作

我认为在C中你可以创建第二个数组。然后只有当此元素不在发送数组中时，才从原始数组中复制元素。这也保留了元素的顺序。

如果您逐个读取元素，则可以在插入原始数组之前丢弃该元素，这可以加快该过程。

正如Thomas在评论中所建议的那样，如果数组的每个元素都保证来自一组有限的值（例如char ），那么你可以在O(n)时间内实现这一点。

保持256个bool的数组（如果你的编译器不支持bool则为int ）或者数组中可能有许多不同的离散值。将所有值初始化为false 。
逐个扫描输入数组。
对于每个元素，如果bool数组中的对应值为false ，则将其添加到输出数组并将bool数组值设置为true 。否则，什么也不做。

你知道如何为char类型做，对吧？您可以使用字符串执行相同的操作，但不是使用bool数组（这在技术上是“set”对象的实现），您必须使用线性字符串数组来模拟“set”（或bool数组）你已经遇到过。即你已经看到了一个字符串数组，对于每个新字符串，你检查它是否在“看到”字符串数组中，如果是，那么你忽略它（不是唯一的），如果它不在数组中，你添加它到两个看到的字符串和输出数组。如果您有少量不同的字符串（低于1000），您可以忽略性能优化，并简单地将每个新字符串与您之前看到的所有字符串进行比较。

但是，对于大量字符串（数千个），您需要稍微优化一下：

1）每次向已经看到的字符串数组添加新字符串时，请使用插入排序算法对数组进行排序。不要使用quickSort，因为当数据几乎排序时，插入排序往往会更快。

2）检查字符串是否在数组中时，使用二进制搜索。

如果不同字符串的数量是合理的（即您没有数十亿个唯一字符串），这种方法应该足够快。

在C中删除数组中的重复项

初始化指针指针数组

NSSearchPathForDirectoriesInDomains解释混乱

如何在android项目中使用NDK？

在BLAS库中调用DDOT函数

LDR – 文字池 – ARM

是否需要在头文件中声明所有C函数

sqrt（）函数链接错误

找出特定整数有多少二进制数字

一个纯字节版本的strstr？

为什么我不需要在打印之前取消引用C中的字符指针？