Counting the number of times a word is repeated in a text file

Extending my previous exercise, I have a text file that contains one word per line.

    hello
    hi
    hello
    bonjour
    bonjour
    hello

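As I read these words from the file, I want to compare each one against an array of struct pointers (built from the text file). If the word is not yet in the array, it should be stored in a struct pointer with a count of 1. If the word is already in the array, its count should be incremented by 1. I will write the result to a new file (that part already works).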

    hello = 3
    hi = 1
    bonjour = 2

Here is my code:

    #include <stdio.h>
    #include <stdlib.h>

    struct wordfreq{
        int count;
        char *word;
    };

    int main(int argc, char * argv[])
    {
        struct wordfreq *words[1000] = {NULL};
        int i, j, f = 0;

        for(i=0; i <1000; i++)
            words[i] = (struct wordfreq*)malloc(sizeof(struct wordfreq));

        FILE *input = fopen(argv[1], "r");
        FILE *output = fopen(argv[2], "w");
        if(input == NULL){
            printf("Error! Can't open file.\n");
            exit(0);
        }

        char str[20];
        i=0;
        while(fscanf(input, "%s[^\n]", &str) ==1){
            //fprintf(output, "%s:\n", str);
            for(j=0; j<i; j++){
                //fprintf(output, "\t%s\n", words[j]->word);
                if(str == words[j]->word){
                    words[j] ->count ++;
                    f = 1;
                }
            }
            if(f==0){
                words[i]->word = str;
                words[i]->count = 1;
            }
            //fprintf(output, "\t%s = %d\n", words[i]->word, words[i]->count);
            i++;
        }

        for(j=0; j<i; j++)
            fprintf(output, "%s = %d\n", words[j]->word, words[j]->count);
        for(i=0; i<1000; i++){
            free(words[i]);
        }
        return 0;
    }

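I used a few fprintf statements to inspect my values, and I can see that str is correct. When I reach the line that compares str with the other struct pointers in the array (str == words[i]->word), words[0]->word is always the same as str during the traversal, and the rest of the words[i]->word values are (null). I am still trying to fully understand mixing pointers and structs. With that said, any thoughts, comments, complaints?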

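You may be making things harder than necessary, and for your input file you are certainly allocating 997 structs you do not need. There is no need to allocate all 1000 structs up front. (You are free to do so; it is just a memory management issue.) The key is that you only need to allocate a new struct each time you encounter a unique word (for your data file, 3 times). In every other case you simply update count to record another occurrence of a word that is already stored.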

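Also, if there is no compelling reason to use a struct, it is just as easy to use an array of pointers to char to hold each word, and a simple int [1000] array as your count (or frequency) array. Your choice. With two arrays you only allocate once per unique word and never need a separate allocation for each struct.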

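Putting those pieces together, you could reduce your code (leaving out the file handling, which simple redirection takes care of) to the following: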

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    enum { MAXC = 128, MAXW = 1000 };

    struct wordfreq{
        int count;
        char *word;
    };

    int main (void) {

        struct wordfreq *words[MAXW] = {0};
        char tmp[MAXC] = "";
        int n = 0;

        /* while < MAXW unique words, read each word in file
         * (field width 127 = MAXC - 1 keeps fscanf inside tmp) */
        while (n < MAXW && fscanf (stdin, " %127s", tmp) == 1) {
            int i;

            for (i = 0; i < n; i++)         /* check against existing words */
                if (strcmp (words[i]->word, tmp) == 0)  /* if exists, break */
                    break;

            if (i < n) {                    /* if exists */
                words[i]->count++;          /* update frequency */
                continue;                   /* get next word */
            }

            /* new word found, allocate struct and
             * allocate storage for word (+ space for nul-byte) */
            words[n] = malloc (sizeof *words[n]);
            if (words[n])                   /* only dereference a valid struct */
                words[n]->word = malloc (strlen (tmp) + 1);
            if (!words[n] || !words[n]->word) { /* validate ALL allocations */
                fprintf (stderr, "error: memory exhausted, words[%d].\n", n);
                free (words[n]);            /* free(NULL) is a no-op */
                break;
            }
            words[n]->count = 0;            /* initialize count */

            strcpy (words[n]->word, tmp);   /* copy new word to words[n] */
            words[n]->count++;              /* update frequency to 1 */
            n++;                            /* increment word count */
        }

        for (int i = 0; i < n; i++) {       /* for each word */
            printf ("%s = %d\n", words[i]->word, words[i]->count);
            free (words[i]->word);          /* free memory when no longer needed */
            free (words[i]);
        }

        return 0;
    }
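The program above compares words with strcmp and copies each new word into its own allocation. Both details matter for the symptom described in the question: there, `str == words[j]->word` compares two addresses rather than the characters they point to, and `words[i]->word = str` stores the address of the one read buffer instead of a copy of its contents, so a stored entry always "contains" whatever was read last. The following is only a minimal sketch of the difference; the helper names `same_word` and `copy_word` are made up for illustration and do not appear in the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 when a and b hold the same characters.
 * Writing a == b instead would only ask whether both pointers refer to
 * the same address. */
int same_word (const char *a, const char *b)
{
    return strcmp (a, b) == 0;
}

/* Hypothetical helper: returns a freshly allocated copy of s, or NULL if
 * malloc fails. The copy stays intact when the read buffer is reused for
 * the next word. The caller must free() the result. */
char *copy_word (const char *s)
{
    char *p = malloc (strlen (s) + 1);  /* +1 for the nul-byte */

    if (p)
        strcpy (p, s);

    return p;
}
```

With these, a word stored through `copy_word` keeps its value after the buffer is overwritten, while a word stored by plain pointer assignment would change along with the buffer.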

Example Input File

    $ cat dat/wordfile.txt
    hello
    hi
    hello
    bonjour
    bonjour
    hello

Example Use/Output

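    $ ./bin/filewordfreq < dat/wordfile.txt
    hello = 3
    hi = 1
    bonjour = 2

As with any code that dynamically allocates memory, you will want to validate your use of that memory to make sure you have not written beyond its bounds or based a conditional move or jump on an uninitialized value. On Linux, valgrind is the natural choice (there are similar programs for each OS). Just run your program through it, e.g.:

    $ valgrind ./bin/filewordfreqstruct < dat/wordfile.txt

Verify that you free all of the memory you allocate and that there are no memory errors. Look things over and let me know if you have any further questions.

Using 2 Arrays Instead of a struct

As mentioned above, a storage array plus a frequency array can sometimes be a simpler way to accomplish the same thing. Whenever you need the frequency of the items in any "set", your first thought should be a frequency array. It is nothing more than an array with as many elements as there are items in your "set", initialized to 0 at the beginning. The approach is the same as before: whenever you add an element to the storage array (or find a duplicate of an existing one), you increment the corresponding element of the frequency array by 1. When you are done, each frequency array element holds the number of times the corresponding storage array element appeared.

Here is an equivalent of the program above:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    enum { MAXC = 128, MAXW = 1000 };

    int main (void) {

        char *words[MAXW] = {NULL},   /* storage array of pointers to char */
            tmp[MAXC] = "";
        int freq[MAXW] = {0}, n = 0;  /* simple integer frequency array */

        /* while < MAXW unique words, read each word in file
         * (field width 127 = MAXC - 1 keeps fscanf inside tmp) */
        while (n < MAXW && fscanf (stdin, " %127s", tmp) == 1) {
            int i;

            for (i = 0; words[i]; i++)   /* check against existing words */
                if (strcmp (words[i], tmp) == 0)    /* if exists, break */
                    break;

            if (words[i]) {     /* if exists */
                freq[i]++;      /* update frequency */
                continue;       /* get next word */
            }

            /* new word found, allocate storage (+ space for nul-byte) */
            words[n] = malloc (strlen (tmp) + 1);
            if (!words[n]) {    /* validate ALL allocations */
                fprintf (stderr, "error: memory exhausted, words[%d].\n", n);
                break;
            }

            strcpy (words[n], tmp); /* copy new word to words[n] */
            freq[n]++;              /* update frequency to 1 */
            n++;                    /* increment word count */
        }

        for (int i = 0; i < n; i++) {                 /* for each word */
            printf ("%s = %d\n", words[i], freq[i]);  /* output word + freq */
            free (words[i]);           /* free memory when no longer needed */
        }

        return 0;
    }

With this approach you eliminate half of your memory allocations by using a statically declared frequency array for the count. Either way is fine; it is largely up to you.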
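If the lookup-or-insert step of the two-array version is needed in more than one place, it could be pulled into a function. The following is only a sketch under that assumption: `count_word` and its parameter list are hypothetical and not part of the original answer, and it uses an explicit count `*n` as the search bound instead of relying on a NULL sentinel in the storage array.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the original answer): record one occurrence
 * of `word` in the parallel arrays words[] and freq[], which currently
 * hold *n unique words and have room for `max` of them.
 * Returns the index of the word, or -1 if the arrays are full or the
 * allocation for a new word fails. */
int count_word (char **words, int *freq, int *n, int max, const char *word)
{
    int i;

    for (i = 0; i < *n; i++)                /* linear search of stored words */
        if (strcmp (words[i], word) == 0) {
            freq[i]++;                      /* duplicate: bump its frequency */
            return i;
        }

    if (*n >= max)                          /* no room for another unique word */
        return -1;

    words[*n] = malloc (strlen (word) + 1); /* +1 for the nul-byte */
    if (!words[*n])
        return -1;

    strcpy (words[*n], word);               /* store a private copy */
    freq[*n] = 1;                           /* first occurrence */

    return (*n)++;                          /* index used, then count grows */
}
```

The read loop would then call `count_word (words, freq, &n, MAXW, tmp)` once per word and stop on a -1 return; the caller still frees each `words[i]` at the end, as in the program above.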