Splitting a text file into words in C

I have two types of text files, and I want to split them into words.

The first type of text file is just words separated by newlines.

    Milk
    Work
    Chair
    ...

The second type of text file is text from a book, and it contains only spaces (no commas, question marks, etc.).

 And then she tried to run but she was stunned by the view of ... 

Do you know which approach is best?

I tried the following two ways, but I seem to be getting segmentation faults.

For the first type of text I use:

    while (fgets(line, sizeof(line), wordlist) != NULL)
    {
        /* Checks Words | printf("%s", line); */
        InsertWord(W, line); /* Function that inserts the word to a tree */
    }

For the second type of text I use:

    while (fgets(line, sizeof(line), out) != NULL)
    {
        bp = line;
        while (1)
        {
            cp = strtok(bp, " ");
            bp = NULL;
            if (cp == NULL)
                break;
            /* printf("Word by Word : %s \n", cp); */
            CheckWord(Words, cp); /* Function that checks if the word from the book is the same with one in a tree */
        }
    }

If these are wrong, can you suggest something better or correct me?

EDIT: (about the segmentation fault)

InsertWord is a function that inserts a word into a tree. When I use this code:

    for (i = 0; i <= 2; i++)
    {
        if (i == 0) InsertWord(W, "A");
        if (i == 1) InsertWord(W, "B");
        if (i == 2) InsertWord(W, "c");
    }

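the tree inserts the words and prints them, which means that my tree and its functions work fine (they were also given to us by our teacher). But when I try this: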

    char this_word[15];
    while (fscanf(wordlist, "%14s", this_word) == 1)
    {
        printf("Latest word that was read: '%s'\n", this_word);
        InsertWord(W, this_word);
    }

I get errors from the tree. So I guess it is some kind of segmentation fault. Any ideas?

You want to read from a file, so fgets() should come to mind.

You want to split into tokens by a delimiter (whitespace), so strtok() should come to mind.


So, you could do it like this:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *pFile;
        char mystring[100];
        char *pch;

        /* First file: one word per line, so each line is already a word */
        pFile = fopen("text_newlines.txt", "r");
        if (pFile == NULL)
            perror("Error opening file");
        else
        {
            while (fgets(mystring, 100, pFile) != NULL)
                printf("%s", mystring);
            fclose(pFile);
        }

        /* Second file: words separated by spaces, so tokenize each line */
        pFile = fopen("text_wspaces.txt", "r");
        if (pFile == NULL)
            perror("Error opening file");
        else
        {
            while (fgets(mystring, 100, pFile) != NULL)
            {
                printf("%s", mystring);
                pch = strtok(mystring, " ");
                while (pch != NULL)
                {
                    printf("%s\n", pch);
                    pch = strtok(NULL, " ");
                }
            }
            fclose(pFile);
        }
        return 0;
    }

Output:

    linux25:/home/users/grad1459>./a.out
    Milk
    Work
    Chair
    And then she tried to run
    And
    then
    she
    tried
    to
    run

    but she was stunned by the view of
    but
    she
    was
    stunned
    by
    the
    view
    of
    //newline here as well
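The extra newline in the output above is there because fgets() keeps the line's trailing '\n', so it ends up inside the last token. A minimal sketch of one way around that is to list '\n' (and other whitespace) among the strtok() delimiters. The helper name split_words and its parameters are made up for this illustration; they are not from the question.

```c
#include <string.h>

/* Hypothetical helper: splits `line` in place on whitespace and stores up to
   `max_words` token pointers in `words`. Returns the number of tokens stored.
   The tokens point into `line`, so they are only valid while `line` is. */
int split_words(char *line, char *words[], int max_words)
{
    const char *delims = " \t\r\n"; /* '\n' is a delimiter, so fgets' newline is dropped */
    int count = 0;
    char *tok = strtok(line, delims);

    while (tok != NULL && count < max_words) {
        words[count++] = tok;
        tok = strtok(NULL, delims);
    }
    return count;
}
```

Since a line holding a single word is just a line with one token, the same helper should work for both kinds of file: call it on every buffer that fgets() fills and pass each token on.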

This is the kind of input that fscanf's %s was made for:

    char this_word[15];
    while (fscanf(tsin, "%14s", this_word) == 1)
    {
        printf("Latest word that was read: '%s'.\n", this_word);
        // Process the word...
    }
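Note that the loop above reuses the same this_word buffer for every word. The thread never shows InsertWord, so this is only a guess, but if the tree stores the pointer it is given instead of copying the string, every node would end up pointing at that one buffer. String literals such as "A" and "B" would not show the problem, because each has its own storage. A minimal sketch of copying each word first, where dup_word is a hypothetical helper and not part of the question's code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of `word`,
   or NULL if the allocation fails. The caller must free() the copy. */
char *dup_word(const char *word)
{
    size_t len = strlen(word) + 1; /* +1 for the terminating '\0' */
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, word, len);
    return copy;
}
```

Inside the loop, the copy would be passed on instead of the buffer (after checking it is not NULL), assuming the tree then takes ownership of it.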

The simplest way may be to go character by character:

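    char word[50];
    char *word_pos = word;
    int ch; /* int, so that EOF can be told apart from real characters */

    /* Discard delimiters until the first word character */
    while ((ch = fgetc(out)) != EOF && (ch == '\n' || ch == ' '))
        ;

    while (ch != EOF) {
        if (ch == '\n' || ch == ' ') {
            *word_pos = '\0'; /* end of the current word */
            word_pos = word;
            CheckWord(Words, word);
            /* Discard any further delimiters */
            while ((ch = fgetc(out)) != EOF && (ch == '\n' || ch == ' '))
                ;
        } else {
            if (word_pos < word + sizeof(word) - 1) /* leave room for '\0' */
                *word_pos++ = (char)ch;
            ch = fgetc(out);
        }
    }
    if (word_pos != word) { /* last word, when the file does not end with a delimiter */
        *word_pos = '\0';
        CheckWord(Words, word);
    }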

You are limited by the size of word, and you need to add every stop character to the conditions of the while and the if.
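One way to avoid repeating the stop characters in several conditions is to keep them in a single helper. The following is a rough sketch under that assumption. The names is_stop_char and next_word, and the choice of stop characters, are made up for this example, and `size` is assumed to be at least 1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: non-zero if `ch` is one of the stop characters.
   Add further stop characters to the string in this one place. */
static int is_stop_char(int ch)
{
    /* strchr would also match the terminating '\0', so rule that out first */
    return ch != '\0' && strchr(" \t\r\n", ch) != NULL;
}

/* Hypothetical helper: reads the next word from `in` into `word`, keeping at
   most size-1 characters (longer words are truncated).
   Returns 1 if a word was read, 0 if end of file was reached first. */
int next_word(FILE *in, char *word, size_t size)
{
    int ch;
    size_t len = 0;

    /* Skip stop characters in front of the word */
    while ((ch = fgetc(in)) != EOF && is_stop_char(ch))
        ;
    if (ch == EOF)
        return 0;

    /* Collect characters until the next stop character or end of file */
    do {
        if (len + 1 < size)
            word[len++] = (char)ch;
    } while ((ch = fgetc(in)) != EOF && !is_stop_char(ch));

    word[len] = '\0';
    return 1;
}
```

A caller could then loop with `while (next_word(out, word, sizeof(word)))` and hand each word to CheckWord.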