You may be making things a bit harder than necessary, and you are certainly allocating 997
more structures than necessary in the case of your input file. There is no need to allocate all 1000
structs up front. (you are free to do so, it's just a memory management issue). The key is that you only need allocate a new struct each time a unique word is encountered. (in the case of your data file, 3-times). For all other cases, you are simply updating count
to add the occurrence for a word you have already stored.
Also, if there is no compelling reason to use a struct
, it is just as easy to use an array of pointers-to-char as your pointers to each word
, and then a simple array of int [1000]
as your count
(or frequency) array. Your choice. In the case of two arrays, you only need to allocate for each unique word
and never need a separate allocation for each struct
.
Putting those pieces together, you could reduce your code (not including the file -- which can be handled by simple redirection) to the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAXC = 128, MAXW = 1000 };
struct wordfreq{
int count;
char *word;
};
int main (void) {
struct wordfreq *words[MAXW] = {0};
char tmp[MAXC] = "";
int n = 0;
/* while < MAXW unique words, read each word in file */
while (n < MAXW && fscanf (stdin, " %s", tmp) == 1) {
int i;
for (i = 0; i < n; i++) /* check against exising words */
if (strcmp (words[i]->word, tmp) == 0) /* if exists, break */
break;
if (i < n) { /* if exists */
words[i]->count++; /* update frequency */
continue; /* get next word */
}
/* new word found, allocate struct and
* allocate storage for word (+ space for nul-byte)
*/
words[n] = malloc (sizeof *words[n]);
words[n]->word = malloc (strlen (tmp) + 1);
if (!words[n] || !words[n]->word) { /* validate ALL allocations */
fprintf (stderr, "error: memory exhausted, words[%d].
", n);
break;
}
words[n]->count = 0; /* initialize count */
strcpy (words[n]->word, tmp); /* copy new word to words[n] */
words[n]->count++; /* update frequency to 1 */
n++; /* increment word count */
}
for (int i = 0; i < n; i++) { /* for each word */
printf ("%s = %d
", words[i]->word, words[i]->count);
free (words[i]->word); /* free memory when no longer needed */
free (words[i]);
}
return 0;
}
Example Input File
$ cat dat/wordfile.txt
hello
hi
hello
bonjour
bonjour
hello
Example Use/Output
$ ./bin/filewordfreq <dat/wordfile.txt
hello = 3
hi = 1
bonjour = 2
As with any code that dynamically allocates memory, you will want to validate your use of the memory to insure you have not written beyond the bounds or based a conditional move or jump on an uninitialized value. In Linux, valgrind
is the natural choice (there are similar programs for each OS). Just run you program through it, e.g.:
$ valgrind ./bin/filewordfreqstruct <dat/wordfile.txt
==2000== Memcheck, a memory error detector
==2000== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2000== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==2000== Command: ./bin/filewordfreqstruct
==2000==
hello = 3
hi = 1
bonjour = 2
==2000==
==2000== HEAP SUMMARY:
==2000== in use at exit: 0 bytes in 0 blocks
==2000== total heap usage: 6 allocs, 6 frees, 65 bytes allocated
==2000==
==2000== All heap blocks were freed -- no leaks are possible
==2000==
==2000== For counts of detected and suppressed errors, rerun with: -v
==2000== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Verify that you free
all memory you allocate and that there are no memory errors.
Look things over and let me know if you have any further questions.
Using 2-Arrays Instead of a struct
As mentioned above, sometimes using a storage array and a frequency array can simplify accomplishing the same thing. Whenever you are faced with needing the frequency of any "set", your first thought should be a frequency array. It is nothing more than an array of the same size as the number of items in your "set", (initialized to 0
at the beginning). The same approach applies, when you add (or find a duplicate of an existing) element in your storage array, you increment the corresponding element in your frequency array by 1
. When you are done, your frequency array elements hold the frequency the corresponding elements in your storage array appear.
Here is an equivalent to the program above.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAXC = 128, MAXW = 1000 };
int main (void) {
char *words[MAXW] = {NULL}, /* storage array of pointers to char* */
tmp[MAXC] = "";
int freq[MAXW] = {0}, n = 0; /* simple integer frequency array */
/* while < MAXW unique words, read each word in file */
while (n < MAXW && fscanf (stdin, " %s", tmp) == 1) {
int i;
for (i = 0; words[i]; i++) /* check against exising words */
if (strcmp (words[i], tmp) == 0) /* if exists, break */
break;
if (words[i]) { /* if exists */
freq[i]++; /* update frequency */
continue; /* get next word */
}
/* new word found, allocate storage (+ space for nul-byte) */
words[n] = malloc (strlen (tmp) + 1);
if (!words[n]) { /* validate ALL allocations */
fprintf (stderr, "error: memory exhausted, words[%d].
", n);
break;
}
strcpy (words[n], tmp); /* copy new word to words[n] */
freq[n]++; /* update frequency to 1 */
n++; /* increment word count */
}
for (int i = 0; i < n; i++) { /* for each word */
printf ("%s = %d
", words[i], freq[i]); /* output word + freq */
free (words[i]); /* free memory when no longer needed */
}
return 0;
}
Using this approach, you eliminate 1/2 of your memory allocations by using a statically declared frequency array for your count
. Either way is fine, it is largely up to you.