Hash Table

I was looking at this script and was wondering if anyone can explain what this script does and how does it work. Thank you for any help.

State* lookup(char* prefix[NPREF], int create)

{

     int i, h;

     State *sp = NULL ;

     h = hash(prefix);

     for (sp = statetab[h]; sp != NULL; sp = sp->next) {

     for (i = 0; i < NPREF; i++)

     if (strcmp(prefix, sp->pref) != 0)

     if (i == NPREF) /* found it */

     return sp;

     break;

}

     if (create) {

          sp = (State*) emalloc( sizeof( State ));

          for( i=0; i<NPREF; i++ )

          sp->pref = prefix;

          sp->suf = NULL;

          sp->next = statetab[h];

          statetab[h] = sp;

     }

     return sp;

}

Indentation helps:

State* lookup(char* prefix[NPREF], int create)
 
{
 
     int i, h;
 
     State *sp = NULL ;
 
     h = hash(prefix);
 
     for (sp = statetab[h]; sp != NULL; sp = sp->next) {
 
       for (i = 0; i < NPREF; i++)
 
         if (strcmp(prefix, sp->pref) != 0)
 
           if (i == NPREF) /* found it */
 
             return sp;
 
       break;
 
     }
 
     if (create) {
 
          sp = (State*) emalloc( sizeof( State ));
 
          for( i=0; i<NPREF; i++ )
 
            sp->pref = prefix;
 
          sp->suf = NULL;
 
          sp->next = statetab[h];
 
          statetab[h] = sp;
 
     }
 
     return sp;
 
}

Hash table (no payload, just keys) insert unique (i.e., if not already). Hash table - Wikipedia, the free encyclopedia Very odd contents match logic! What language?

It's C. Not shell.

[quote=dgpickett;302923149]
Indentation helps:

Very odd contents match logic! What language?

Yes, I don't think this code will ever match anything:

       for (i = 0; i < NPREF; i++)
 
         if (strcmp(prefix, sp->pref) != 0)
 
           if (i == NPREF) /* found it */
 
             return sp;

The variable i should never be == to NPREF within that loop.

Moderator comments were removed during original forum migration.

How does it work? It words badly, if at all.

That is some seriously broken code. Even if the match code is fixed, it will only examine the first State *sp pointer because of the break statement in the outer for loop.

And then, if create is non-zero and the upper loops didn't find a match at the first node, it will stuff the passed-in data onto the hash table even if it already exists further down the linked list.

Yes, although PERL and python can look a lot like C, I figured C.

It does seem to be a find/optional insert if missing, hash map/table routine, with linked list buckets, but I am not familiar with passing an array and its length that way. It seems like if the elements char* can be copied, why not just copy the array char** to a pointer element? Let's try something more simple and legal:

State* lookup( char* prefix[], size_t NPREF, int create ){
     int i, h;
     State *sp = NULL ;
 
     h = hash(prefix);
 
     for ( sp = statetab[h]; sp != NULL; sp = sp->next ){
       for  (i = 0; i < NPREF; i++ )
         if ( strcmp( prefix, sp->pref ) != 0 )
           break ;
 
       if ( i == NPREF ) /* found it */ 
             return sp;
     }
 
     if (create) {
          sp = (State*) emalloc( sizeof( State ));
          sp->pre = prefix ;
          sp->suf = NULL;
          sp->next = statetab[h];
          statetab[h] = sp;
      }
 
     return sp;
 }

It'd be nice if it had the call format like bsearch, tsearch but then it would be hsearch()! http://www.unix.com/man-page/opensolaris/3c/hsearch/

If you remove the explicit array size from the lookup() call and add it as a separate parameter, you have to add that parameter to the hash() call too.

Good point! Been lazing arund OO land so long! :smiley: If NPREF is a macro, maybe hash() uses it, too. Classic C trick is to over-allocate the char*[] by 1 so there is room for a null pointer at the end, like char *argv[], reminiscent of null term string.

Thanks a lot guys. Here is the full code, I was looking at:
And would anyone know how to write a function to give the memory back in statetab? As in clean up statetab when it is done

#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include "eprintf.h"

enum {
        NPREF   = 2,    /* number of prefix words */
        NHASH   = 4093, /* size of state hash table array */
        MAXGEN  = 10000 /* maximum words generated */
};

typedef struct State State;
typedef struct Suffix Suffix;

struct State {  /* prefix + suffix list */
        char*   pref[NPREF];    /* prefix words */
        Suffix* suf;                    /* list of suffixes */
        State*  next;                   /* next in hash table */
};

struct Suffix { /* list of suffixes */
        char *  word;                   /* suffix */
        Suffix* next;                   /* next in list of suffixes */
};

State* lookup(char *prefix[], int create);
void    build(char *prefix[], FILE*);
void    generate(int nwords);
void    add(char *prefix[], char *word);

State*  statetab[NHASH];        /* hash table of states */

char NONWORD[] = "\n";  /* cannot appear as real word */

/* markov main: markov-chain random text generation */
int main(void)
{
        int i, nwords = MAXGEN;
        char *prefix[NPREF];            /* current input prefix */

        int c;
        long seed;

        setProgName("markov");

        seed = time(NULL);
        srand(seed);

        for (i = 0; i < NPREF; i++)     /* set up initial prefix */
                prefix = NONWORD;

        build(prefix, stdin);
        add(prefix, NONWORD);
        generate(nwords);

        return 0;
}   

const int MULTIPLIER = 31;  /* for hash() */

/* hash: compute hash value for array of NPREF strings */
unsigned int hash(char* s[NPREF])
{
        unsigned int h;
        unsigned char *p;
        int i;

        h = 0;
        for (i = 0; i < NPREF; i++)
                for (p = (unsigned char *) s; *p != '\0'; p++)
                        h = MULTIPLIER * h + *p;
        return h % NHASH;
}

/* lookup: search for prefix; create if requested. */
/*  returns pointer if present or created; NULL if not. */
/*  creation doesn't strdup so strings mustn't change later. */
State* lookup(char* prefix[NPREF], int create)
{
        int i, h;
        State *sp = NULL ;

        h = hash(prefix);
        for (sp = statetab[h]; sp != NULL; sp = sp->next) {
                for (i = 0; i < NPREF; i++)
                        if (strcmp(prefix, sp->pref) != 0)
                                break;
                if (i == NPREF)         /* found it */
                        return sp;
        }
        if (create) {
                sp = (State*) emalloc( sizeof( State ));
                for( i=0; i<NPREF; i++ )
                        sp->pref = prefix;
                sp->suf = NULL;
                sp->next = statetab[h];
                statetab[h] = sp;
        }
        return sp;
}

/* addsuffix: add to state. suffix must not change later */
void addsuffix(State *sp, char *suffix)
{
        Suffix *suf;

        suf = (Suffix *) emalloc(sizeof(Suffix));
        suf->word = suffix;
        suf->next = sp->suf;
        sp->suf = suf;
}

/* add: add word to suffix list, update prefix */
void add(char *prefix[NPREF], char *suffix)
{
        State *sp;

        sp = lookup(prefix, 1);  /* create if not found */
        addsuffix(sp, suffix);
        /* move the words down the prefix */
        memmove(prefix, prefix+1, (NPREF-1)*sizeof(prefix[0]));
        prefix[NPREF-1] = suffix;
}

/* build: read input, build prefix table */
void build(char *prefix[NPREF], FILE *f)
{
        char buf[100], fmt[10];

        /* create a format string; %s could overflow buf */
        sprintf(fmt, "%%%lus", sizeof(buf)-1);
        while (fscanf(f, fmt, buf) != EOF)
                add(prefix, estrdup(buf));
}

/* generate: produce output, one word per line */
void generate(int nwords)
{
        State *sp;
        Suffix *suf;
        char *prefix[NPREF], *w;
        int i, nmatch;

        for (i = 0; i < NPREF; i++)     /* reset initial prefix */
                prefix = NONWORD;

        for (i = 0; i < nwords; i++) {
                sp = lookup(prefix, 0);
                if (sp == NULL)
                        eprintf("internal error: lookup failed");
                nmatch = 0;
                for (suf = sp->suf; suf != NULL; suf = suf->next)
                        if (rand() % ++nmatch == 0) /* prob = 1/nmatch */
                                w = suf->word;
                if (nmatch == 0)
                        eprintf("internal error: no suffix %d %s", i, prefix[0]);
                if (strcmp(w, NONWORD) == 0)
                        break;
                printf("%s\n", w);
                memmove(prefix, prefix+1, (NPREF-1)*sizeof(prefix[0]));
                prefix[NPREF-1] = w;
        }
}

Well, exit() is real cheap. Otherwise, what you mallo() you must free(). If you use mmap() in 64 bit mode in lieu of malloc(), you are just using up disk and address space until exit(). Otherwise, try JAVA -- anything not referenced is collected, so if you overwrite the parent's only reference, it is collected, which makes the children unreferenced, and so they are collected.

Here is the cleanup:

void cleanup()
{
    int h;
    State *sp = NULL ;
    Suffix *suf, *t;

    for(h = 0; h < NHASH; h++) {
        for (sp = statetab[h]; sp != NULL; sp = statetab[h]) {
            for(suf = sp->suf; suf != NULL; suf = t) {
                t = suf->next;
                free(suf);
            }
            statetab[h]=sp->next;
            free(sp);
        }
    }
}

EDIT:

The lookup function is still not working properly the comment "found it" seems to imply the prefix was found however the preceding code calls break on the first string the doesn't match.

The whole prefix array list part is fairly strange. What are you trying to achieve with prefix? I feel things would be much easier and consistent if you used another linked list for prefix rather that an array and the kludgy memmove() calls.

If you could give me a description on how this prefix/suffix relationship is supposed to work I'm sure I could suggest a better structure to store this data. Once we get the structure right it will be much easier to get the add, delete and lookup routines done.

I suppose hsearch() does some cleanup for you?

@DGPickett Umm, I cant see any reference to hsearch() in this code, are you sure you're posting in the right thread?

I am suggesting that one should not write what is already written. The code should have been written using hsearch(), unless there is some greater functionality needed it does not support, like perhaps linear hash expansion. The RogueWave H++ in C++ is also nice for hash capability. C can call C++.