URL Validation Help in C

Hi friends,

I have been writing a C program to validate a URL passed in by a user.
Here is the code I have so far.
I am using a dictionary file to check for values like http, https, www, ftp, etc. in the URL that the user passes for validation.

My code works for everything except when the URL string passed by the user contains a blank space, a newline character, or a tab. In those cases I get a segmentation fault.

Please help me.

#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<stdbool.h>
#define SIZE 50
char *domain1 = "com";
char *domain2 = "co.in";
char *domain3 = "org";
char *domain4 = "net";
char *domain5 = "in";
char *domain6 = "org.net";
static bool f_valid = 0;
bool matchURL(char *url, FILE *fp)
{
        int j = 0;
        int k = 0;
        char buff[SIZE];
        char *token1;
        char token2[SIZE] = {0};
        char token3[SIZE] = {0};
        char *delimeter = ".";

        token1 = strtok(url,delimeter);
 
        while ((fscanf(fp,"%s",buff)) != EOF)
        {
                if(0 == strcmp(buff,token1))
                {
                        f_valid = 1;
                }
                else
                {
                        f_valid = 0;
                }
        }
 
        j = 1 + strlen(token1);
 
        if (1 == f_valid) 
        { 
                while(url[j] != '.')
                {
                        token2[k] = url[j];
                        j++;
                        k++;
                }
        }
 
        if ((0 == strlen(token2)) || (0 == strcmp("www",token2)))
        {
                f_valid = 0;
        }
 
        j = 2 + (strlen(token1)+strlen(token2));
 
        k = 0;
 
        while(url[j] != '\0')
        {
                token3[k] = url[j];
                j++;
                k++;
        } 
 
        if(1 == f_valid)
        {
                if((0 == strcmp(domain1,token3)) || (0 == strcmp(domain2,token3)) || (0 == strcmp(domain3,token3)) || (0 == strcmp(domain4,token3)) || (0 == strcmp(domain5,token3)) || (0 == strcmp(domain6,token3)))
                {
                        f_valid = 1;
                }
                else 
                {
                        f_valid = 0;
                } 
        }
 
        return f_valid;
}

int main ()
{ 
        int i = 0;
        char url[SIZE];
        FILE *fp;

        if ((fp=fopen("Dictionary", "r")) == NULL)
        {
                printf("Can't Validate\tDictionary File is missing\n");
                return -1;
        }

        printf("Enter the URL to Validate :\n");
        scanf("%s",url);

        for(i = 0; i < strlen(url); i++)
        {
                if ((url[i] == '@') || (url[i] == '!') || (url[i] == '#') || (url[i] == '$') || (url[i] == '%') || (url[i] == '^') || (url[i] == '&') || (url[i] == '*') || (url[i] == '(') || (url[i] == ')') || (url[i] == '\t') || (url[i] == '\b') || (url[i] == '\n'))
                {
                        printf("Invalid URL\nSpecial Character Found!\n");
                        return -1;
                }
        }

        if ( 1 == matchURL(url,fp))
        {
                printf("Valid URL address\n");
        }
        else
                printf ("Invalid URL address\n");
 
        fclose(fp);
        return 0;
 
}

Your complete lack of comments makes it difficult to even begin to figure out where your code is going wrong. You haven't posted what's in your dictionary file either, which makes it very difficult to guess your intent. Breaking the string apart on "." won't get you the http:// at the beginning, either. I've corrected your indenting as best I can.

I think your code needs a rewrite. A lot of things which could've been done with loops or function calls you've done by simple brute force.

Your while-loop that reads the file probably doesn't do what you want. It loops through every entry in the file and sets f_valid each time, so you only keep the result of the comparison against the very last entry. Every earlier result is overwritten by the one after it.
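One way to avoid the overwriting is to stop at the first match. This is only a sketch: dict_contains is a made-up name, and it assumes the dictionary holds one whitespace-separated word per entry, which is a guess since the file wasn't posted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns 1 if `word`
   appears as a whitespace-separated entry in the already-open dictionary
   stream, 0 otherwise. */
int dict_contains(FILE *fp, const char *word)
{
        char entry[50];

        rewind(fp);     /* start from the top on every lookup */

        /* %49s caps each read so a long entry cannot overflow entry[] */
        while (fscanf(fp, "%49s", entry) == 1)
        {
                if (strcmp(entry, word) == 0)
                        return 1;       /* found it: stop right here */
        }

        return 0;       /* reached end of file without a match */
}
```

Checking that fscanf returned 1, rather than comparing against EOF, also ends the loop if a read fails for some other reason.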

Whenever you've decided a URL is invalid, you can just return(0) right then and there, instead of checking f_valid every time thereafter.

You do realize that strtok() modifies its input string, yes?
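strtok() writes a '\0' over the first delimiter it finds, so after strtok(url, ".") the caller's string effectively ends at the first dot. If you only need the piece before the first dot, something like the following leaves the input alone. It is a rough sketch; first_label and its parameters are names I made up for illustration.

```c
#include <string.h>

/* Hypothetical helper: copies the text before the first '.' in `url`
   into `out` (capacity `outsize`) without modifying `url`.
   Returns 1 on success, 0 if there is no '.' or the piece won't fit. */
int first_label(const char *url, char *out, size_t outsize)
{
        const char *dot = strchr(url, '.');
        size_t len;

        if (dot == NULL)
                return 0;               /* no dot at all */

        len = (size_t)(dot - url);      /* characters before the dot */
        if (len + 1 > outsize)
                return 0;               /* no room for text plus '\0' */

        memcpy(out, url, len);
        out[len] = '\0';
        return 1;
}
```

Because the function takes a const char *, the compiler will also complain if it ever tries to write into the caller's string.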

Also, it'd be better to check for allowable characters than disallowed ones:

int url_specialchars(const char *url)
{
        // The compiler will stack "multiple" "strings" "end" "to" "end"
        // into "multiplestringsendtoend", so we don't need one giant line.
        static const char *nospecial="0123456789"
                "abcdefghijklmnopqrstuvwxyz"
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                ".";

        while(*url) // Loop until (*url) == 0.  (*url) is about equivalent to url[0].
        {
                // Can we find the character at *url in the string 'nospecial'?
                // If not, it's a special character and we should return 0.
                if(strchr(nospecial, *url) == NULL) return(0);
                url++; // Jump to the next character.  Adding one to a pointer moves it ahead one element.
        }

        return(1); // Return 1 for success.
}
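As for the crash itself, one likely cause is that scanf("%s") stops reading at the first space, tab, or newline. If the user types "www google.com", your program only sees "www", and the loops that scan forward looking for a '.' can then run off the end of the array. Reading the whole line with fgets() lets a check like url_specialchars() see the whitespace and reject it. A minimal sketch, where read_line is a hypothetical helper name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one whole line from `in` into `buf`
   (capacity `size`) and strips the trailing newline.
   Returns 1 on success, 0 on end of input or a read error. */
int read_line(FILE *in, char *buf, size_t size)
{
        if (fgets(buf, (int)size, in) == NULL)
                return 0;

        /* strcspn gives the index of the first '\n', or of the
           terminating '\0' if there is none, so this is always in bounds. */
        buf[strcspn(buf, "\n")] = '\0';
        return 1;
}
```

In main() you would call read_line(stdin, url, sizeof url) in place of the scanf, then run url_specialchars(url) before doing anything else with the string.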