Pattern Matching

Aveltium · April 12, 2009, 6:34am

Hi, I'm very new to Linux and I'm sorry if this question is too dumb.
If it is ok, please link me to some beginner guides for questions like this one.

I want to check if the entered string is a number and has 4 digits.

I heard that I should use "regular expression" or something like that to accomplish this task, but I don't know how to use it.

echo -n "Input: ";read x
if [ $x == ${...err i dont know what to put here} ];then echo "Valid"
else echo "Invalid";fi

Any help is much appreciated!

glen.barber · April 12, 2009, 7:21am

The UNIX and Linux Forums - Forum Rules

Read rule number 6.

Aveltium · April 12, 2009, 7:25am

Oh. I'm sorry didn't know this forum is not for beginners.

glen.barber · April 12, 2009, 7:26am

This forum is for beginners. This seems like a classic "check this string to see if it is valid" homework assignment.

glen.barber · April 12, 2009, 7:29am

shell scripting beginner - Google Search

The UNIX and Linux Forums - Search Results

Aveltium · April 12, 2009, 7:39am

Thank you. I was searching a while actually. But seems like no one talks about this. Is it too simple or something?

And why is that "homework" thing such a big deal anyway? If I say it's not homework and I'm asking simply because I don't know how to do it and I need some help, then it's not homework. But I'm not gonna lie, it's my homework, a part of my project.

glen.barber · April 12, 2009, 7:50am

No one talks about this?

Either way, if we help you with your homework, and you get a good degree and a well paying job, are you going to pay us for our work? Or are you going to keep coming here for answers for your job? What would your professor think about this situation? Your boss?

edit by bakunin:

First off, many thanks for doing my job. If you want to become a moderator and boss around forum users (just the same way i do right now with you) just ask Neo. Until you are officially part of the evil side of things (i.e. forum moderation team) please tone it down a bit and at least try to be friendly.

Second: we all have started somewhere sometime, and just because a beginner is a beginner it does not mean his problem is homework. Which questions - if not uninformed ones - is a beginner supposed to ask? The thread owner did not ask for a solution; he asked where he can learn what he needs to know to solve it himself. The thread owner has at least tried to do his work himself and wrote a script as far as he got with it.

Third: You try to act as a moderator here but neglect the most basic rules yourself: you post - repeatedly - completely off-topic just to chastise someone. This might not be a beginners forum and it is definitely no homework-forum, but neither is it a holier-than-you-forum and it is no show-off-your-big-ego-forum.

This is a forum to help one another and i would ask you to honor this spirit of the board.

bakunin · April 12, 2009, 8:06am

There are no dumb questions. Welcome on board.

We have a special "Tips and Tutorials" board here and if you use the search feature on "book recommendation" you will find a lot of threads.

My personal favourite regarding regular expressions is "sed & awk" by Dale Dougherty published by O'Reilly. It is well written with a good dose of humor and it covers everything there is to know about these two regex-based programs. There is a specialized book about regular expressions too from the same publisher. It is well written but i didn't like it as much as the aforementioned book.

Ok, having said this, here is a

Short (very short!) Introduction to Regular Expressions

As soon as you deal with text documents, sooner or later you need to search for some content. Searching for fixed strings is easy, but in most cases a fixed string either fails to match everything it should or matches things it should not. Regexps search not for strings but for patterns, and the regexp language is about describing these patterns.

Suppose you have a long text and want to find the word "colour".

(We will use a small Unix program called "grep" for the examples. It is given an expression to search for and a file in which it carries out the search. It will return all the lines containing the expression. The calling convention is "grep <expr> <file>".)

Ok, here is your first regular expression:

grep "colour" /path/to/file

That wasn't too hard, was it? Well, no, but it isn't too useful either. We are just searching for a fixed string. Anyway, a fixed string is the simplest, most basic form of a regular expression.
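To try the fixed-string search without a real document at hand, here is a minimal sketch. The temporary file and its three lines are made up for this example, and `sample` and `fixed_hits` are just illustrative variable names.

```shell
# Create a throwaway sample file (its contents are invented for this example).
sample=$(mktemp)
printf '%s\n' 'The colour red' 'A color chart' 'No match here' > "$sample"

# Collect every line that contains the fixed string "colour", then print it.
fixed_hits=$(grep "colour" "$sample")
printf '%s\n' "$fixed_hits"

# Clean up the temporary file.
rm -f "$sample"
```

Only the first line is printed; the line with the spelling "color" is missed, which is the problem the next step deals with.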

Now suppose that the text was written by several people, some of whom speak English and some of whom are American *), and therefore the word is sometimes written "colour" and sometimes "color". Of course we would like to find both versions, and we have to tell the program somehow that the "u" we are looking for is optional. We want to find "color" as well as "colour", but we wouldn't want to find words like "colonel-major", where something other than a "u" is between the "colo-" and the "-r". Here we go:

grep "colou*r" /path/to/file

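The asterisk ("*") tells the regexp program that the character preceding it may occur any number of times, including not at all. That makes the "u" optional, although strictly speaking a misspelling like "colouur" would be matched too.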

We call this a "metacharacter". Most characters only match themselves: an "a" will match an "a" and nothing else (not even the "A", because regexps are case-sensitive). But some special characters do not match anything directly but change the way other characters are matched. A regular expression is usually a mixture of characters and metacharacters.
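The effect of the asterisk can be checked on a handful of words. This is only a sketch: `match_words` is a made-up helper, and the interval `\{0,1\}` (at most one occurrence) is shown as a stricter alternative that the text above does not use at this point.

```shell
# Hypothetical helper: print those of the given words that match a regex.
# First argument is the pattern, the remaining arguments are the words.
match_words() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -- "$pattern"
}

# "u*" allows zero or more "u", so "colouur" is matched as well.
match_words 'colou*r' color colour colouur colonel

# "u\{0,1\}" allows at most one "u", so "colouur" is no longer matched.
match_words 'colou\{0,1\}r' color colour colouur colonel
```

The first call prints color, colour and colouur; the second prints only color and colour. In both cases "colonel" is rejected because an "n" follows the "colo".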

Looking at the output of the last command, we see that it also matched words like "colourful" or "water-color". We might want to match only "colour" (however it is spelled) but not any compound words.

We do this by requiring whitespace (a blank or a tab) directly before and after the word, thereby excluding any other character. We use a "character set" for this. It says "one of the following characters" (note that i use "<b>" for a blank and "<tab>" for a tab here because they are non-printing characters; enter literal spaces and tabs instead when you type that in):

grep "[<b><tab>]colou*r[<b><tab>]" /path/to/file

Any ONE character inside "[...]" is matched, but not several! Therefore "d[ae]n" will match "dan" and "den" but not "dean".
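Both uses of a character set can be tried on made-up input. In this sketch the POSIX class `[[:blank:]]` (blank or tab) stands in for the literal blank and tab typed between the brackets; that substitution is mine and is made only so the example can be copied without invisible characters.

```shell
# A set matches exactly ONE of its characters: "dean" has two letters between
# "d" and "n", so it is rejected, and "din" has a letter outside the set.
set_matches=$(printf '%s\n' dan den dean din | grep 'd[ae]n')
printf '%s\n' "$set_matches"

# Require a blank or tab on both sides of the word.
blank_matches=$(printf '%s\n' 'a color b' 'water-color here' 'colourful day' |
    grep '[[:blank:]]colou*r[[:blank:]]')
printf '%s\n' "$blank_matches"
```

The first search prints dan and den, the second prints only "a color b". Note that a line which begins or ends with the word would be missed by this pattern, because it demands a whitespace character on each side.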

-*-

OK, so much for now. My time is limited today and i can't explain in a few words something others write books about. I hope you got an impression of how regular expressions work, and upon request i might expand this text a little.

____________________
*) sorry - i just can't resist these opportunities ;-))

The regex - without further explanation, but parts of it you will recognize - is:

"^[0-9]\{4\}$"

"^" used this way is the beginning of a line, so the expression will only be found if it starts at the beginning of the line

"$" is analogously the end of the line; together with "^" it makes sure the string contains only the 4 digits and nothing else

"[0-9]" is short for "[0123456789]", it is possible to use ranges instead of single characters to form sets

"\{n\}" matches the previous expression (here the bracketed set) exactly n times

Here is the whole script:

echo -n "Input: " ; read x
if [ $(echo "$x" | grep -c "^[0-9]\{4\}$") -eq 1 ] ; then
     echo "Valid"
else
     echo "Invalid"
fi
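A variant of the script above, as a minimal sketch: it wraps the same regex in a function and uses "grep -q", which prints nothing and reports the result through its exit status, so no counting is needed. The name `is_four_digits` and the sample inputs are made up for this example; the inputs are chosen to show what the two anchors reject.

```shell
# Hypothetical helper: exit status 0 only if the argument is exactly four digits.
is_four_digits() {
    printf '%s\n' "$1" | grep -q '^[0-9]\{4\}$'
}

# "12345" and "abc1234def" both contain four digits in a row, but the anchors
# "^" and "$" demand that nothing comes before or after them.
for candidate in 1234 12345 abc1234def 12a4; do
    if is_four_digits "$candidate"; then
        echo "$candidate: Valid"
    else
        echo "$candidate: Invalid"
    fi
done
```

Only 1234 is reported as Valid. To use it with the prompt from the original script, read the input into x as before and test it with: if is_four_digits "$x"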

I hope this helps.

bakunin


Aveltium · April 12, 2009, 8:12am

Wow holycow! Thanks a bunch!! I never got such nice answer like that! Very informative and helpful! Thanks again!!