String editing using sed? awk?

peage1475 · November 11, 2009, 10:13am

The problem statement, all variables and given/known data:

Problem Statement for project:

When an account is created on the CS Unix network, a public_html directory is created in the account's home directory. A default web page is put into that directory.

Some users replace or edit the default page, while others do not. We would like to add a new link from the department web page to another page which lists all students who have changed their web page from the default. We don't want to track these things by hand, and we want the list to be automatically updated every night at 3:00AM.

You are to write a script that will

find all users who are students and who have changed their web page from the default that was provided when their account was created,
generate an HTML file named student web pages.html which contains a nicely formatted list of links to each student web page that you found,
copy that file into a directory specified on the command line, and
make sure that anyone can read it.

My Problem:

I have edited the output of the above find command to a file that has lines that look like this:

What I need it to do now is take that line and turn it into this:

<a href="http://www.cs.ttu.edu/~name">name</a>
<a href="http://www.cs.ttu.edu/~name2">name2</a>
<a href="http://www.cs.ttu.edu/~nameasdas">nameasdas</a>

I have tried using sed and just can't figure it out. Can anyone help?

Relevant commands, code, scripts, algorithms:

Sed ?? awk??

The attempts at a solution (include all code and scripts):
Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):

Texas Tech University
Lubbock TX
USA
Dr. Pyeatt
CS 3352

Thanks for the help.

bakunin · November 11, 2009, 3:46pm

First off, your estimation about sed being the right tool is correct. It was written with problems like yours in mind.

BUT: i don't think you are on the right track. The reason is: you create a sort of list and this list is almost correct - almost, but not quite. Now you want to patch this almost-correct list into something which is indeed correct. The problem is: when you are writing software it is most of the time advisable to do it correctly the first time instead of doing it almost correctly and then patching it.

Instead of creating a sed-script to "patch the list to work", create the list correctly immediately. This is easily possible.

You mention an "above find command", but i cannot find that. So: why don't you put your script so far here and we help you to create not an almost-correct but a correct list?

Btw., i have a suspicion that your find-command (if you have used that alone) will not be giving the correct result either because i don't know of any way to solve your problem (even to the almost-correct state you have) with "find" alone - this would be a rather non-trivial task if it could be done at all (and i suspect it is impossible). This also could be corrected if you provide your work so far.

I hope this helps.

bakunin

peage1475 · November 11, 2009, 6:56pm

Here is my code that I have written so far:

find ./*grad*/*/public_html/index.html -newer /etc/skel/local.cshrc > undergrad1/peage/files

cd

cat files | sed 's#\./grad[1-5]/#\<p\>\<a href\=\"http://www.cs\.ttu\.edu/\~#g' > ./links

cat links | sed 's#\./undergrad[1-5]/#\<p\>\<a href\=\"http://www.cs\.ttu\.edu/\~#g' > ./links

cat links | sed 's#/public_html/index\.html#\"\>Link\</a\>#g' > ./links

echo '<html>
<body>' > foo1

cat links > foo2

echo '</body>
</html>' > foo3

cat foo1 foo2 foo3 > ./public_html/test.html

The find command is just giving me the path of each user whose index.html is edited, and I am trying to transform that into the links I stated above. My problem is that the user's name is only listed once in the path, and when I am using sed I do not know how to copy just their user name (lengths varying) to be used twice in the same line.
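For reference, one way to use the user name twice on the same line is a sed capture group: whatever matches between \( and \) in the pattern can be repeated as \1 in the replacement. This is only a sketch, assuming every path has the form ./<group>/<user>/public_html/index.html as the sed commands above suggest; the function name paths_to_links and the sample names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads paths on stdin, writes one HTML link per line.
# Assumes each path looks like ./<group>/<user>/public_html/index.html.
paths_to_links() {
    sed 's#^\./[^/]*/\([^/]*\)/public_html/index\.html$#<a href="http://www.cs.ttu.edu/~\1">\1</a>#'
}

# Example input with made-up user names.
printf '%s\n' \
    './grad1/name/public_html/index.html' \
    './undergrad3/name2/public_html/index.html' |
    paths_to_links
```

Separately, a command like "cat links | sed ... > ./links" reads from and writes to the same file; the shell truncates the file before it is read, so the result is usually empty. Writing to a different file name avoids that.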

Thanks again for the help

daptal · November 11, 2009, 7:39pm

Can you post a snippet of the lines in the file "files", which you cat in the first line to get the file links?

Cheers

bakunin · November 11, 2009, 7:52pm

Ok, this is about the script i expected. It has several problems, but first, let's analyze your problem correctly. Software engineering (and script programming is a form of software engineering) is about correctly analyzing the problem before you envision a solution.

Your "find" command compares dates: it reports every index.html which was modified more recently than the file local.cshrc. If a new user is created after the last change of the skeleton file his index.html will be newer regardless of being changed or not. On the other hand, if the skeleton file is changed, even the files reported as changed before will be reported as not changed. So you don't find the correct files in the first place.

Let's see: when a new user is created s/he is delivered some standard version of index.html. If s/he changes it, it will differ from this standard file; if not, then not. Now this is indeed a criterion which will pick out the right files, yes?

So, your task is to find a unix utility which compares two files and finds out if they differ or not. If there is such a utility we could write the following program (in pseudocode):

while (cycle through all users)
     do
     if (~thisusers/index.html is equal  to the standard index.html)
          do nothing
     else
          write a corresponding line to your output file
     endif
enddo

There are two things you can do for now:

1) find a way to get all the users' names. (You will need this to set up the while-loop above.) I could tell you, but i don't want to spoil your fun finding it out.

2) find a unix command which compares two files and finds out if they are equal (you will need this for deciding the if-construction).

Report back here with your answers and i will give you the directions for the next stage.

Sorry for dragging this out this way, but i cannot give you enough directions at once without giving away the solution. As the spirit of this board is to help you learn and not spoon-feed you solutions, we have to do it this way. As i am convinced you will soon find out, Unix is not only a lot of fun to work with but also a lot of fun to learn about, so enjoy it while the learning experience lasts.

bakunin

peage1475 · November 11, 2009, 8:19pm

Thanks a lot, and I would much rather learn myself than be spoon fed since I like this kind of stuff. Thanks again and I will see what I can do.

---------- Post updated at 07:19 PM ---------- Previous update was at 07:01 PM ----------

Ok, I'm already having trouble. I can create a file that lists all the users; I know that I should use cmp to tell if the files are the same, but I do not know how to set up the while loop to take each username in the file and then use it in the if statement. I know I have to use a variable, but how do I get the while loop to traverse the file line by line?

daptal · November 11, 2009, 10:20pm

Assume users are in a file named users

for user in `cat users`; do echo $user; done

replace echo $user with the functionality you intend
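One caveat, as a hedged aside: the backquote form splits on any whitespace, not on line ends. That is fine for user names, but it would break on lines that contain spaces. A small sketch of the difference, using a temporary file with made-up content:

```shell
#!/bin/sh
# Made-up sample data: two lines, the second one contains a space.
list=$(mktemp)
printf '%s\n' 'alice' 'bob smith' > "$list"

# Word splitting: iterates over whitespace-separated words (3 items here).
count_for=0
for item in $(cat "$list"); do
    count_for=$((count_for + 1))
done

# Line by line: iterates over lines (2 items here).
count_while=0
while IFS= read -r line; do
    count_while=$((count_while + 1))
done < "$list"

echo "for: $count_for items, while read: $count_while items"
rm -f "$list"
```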

HTH,
PL

bakunin · November 12, 2009, 5:26am

Excellent! This is the right attitude.

Ok, but where do you get the information? In your original find-command you have searched certain directories and this will give you the users only if you know for sure that all the users you search have their home directories there. Is there a data structure in a Unix system which lists all the users and their home directories?
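For readers who want to check their own answer, here is a sketch of one possibility: the account database (commonly the file /etc/passwd, or the output of "getent passwd" on systems using a network directory) has one colon-separated line per user, with the home directory in the sixth field. Whether the CS network in question keeps its students there is an assumption; the sample entries and the function name list_homes are invented.

```shell
#!/bin/sh
# Hypothetical helper: prints "user home" for every entry read from stdin.
# Input is expected in passwd format: name:passwd:uid:gid:gecos:home:shell
list_homes() {
    while IFS=: read -r user _pw _uid _gid _gecos home _shell; do
        printf '%s %s\n' "$user" "$home"
    done
}

# Made-up sample entries; on a real system the input would be /etc/passwd.
printf '%s\n' \
    'name:x:1001:100:Some Student:/home/undergrad1/name:/bin/csh' \
    'name2:x:1002:100:Other Student:/home/grad2/name2:/bin/csh' |
    list_homes
```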

Ok, here is a tip: if in trouble, consult the "man-pages". Everybody does it. (Unix expertise is about knowing where to look rather than knowing everything.) There is especially one command you might need in your future career:

man -k <keyword>       searches all the man-articles for <keyword>
                        and lists the articles where <keyword> was found

No problem, we will eventually get to that. You are correct, "cmp" is a correct tool for that purpose.

Why "a correct" and not "the correct"? Because in Unix there are usually several ways to do something - in this case i had "diff" in mind when i asked you, but "cmp" is also correct, so use this. (This - the possibility to do things in several ways - often leads to what Unixers call "religious wars". Some people will be convinced that there is only "one true way" and usually half of them will think variant A as this one true way and the other half will think variant B as the one true way. Exorcism ensues and this is why Unix is said to give people a purpose in life. )
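To see how both tools report their result, here is a minimal sketch with three made-up files in a scratch directory. With -s, "cmp" prints nothing and only sets its exit status; "diff" follows the same convention (0 for identical, 1 for different).

```shell
#!/bin/sh
# Build three small files in a scratch directory (contents are made up).
dir=$(mktemp -d)
printf 'default page\n' > "$dir/default.html"
printf 'default page\n' > "$dir/same.html"
printf 'my own page\n'  > "$dir/changed.html"

# Record the exit status of each comparison: 0 = identical, 1 = different.
same_status=0
cmp -s "$dir/default.html" "$dir/same.html" || same_status=$?

changed_status=0
cmp -s "$dir/default.html" "$dir/changed.html" || changed_status=$?

# diff prints the differences, so its output is discarded here.
diff_status=0
diff "$dir/default.html" "$dir/changed.html" > /dev/null || diff_status=$?

echo "same: $same_status, changed: $changed_status, diff: $diff_status"
rm -rf "$dir"
```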

Well, this is easy and i will tell you because it can't be researched easily:

Suppose you have some process whose output is a list. You can set up a loop with the elements of the list as variable content using the following construct. It is called a "pipeline":

process | while read variable ; do
   do_something_with $variable
done

Here is an example. Try out "ls -1" as single command first to see what "goes into" the pipeline, only then try the whole script. It should be pretty self-explanatory:

#!/bin/ksh

ls -1 | while read filename ; do
     print - "This is file: $filename"
done
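Note that "print" is a ksh built-in. If the script has to run under plain sh or bash, a roughly equivalent sketch uses "printf" instead. The function name describe_files and the scratch directory are only there to make the example self-contained.

```shell
#!/bin/sh
# Hypothetical helper: same loop as above, but with printf instead of the
# ksh-only print. "IFS= read -r" keeps leading blanks and backslashes intact.
describe_files() {
    ls -1 "$1" | while IFS= read -r filename; do
        printf 'This is file: %s\n' "$filename"
    done
}

# Try it on a scratch directory with two made-up files.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"
describe_files "$dir"
rm -rf "$dir"
```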

Ok, report back when you have all the necessary information and we will continue with the last part of the script.

bakunin

peage1475 · November 12, 2009, 11:41am

Ok, here is what I am trying, but it does not seem to work correctly:

cat test1 | while read path
do
if cmp -s $path ./public_html/index.html
then
echo $path >> files
else
echo "working..."
fi
done

From my understanding cmp -s will produce either a 0, if the files are identical, or a 1 if the files are different. So my if statement says that if the files are different add $path to files, else print working. Why is this only producing working... when I know for a fact that there are at least 5 that are modified and different from the one it is comparing to?
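One thing worth checking in the loop above: "if" runs its "then" branch when the command exits with 0, and "cmp -s" exits with 0 when the files are identical. As written, unchanged pages would be appended to "files" and changed ones would print "working...". A sketch with the test negated, assuming the default page is passed in as an argument; the function name changed_pages is invented:

```shell
#!/bin/sh
# Hypothetical helper: reads paths on stdin and prints those whose content
# differs from the default page given as the first argument.
changed_pages() {
    default=$1
    while IFS= read -r path; do
        # "!" inverts the exit status, so this branch runs when the files
        # differ. A missing or unreadable file also counts as different.
        if ! cmp -s "$path" "$default"; then
            printf '%s\n' "$path"
        fi
    done
}

# Usage, mirroring the loop above:
#   changed_pages ./public_html/index.html < test1 > files
```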

bakunin · November 12, 2009, 9:32pm

The reason is a simple logic error, but let's first go over what you have done correctly, because there are some good points.

First, you have found a fundamental structure in Unix: you can use the return code of a program to determine how it has operated. In this case you - correctly - try to use the return code of "cmp". Very good.

Second, you have found out another fundamental thing: unlike in the C programming language, throughout Unix a return code of 0 is considered to be a logical TRUE while every other value is considered to mean a logical FALSE.

Now to your error: the general form of the if-statement is:

if command ; then
     some_commands
else
     some_other_commands
fi

Consider "command" to be some process (or set of processes in a pipeline) which either returns "0" (TRUE) or another integer (FALSE). Since "cmp -s" returns 0 when the files are identical, your "then" branch runs for the unchanged pages, not for the changed ones. A very common choice for "command" is "[ expression ]".

In the historical Unix there was a command called "[", which was a link to the command "/usr/bin/test", which is still there. I suggest you read the man page of "test" to find out which possibilities it has. Basically "test" gets some logical expression and like any other program returns "0" (if the logical expression results in TRUE) or "1" (FALSE).

Because "[" was a normal program (most shells have implemented it as an internal command in the meantime, but Unix is adamant about being backwards compatible) it is clear why before and after it there have to be spaces: the rest of the line up to "]" are commandline arguments to "test" and therefore there have to be spaces like "ls-l" is wrong and "ls -l" is right.

Try the following two scripts:

#!/bin/ksh
if true ; then
     echo TRUE
else
     echo FALSE
fi

#!/bin/ksh
if [ 1 = 1 ] ; then
     echo TRUE
else
     echo FALSE
fi

In the first script then change "true" (a command which always returns 0) to "false" (a command which always returns 1) and try again - notice what happens in the light of what i said.

In the second script change the (obviously true) expression to another expression and experiment. Watch the output.

Bonus question: the expression "1 = 1" is working, but in fact the expression is not suited for comparing integer values ("=" is for comparing strings). How should it be phrased as true integer comparison? Hint: consult the man page!
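For those who want to compare notes after trying it themselves, here is a small sketch that shows how "if" treats a few commands, including the difference between string and integer comparison in "test". The helper name truth is made up.

```shell
#!/bin/sh
# Hypothetical helper: runs a command and reports how "if" would treat it.
truth() {
    if "$@"; then echo TRUE; else echo FALSE; fi
}

truth true            # a command that always exits with 0
truth false           # a command that always exits with 1
truth [ 1 = 1 ]       # string comparison, equal strings
truth [ 01 = 1 ]      # string comparison: "01" and "1" are different strings
truth [ 01 -eq 1 ]    # integer comparison: 01 and 1 are the same number
```

The five lines print TRUE, FALSE, TRUE, FALSE and TRUE, in that order.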

You should now be able to change your script to work. That leaves two questions:

1) You now feed some file content to the while-loop, but this is hardly an up-to-date user directory. It is a good development device, though, so i am very satisfied that you had this idea to "simulate having this list" for the moment. It is a clever way of solving one problem after the other - i do exactly the same when writing scripts. Still, you have to solve the problem of "how to obtain a list of all defined users in a machine" and - "how to get the subset of this list i am interested in"? Any ideas?

2) Notice that you output only "$path" to an output file, and not what you want to stand there. If you prepare the output file this way it will be intermediary in nature, and this is not what you want. Is there a way to manipulate strings in the shell and - if this is possible - which commands might be useful for this purpose? Hint: consult your class's written material, the recommended book or whatever you got in this class rather than the "man-page" first - this is something the man-pages are poorly suited to research. Once you have a good idea of string-manipulating commands read their man pages of course.
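One option that needs no external command at all is the shell's own parameter expansion: ${var%pattern} removes a matching suffix and ${var##pattern} removes the longest matching prefix. A minimal sketch, assuming the path ends in /<user>/public_html/index.html; the function name link_for is invented and this is not necessarily the approach the class expects:

```shell
#!/bin/sh
# Hypothetical helper: turns one path into one HTML link.
# Assumes the path ends in /<user>/public_html/index.html.
link_for() {
    path=$1
    home=${path%/public_html/index.html}   # strip the fixed suffix
    user=${home##*/}                       # keep what follows the last "/"
    printf '<a href="http://www.cs.ttu.edu/~%s">%s</a>\n' "$user" "$user"
}

link_for ./grad1/name/public_html/index.html
```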

All in all I'm quite satisfied with your progress so far - you might not know it but you are close to the solution. Keep up the good work.

I hope this helps.

bakunin

peage1475 · November 12, 2009, 11:22pm

Yeah that does help, and thank you so much for helping me. However I think that another reason it is not working is that I am not using the cat | read correctly. I created a file that contains a path to some file then I used cat on it and piped it to read like so:

cat tmp1 | read testvar
echo $testvar

Nothing was printed by echo, I am thinking that my syntax is off or something and that is another reason why this is not working for me. I have tried finding out how to do this but I cannot find anything nor can I figure it out.

bakunin · November 13, 2009, 1:36am

Carefully analyze the difference between what you have done before and what you are doing now and you will understand what's wrong yourself:

cat test1 | while read line

versus

cat test1 | read line

"cat test1" is a process, yes? It has some output, which you can observe by issuing this command, it will print the content of the file, line by line.

Now you pipe the output of this process into another process, which is set up by "while ... do .... done". To the outside this counts as one process, even if it contains several other processes inside.

In the second case you pipe the output into a single "read", which takes just the first line and assigns it to the variable $line. The catch is where this happens: in many shells every part of a pipeline runs in its own sub-process, so the variable is set there and is gone again when the pipeline has ended and your "echo" runs. In the first case "read" is invoked by the while-loop once per line, and the commands using $line sit inside the loop, that is, inside the same sub-process, so they do see the value.
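Whether "cat file | read var" leaves the variable set afterwards depends on the shell: in shells that run the last part of a pipeline in a sub-process (bash and dash do so by default) the value is lost. Two forms that behave the same in every POSIX shell, sketched with a temporary file holding a made-up path:

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' './grad1/name/public_html/index.html' > "$tmp"   # made-up content

# Redirection instead of a pipe: read runs in the current shell,
# so the variable is still set on the following lines.
read -r testvar < "$tmp"
echo "from redirection: $testvar"

# With a pipe, use the variable inside the same compound command.
cat "$tmp" | { read -r piped; echo "inside the pipeline: $piped"; }

rm -f "$tmp"
```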

bakunin

peage1475 · November 13, 2009, 1:57pm

Ok so I finally got it all working except for the part on how to read from the command line. I do not know what I am supposed to do, or even begin to start on how to do it. Can you help please? thanks
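The requirement meant here is the one from the problem statement: copy the file into a directory specified on the command line and make sure anyone can read it. In a shell script the first command line argument is available as $1. A minimal sketch, assuming the HTML file has already been generated; the function name publish and the paths in the comments are invented:

```shell
#!/bin/sh
# Hypothetical helper: copy a generated page into a target directory
# and make the copy readable by everyone.
publish() {
    page=$1
    dest=$2
    if [ ! -d "$dest" ]; then
        echo "not a directory: $dest" >&2
        return 1
    fi
    cp "$page" "$dest/" && chmod a+r "$dest/$(basename "$page")"
}

# In a script, the directory would come from the first argument, e.g.:
#   publish ./links.html "$1"
# A crontab entry for a nightly run at 3:00AM has this form (paths made up):
#   0 3 * * * /path/to/script.sh /path/to/target
```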

bakunin · November 15, 2009, 9:05pm

If you don't know what to do i suggest you read the thread and answer the questions i posed. If you got the script working post it here and i can tell you what is correct and what isn't.

Up to now you have never mentioned any command line arguments you have to read, so i'm a bit astonished to hear of them now. The problem as you have presented it doesn't need any input from the command line at all.

bakunin