sed & cut command issues

robrom78 · September 29, 2012, 4:16am

The problem statement, all variables and given/known data:
After using the egrep command to pull certain lines from the asg5f1 file (creating the asg5f1c file), I am required to use the sed command to change all tabs in the asg5f1c file to colons and dump the results into a temporary file. As I understand it, <tab> is the delimiter in the asg5f1c file, and I believe I am being asked to change the delimiter from a tab to a colon.

Below is an example of what I believe the asg5f1c file should look like after the egrep command, and what the sed command will be working on (roughed-in on MS Word...).

    Employee            Hours    Rate       Hours    Rate       Gross    Net
  lname, fname    99           99.99     99           99.99     999.99   999.99
  lname, fname    99           99.99     99           99.99     999.99   999.99

After this, I am to sort the asg5f1c file and join it with another file to create the asg51 file.

At this point, I'm banging my head against the wall. This problem is causing a domino effect on the later parts of the assignment and is holding up everything else.

Based on my previous use of the sed command, I thought that the following:

sed 's/<tab key pressed>/:/g' asg5f1c > tmp1

would solve my problem. The way I read the above line is "In the file asg5f1c, for all instances of a tab, substitute in a colon, and place the result into tmp1." The above sed command outputs the following:

lname, fname:40.00:6.50: 2.00: 9.75:279.50:176.09
lname, fname:40.00:6.50:: 9.75:260.00:163.00
lname, fname:25.00:6.50:: 9.75:162.50:102.38
lname, fname   :40.00:6.50: 3.00: 9.75:289.25:182.23
lname, fname :40.00:6.50:: 9.75:260.00:163.00
lname, fname   :33.00:6.50:: 9.75:214.50:135.14
lname, fname:40.00:5.50:: 8.25:220.00:138.60
lname, fname  :40.00:5.50:: 8.25:220.00:138.60
lname, fname  :40.00:4.25:10.00: 6.38:233.75:147.26
lname, fname  :40.00:7.50::11.25:300.00:189.00

This is holding me up because my asg51 file looks the same as the output above, with the addition of a colon at the end of each line. From here, I am to cut the gross pay and net pay fields from the asg51 file. The problem is that the "faulty" output above is apparently treated as one single field, so when I cut the pay fields:

cut -f6 asg51 > gp && cut -f7 asg51 > np

the gp & np files look like my "faulty" output, and that's when everything comes to a screeching halt.
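
For what it's worth, this symptom matches how cut behaves: -f splits on the TAB character unless -d names another delimiter, and a line that contains no delimiter at all is passed through whole (unless -s is given). After the sed step the file holds colons instead of tabs, so every line comes back unchanged. A minimal sketch with a made-up line in the colon-delimited layout above:

```shell
#!/bin/sh
# Hypothetical sample line in the colon-delimited layout shown above
# (fields: name, hours, rate, hours, rate, gross, net).
line='lname, fname:40.00:6.50:2.00:9.75:279.50:176.09'

# Without -d, cut looks for TAB; this line has none, so it is printed whole.
printf '%s\n' "$line" | cut -f6

# With -d ':' the colon is the field separator, so -f6 and -f7
# select the gross pay and net pay fields.
printf '%s\n' "$line" | cut -d ':' -f6
printf '%s\n' "$line" | cut -d ':' -f7
```

The field numbers assume the seven-column layout shown above; after the join with the other file, the positions in asg51 may differ, so check them before cutting.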

Relevant commands, code, scripts, algorithms:
Unsure what to put here. Commands I am required to use are egrep, sed & cut.
The attempts at a solution (include all code and scripts):
I've tried sed 's/\<tab key pressed>/:/g' and sed 's/\\<tab key pressed>/:/g', but neither works. I've read and re-read the textbook and the supplemental readings, but I can't figure it out.
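
Typing an invisible tab into a command is error-prone (in an interactive bash session, pressing Tab normally triggers completion; Ctrl-V followed by Tab inserts a literal one). A minimal sketch of a way around that, assuming a POSIX shell: let printf produce the tab and hand it to sed through a variable. The sample line is made up:

```shell
#!/bin/sh
# Store a single TAB character in a variable; printf interprets \t
# portably, and command substitution keeps the tab (it only strips
# trailing newlines).
tab=$(printf '\t')

# Hypothetical tab-delimited sample line standing in for asg5f1c.
# Double quotes let the shell expand ${tab} before sed reads the script,
# so sed sees the same expression as with a typed tab.
printf 'lname, fname\t40.00\t6.50\n' | sed "s/${tab}/:/g"
# lname, fname:40.00:6.50
```

Applied to the real files this would be sed "s/${tab}/:/g" asg5f1c > tmp1. GNU sed also accepts \t inside the regular expression, but that is a GNU extension and may not work with other sed implementations.
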
Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):
University of Maryland University College, Adelphi, MD, USA. Prof. T. Tomko. CMIS 325.

*Unable to post link to class because "You are only allowed to post URLs once you have at least 5 posts," and this is my first post.

bakunin · September 29, 2012, 10:24am

You are probably right.

Below is an example of what I believe the asg5f1c file should look like after the egrep command, and what the sed command will be working on (roughed-in on MS Word...).
  Employee        Hours        Rate      Hours        Rate      Gross    Net
  lname, fname    99           99.99     99           99.99     999.99   999.99
  lname, fname    99           99.99     99           99.99     999.99   999.99
[...]
sed 's/<tab key pressed>/:/g' asg5f1c > tmp1
would solve my problem. The way I read the above line is "In the file asg5f1c, for all instances of a tab, substitute in a colon, and place the result into tmp1."

That is correct: this is what the sed command does.

lname, fname:40.00:6.50: 2.00: 9.75:279.50:176.09
lname, fname:40.00:6.50:: 9.75:260.00:163.00
lname, fname:25.00:6.50:: 9.75:162.50:102.38
lname, fname   :40.00:6.50: 3.00: 9.75:289.25:182.23
lname, fname :40.00:6.50:: 9.75:260.00:163.00
lname, fname   :33.00:6.50:: 9.75:214.50:135.14
lname, fname:40.00:5.50:: 8.25:220.00:138.60
lname, fname  :40.00:5.50:: 8.25:220.00:138.60
lname, fname  :40.00:4.25:10.00: 6.38:233.75:147.26
lname, fname  :40.00:7.50::11.25:300.00:189.00

Have a close look at this file: in some lines there are two adjacent colons. Probably your input file, in order to look neatly formatted, uses several tab characters instead of exactly one to delimit its columns; the number of tabs probably varies from line to line to compensate for names of varying length. As your replacement works on a one-to-one basis, several tabs become as many colons.
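
To see this effect in isolation, here is a small sketch with made-up input in which one line uses two tabs after a short name. It shows the one-to-one replacement and how the resulting empty field shifts the field numbers that cut sees later:

```shell
#!/bin/sh
# One TAB character, produced portably by printf.
tab=$(printf '\t')

# Hypothetical input: the second line has two tabs after a short name,
# the way someone might pad a column so that it lines up visually.
printf 'longer name\t40.00\t6.50\nshort\t\t40.00\t6.50\n' > demo.txt

# One-to-one replacement: every single tab becomes exactly one colon,
# so the doubled tab turns into "::".
sed "s/${tab}/:/g" demo.txt
# longer name:40.00:6.50
# short::40.00:6.50

# The extra colon creates an empty field, so field 2 is no longer
# the hours on the second line.
sed "s/${tab}/:/g" demo.txt | cut -d ':' -f2
# 40.00
# (empty line)

# Clean up the demo file.
rm -f demo.txt
```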

Here is my suggestion: first off, if you work for Unix, work on Unix. By working on another OS (to my knowledge MS Word only runs on MS Windows) you introduce additional complexity instead of making things easier. The graphical interrupt handler from Redmond uses a different line-delimiting scheme than Unix (<CR><LF> instead of <newline>), and because of this, scripts written with a Windows tool will not necessarily run under Unix.
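
If a file did pass through a Windows tool, each line may carry an extra carriage return before the newline, which then sticks to the last field of every line (here the net pay). A minimal sketch for spotting and removing it with the standard od and tr utilities; the sample line is made up:

```shell
#!/bin/sh
# Hypothetical line saved with Windows line endings: <CR><LF>.
printf 'lname, fname\t40.00\r\n' > dos.txt

# od -c makes invisible characters visible: \t is a tab,
# \r a carriage return, \n the Unix newline.
od -c dos.txt

# tr -d '\r' deletes every carriage return, leaving plain <newline> endings.
tr -d '\r' < dos.txt > unix.txt
od -c unix.txt

# Clean up the demo files.
rm -f dos.txt unix.txt
```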

Furthermore, do not use word processors (not even good ones, let alone Word) to write program text. They are not built for this: they are built for formatting text and entering "flow" text (organised in paragraphs, chapters, etc.). Program text is neither formatted nor organised that way. At the same time, word processors usually fall short where the needs of programmers are concerned: automatic indenting, template editing, jumping to a certain line, complex search-and-replace jobs, etc. It is like using a Ferrari to pull a 40 t articulated lorry. True, the Ferrari has the same 500 hp as the bobtail tractor, but it is simply not built to pull large amounts of cargo.

So, do yourself a favour and use "vi", or "emacs" if you prefer that, or any other of the myriad text editors available on Unix systems. I personally prefer vi for its raw power and unparalleled speed, but I admit it has a steep learning curve. All I can say is: learning it is rewarding. Usually, cursing it for the first 3 months is the beginning of a life-long love affair.

After these rather general and philosophical remarks something more concrete:

In "vi" you can view unprintable characters by typing ":" (to get to the command line) and then entering "set list". You will notice that every line end is now marked with "$" (the newline character) and tabs are shown as "^I" (a single character, as you can see by moving the cursor over it). You can switch this mode off again by entering ":set nolist". This way you can analyse your input file and find out whether all your assumptions about its structure are indeed correct.
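
Outside vi, a quick non-interactive way to check the same thing is to count the tabs on each line: if every line is supposed to have one tab between columns, all counts should be equal. A sketch using awk (not one of the commands the assignment requires, so only meant as a diagnostic), with made-up sample data:

```shell
#!/bin/sh
# Hypothetical tab-delimited sample; the second line has a doubled tab.
printf 'a\tb\tc\nd\t\te\tf\n' > check.txt

# With TAB as the field separator, empty fields are kept, so NF - 1
# is the number of tabs on a line. Print line number and tab count.
awk -F '\t' '{ print NR ": " (NF - 1) " tabs" }' check.txt
# 1: 2 tabs
# 2: 3 tabs

# Clean up the demo file.
rm -f check.txt
```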

If you find out that sometimes several tabs are used instead of exactly one, you have to redefine what a "field delimiter" is in your file: not "one tab character" but "one or more tab characters", and you will have to modify your delimiter-translating program accordingly.
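
In sed terms, "one or more tabs" can be written portably as a tab followed by zero or more tabs. A minimal sketch, again with made-up input and a printf-generated tab so nothing invisible has to be typed:

```shell
#!/bin/sh
# One TAB character, produced portably by printf.
tab=$(printf '\t')

# Hypothetical input with a doubled tab after the short name.
printf 'longer name\t40.00\t6.50\nshort\t\t40.00\t6.50\n' > demo.txt

# "${tab}${tab}*" is a POSIX basic regular expression meaning
# "one tab followed by zero or more tabs", i.e. one or more tabs.
# Each such run is replaced by a single colon.
sed "s/${tab}${tab}*/:/g" demo.txt
# longer name:40.00:6.50
# short:40.00:6.50

# Clean up the demo file.
rm -f demo.txt
```

One caution: if a column can legitimately be empty (for example, no overtime hours), collapsing runs of tabs also removes that empty field and shifts the later ones, so check the file with ":set list" first. GNU sed also accepts s/\t\+/:/g, but both \t and \+ are GNU extensions.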

I hope this helps.

bakunin