problem developing for or while loops with the cut command

synergy_texas · November 11, 2008, 10:12pm

I have been working on a script all day and am having very little success. Essentially, I am reading a file and trying to format the data in the file. I am currently trying to put the data into three separate files, so that I can paste them back together into a single output file.

Both the for loop and the while loop are giving me problems. The while loop only creates 1 file and leaves the other two blank. The for loop keeps giving me an error, and although it still prints my data to stdout, it writes nothing to the output files. Here are the two methods I am trying to use.

exec < workunits-status.txt
while read line; do
cut -c 1-12 | sed "s/ //g" >> tempfile1.txt
cut -c 13-35 | sed "s/ //g" >> tempfile2.txt
cut -c 36 >> tempfile3.txt
done
paste tempfile1.txt tempfile2.txt tempfile3.txt >> workunits.txt

for i in `cat workunits-status.txt`
do
cut -c 1-12 $i | sed "s/ //g" >> tempfile1.txt
cut -c 13-35i $i | sed "s/ //g" >> tempfile2.txt
cut -c 36 $i >> tempfile3.txt
done

The while loop gives me tempfile1.txt with data, but not in the order of the workunits-status.txt file and does not create the other two files.

The for loop gives me the error below, yet it is showing the data that I am going after.

cut: cannot open : 1458
cut: bad list for c option
cut: cannot open : 1458
cut: cannot open : CustomActivity1
cut: bad list for c option
cut: cannot open : CustomActivity1
cut: cannot open : 3
cut: bad list for c option
cut: cannot open : 3
cut: cannot open : 1459
cut: bad list for c option
cut: cannot open : 1459
cut: cannot open : CustomActivity1
cut: bad list for c option
cut: cannot open : CustomActivity1
cut: cannot open : 3
cut: bad list for c option
cut: cannot open : 3

Can someone please help and explain why both of these are not working? Your help would be greatly appreciated. Please assume that all files are in the current working directory of the script. Also, only shell scripting. I am not experienced in Perl, expect, tcl or other languages. The use of sed and awk are ok.

My input file has three fields that I cannot format using the application utility that creates it. I have to format it afterwards. I need to break the three fields apart so that I can output it to a report that has three columns. The data fields should align under each column.

I initially tried to use the following syntax, but could not get it to work either:

exec < workunits-status.txt
while read line; do
if [[ $F1 -lt 100 ]];
then
awk -v F1="$FIELD1" F2="$FIELD2" F3="$FIELD3" '{print $F1 " " $F2 " " $F3}' >> workunits.txt
elif [[ $F1 -gt 99 && $F1 -lt 1000 ]];
then
awk F1="$FIELD1" F2="$FIELD2" F3="$FIELD3" '{print $F1 " " $F2 " " $F3}' >> workunits.txt
else
awk F1="$FIELD1" F2="$FIELD2" F3="$FIELD3" '{print $F1 " " $F2 " " $F3}' >> workunits.txt
fi
done

Where FIELD1, FIELD2, and FIELD3 are variables declared at the beginning of the script using the cut command as seen earlier in the two loop examples. And FIELD1 will be a numeric value ranging from 1 - 9999.

Lakris · November 11, 2008, 11:36pm

Hi,
There are some strange things going on in Your code; why do You want to exec the txt-file? In the first while loop the cut statement has no argument, and in Your for loop the i is set to every word (not line) in the txt-file. And so on...
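A small sketch of the word-versus-line point, using a made-up two-line sample in a temp file (not the OP's real data). With for and backticks, each whitespace-separated word becomes one iteration, which would also explain errors like "cut: cannot open : 1458": every word ends up being handed to cut as a file name.

```shell
# Hypothetical sample standing in for workunits-status.txt
tmp=$(mktemp)
printf '%s\n' '           4 pftestAssert02b       3' \
              '          14 pftestCustom01b       3' > "$tmp"

# for + backticks splits on any whitespace: one iteration per word
words=0
for i in `cat "$tmp"`; do
    words=$((words + 1))
done

# while read takes one whole line per iteration
lines=0
while IFS= read -r line; do
    lines=$((lines + 1))
done < "$tmp"

echo "for loop iterations:   $words"   # 6
echo "while loop iterations: $lines"   # 2
rm -f "$tmp"
```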

Could You give us a few lines from the text file that You want to process and an example of desired output?

/Lakris

Annihilannic · November 12, 2008, 1:22am

Try this:

while read line; do
    echo "$line" | cut -c 1-12 | sed "s/ //g" >> tempfile1.txt
    echo "$line" | cut -c 13-35 | sed "s/ //g" >> tempfile2.txt
    echo "$line" | cut -c 36 >> tempfile3.txt
done <  workunits-status.txt
paste tempfile1.txt tempfile2.txt tempfile3.txt >> workunits.txt

The problem was that the first cut was reading the same standard input source as the while loop and consuming all of the input, leaving nothing for the remaining commands to read. Since you need to use the input line for all 3 cut commands you just need to use the contents stored in the line variable and send it to each individual cut.
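To see the effect in isolation, here is a minimal sketch with a made-up three-line input (the temp file and its contents are only for illustration). In the broken version, read takes the first line and cut then swallows everything that is left.

```shell
# Hypothetical three-line input
tmp=$(mktemp)
printf '%s\n' 'aaa111' 'bbb222' 'ccc333' > "$tmp"

# Broken: cut has no file argument, so it reads the loop's stdin.
# read takes line 1, cut consumes lines 2 and 3, and the loop ends.
broken=$(while read line; do cut -c 1-3; done < "$tmp")

# Fixed: each line is passed to cut explicitly.
fixed=$(while read line; do echo "$line" | cut -c 1-3; done < "$tmp")

echo "broken:" $broken   # broken: bbb ccc
echo "fixed: " $fixed    # fixed:  aaa bbb ccc
rm -f "$tmp"
```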

Lakris, the exec < workunits-status.txt doesn't execute the text file, it assigns it as the default source of standard input.

A simpler awk solution would be something like:

awk '{print substr($0,1,12), substr($0,13,23), substr($0,36)}' workunits-status.txt > workunits.txt

synergy_texas · November 12, 2008, 2:17pm

Thanks for the feedback. Still not doing quite what I need. Here is a sample of the input. The first column has 12 characters with leading spaces. The last column is character #36.

           4 pftestAssert02b       3
          14 pftestCustom01b       3
          28 pftestAssert02b       3
          38 pftestCustom01b       3
         107 pftestAssert02b       3
         117 pftestCustom01b       3
         129 pftestAssert02b       3
         139 pftestCustom01b       3
        1043 VMOHMQTASK            3
        1044 VMOHMQTASK            3
        1045 VMOHMQTASK            3
        1073 email_test            3

Here is the expected output:

WorkUnits:             Process-Name:              Status:
4                      pftestAssert02b            3
14                     pftestCustom01b            3
28                     pftestAssert02b            3
38                     pftestCustom01b            3
107                    pftestAssert02b            3
117                    pftestCustom01b            3
129                    pftestAssert02b            3
139                    pftestCustom01b            3
1043                   VMOHMQTASK                 3
1044                   VMOHMQTASK                 3
1045                   VMOHMQTASK                 3
1073                   email_test                 3

The suggested awk script is getting me closer, however that is why I was using the while loop, so that when the workunit was less than 100, it would put in the two extra spaces needed in the output. When the number was between 100 -199, it would put in the extra space. The awk needs to remove leading spaces as well, which is what my sed after the cut was trying to accomplish.

Once again, any help would be greatly appreciated.

synergy_texas · November 12, 2008, 2:19pm

Sorry. The expected output has the columns lined up. The forum seems to have taken out the spaces.

This has been edited now with the [code] tags in this posting. The output should look correct.

synergy_texas · November 12, 2008, 5:39pm

Working now, for the most part. I have the columns lining up as I would like. The only other thing I would like to do is trim the leading spaces out of the first field. I am using the awk script as suggested before. The only thing I added was to put spaces between the three fields. This essentially lines them up into separate columns as I need.

Now I just want to trim off the leading spaces of the first field so that the numbers show up at the beginning of the column and not at the end.
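One possible way to do the trimming while staying with fixed-width substr is to copy the field into a variable and strip its leading blanks with sub. This is only a sketch; the variable name f1 and the two sample lines are made up for illustration.

```shell
# Hypothetical sample: first column is 12 characters wide, right-aligned
tmp=$(mktemp)
printf '%s\n' '           4 pftestAssert02b       3' \
              '        1043 VMOHMQTASK            3' > "$tmp"

out=$(awk '{
    f1 = substr($0, 1, 12)   # columns 1-12, same range as cut -c 1-12
    sub(/^ +/, "", f1)       # strip the leading spaces
    print f1
}' "$tmp")

echo "$out"
rm -f "$tmp"
```

The same sub call could be applied to the other two fields before printing them.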

danmero · November 12, 2008, 5:42pm

Edit your previous post and use [code]..[ /code] tags when you post data or code.

synergy_texas · November 12, 2008, 9:55pm

OK. So here is what I have. It almost looks good, but can't figure out why it is giving me the output it is providing.

exec < workunits-status.txt
while read line; do
    awk '{ if ($1 < 10) print substr($0,12), "       ", substr($0,14,35), "        ", substr($0,36);
          else if (($1 > 9) && ($1 < 100)) print substr($0,11,12), "      ", substr($0,14,35), "        ", substr($0,36);
          else if (($1 > 99) && ($1 < 1000)) print substr($0,10,12), "     ", substr($0,14,35), "        ", substr($0,36);
          else print substr($0,9,12), "    ", substr($0,14,35), "        ", substr($0,36);
         }' workunits-status.txt > workunits.txt
done

What it is not doing is giving me the spaces as specified in the print command and it is throwing out duplicate information on the same line. See example below.

WorkUnit:    Process-Name:        WF-Status:
4 pftestAssert02b       3         pftestAssert02b       3          3
14 pftestCus        pftestCustom01b       3          3
28 pftestAss        pftestAssert02b       3          3
38 pftestCus        pftestCustom01b       3          3
50 pftestAss        pftestAssert02b       3          3
60 pftestCus        pftestCustom01b       3          3
72 pftestMer        pftestMerc01          3          3

Please help. This has been day 2 and a trying experience. I have learned a lot from it already. Your help would be greatly appreciated.
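A note for anyone reading along: one likely cause of the duplicated text is that awk's substr(s, m, n) takes a start position and a length, not a start and an end position. So substr($0,14,35) asks for up to 35 characters starting at column 14, which on these 36-character lines runs all the way to the end. A minimal sketch on a made-up string:

```shell
s='abcdefghijklmnopqrstuvwxyz'

# Third argument is a length: 5 characters starting at position 3
by_length=$(echo "$s" | awk '{ print substr($0, 3, 5) }')

# To get positions 3 through 5, the length is end - start + 1
by_range=$(echo "$s" | awk '{ print substr($0, 3, 5 - 3 + 1) }')

echo "$by_length"   # cdefg
echo "$by_range"    # cde
```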

danmero · November 12, 2008, 10:30pm

synergy_texas:

Here is the expected output:

WorkUnits:             Process-Name:              Status:
4                      pftestAssert02b            3
14                     pftestCustom01b            3
28                     pftestAssert02b            3
38                     pftestCustom01b            3
107                    pftestAssert02b            3
117                    pftestCustom01b            3
129                    pftestAssert02b            3
139                    pftestCustom01b            3
1043                   VMOHMQTASK                 3
1044                   VMOHMQTASK                 3
1045                   VMOHMQTASK                 3
1073                   email_test                 3

...

awk 'BEGIN{printf "%-22s %-26s %s\n","WorkUnits:","Process-Name:","Status:"}{printf "%-22s %-26s %s\n",$1,$2,$3}' file
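A quick sketch of why this lines up. awk's default field splitting drops leading and repeated blanks, so $1 arrives already trimmed, and %-22s left-justifies a value and pads it to 22 characters. The | separators below are not part of the solution; they are added only to make the padding visible, and the input line is a made-up sample.

```shell
row=$(echo '          14 pftestCustom01b       3' |
      awk '{ printf "%-22s|%-26s|%s\n", $1, $2, $3 }')

echo "$row"
echo "row length: ${#row}"   # 22 + 1 + 26 + 1 + 1 = 51
```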

Lakris · November 13, 2008, 3:00pm

Hello guys!
I learned a lot from this thread, most of all that exec doesn't always do what I was used to. Actually I've used that method in cmd (<file to set stdin for consecutive commands) and it is always good to see parallels.

Anyway, after danmero's solution I just had to figure out a way to do it on more familiar ground, with awk and sed not being part of it, and closer to how the OP started out with a while loop. I fiddled with different loops and smaller programs like tr, cut, column, etc. and even thought about printf. But danmero showed the way.

I wanted to do it in "pure" shell and given that printf usually is a built-in, it should be faster than calling an external program... I guess it's a question of nanoseconds!

Well, for what it's worth, it's yet another way to do it. printf was the key.

/Lakris

PS And I hope it's not a totally worthless use of cat...

danmero · November 13, 2008, 3:43pm

cat is an external program and is useless here

printf "%-22s %-26s %s\n" WorkUnits: Process-Name: Status:
while read x y z; do printf "%-22s %-26s %s\n" "$x" "$y" "$z"; done < file

Lakris · November 13, 2008, 3:56pm

Hehe, right, I did EXACTLY that, hence my statement about external programs, but before I posted it I decided to try another one without TWO printf statements... I guess the cat bit the hand...

Yet another way!

/Lakris

synergy_texas · November 13, 2008, 4:10pm

I agree with Lakris, I have learned a lot from this post. It amazes me how you guys come up with 1 liners that sometimes take me up to 15 lines. You guys are awesome. I appreciate your help. I have already turned one of my coworkers on to this forum. You guys have been a great help and I will continue to research and use this forum for future issues. Hopefully, I will get better at scripting.

Lakris · November 13, 2008, 4:34pm

Been there, done that, and lo and behold;
I went to The UNIX and Linux Forums - the Top UNIX & Linux Q&A on the Web

Now I learn a lot from the questions, and other peoples answers, giving me new views on problems that I didn't know I could face but sure as rain will (and has) hit me one day. It is a continuous learning experience.

Good luck

/Lakris

danmero · November 13, 2008, 5:56pm

Stop asking questions and start finding solutions ... for you and others.

paresh_n_doshi · November 14, 2008, 4:18am

make things simpler guys.

if i understand properly, u want the leading spaces of first column removed

try this
awk '{printf("%-12s %12s %22s\n",$1,$2,$3)}' filename > outfile
after u see the results u may modify the line accordingly
i have given a space between columns;

Lakris · November 14, 2008, 1:50pm

Hi again,
Well, I'm sorry to say it, but that doesn't give the expected output, at least not on my machine. The columns don't line up. And he also wanted headers, which was the reason for either two printf statements in an awk program or other AWKward cats and whiles and echoes... a seemingly complicated way to do it.
If it wasn't for the headers, danmero's original solution would still have been the most correct:

awk '{printf "%-22s %-26s %s\n",$1,$2,$3}' infile

or even

while read x y z; do printf "%-22s %-26s %s\n" "$x" "$y" "$z"; done < infile

I did an experiment: the latter is faster on my machine on small files (the actual content supplied by OP) but when the file is 100 times bigger the awk program is a lot faster, a 10th of the time. So the conclusion is that if You need a reason to go for a cup of cocoa, run the while loop. If You are anxious to get home at the end of the day, do the awk. It's several nanoseconds we're dealing with here dudes!

The point, in this case, is to take in the arguments with any program that ignores repeated whitespace and reformat them in the desired fashion. At least that's how I see it. And that is what I missed completely in my first attempts at cracking it, when I focused on how to handle or convert whitespace.

/Lakris