awk easy question

So, I have the following code:

cat testfile.txt | awk -F, '{ print $1" "$2" "$3" "$4" "$5 }'  | read DOC ORG NAME
echo "$DOC"
echo "$ORG"
echo "$NAME"

My testfile.txt looks something like the following:

Document Type,Project Number,Org ID,Invoice Number

It will eventually be more complicated, but that's not the point.

WHY OH WHY does awk insist on using " " as a field separator, in addition to the ","? What am I doing wrong here? My output is

Document
Type

I have tried using BEGIN and setting FS=','. I've tried putting quotes everywhere I can think of, of every type. No dice.

If your requirement is to get the comma-separated values into different variables, then why use awk?

#!/bin/ksh

while IFS="," read DOC PRID ORG NAME
do
        echo "$DOC"
        echo "$PRID"
        echo "$ORG"
        echo "$NAME"
done < testfile.txt

It's not awk, it's the read command splitting the input at spaces (awk has already replaced the commas with spaces by then). Why don't you read the file directly, like

IFS="," read DOC ORG NAME REST

?
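Something like this, for instance (a minimal sketch, assuming the testfile.txt from your first post):

IFS="," read DOC ORG NAME REST < testfile.txt
echo "$DOC"
echo "$ORG"
echo "$NAME"

read splits on IFS, so with IFS set to a comma the first three fields land in DOC, ORG and NAME, and whatever is left over goes into REST.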

Rats, One : Nil for Yoda!


awk is working fine.

You may have to change the IFS while reading. :)


If you don't want a space as the field separator in the output, you can define one with OFS:
BEGIN{FS=OFS=","} or awk -F, '{.......}' OFS=\, input
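For example (just a sketch, using the same testfile.txt from the first post), this keeps the commas in the output instead of joining the fields with spaces:

awk 'BEGIN { FS = OFS = "," } { print $1, $2, $3 }' testfile.txt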


Thanks everyone! I'm pretty bad at this, as you can tell, but this forum has been amazing so far. :)

---------- Post updated at 01:24 PM ---------- Previous update was at 11:36 AM ----------

Ok, one more stupid question.

i=0
while [ i -le 1 ]
do
echo "<section name=\"Query$i\">"
while IFS="," read DOC VOUCHER PO PROJECT ORG CONTRACT CUSTOMERIN CUSTOMERID PROPOSAL OWNING EMPLOYEE VENDOR DATE PRMAN PR SUBJECT RQID COST CO RQ
do
        echo "$DOC"
        echo "$VOUCHER"
        echo "$PO"
        echo "$PROJECT"
        echo "$ORG"
        echo "$CONTRACT"
        echo "$CUSTOMERIN"
        echo "$CUSTOMERID"
        echo "$PROPOSAL"
        echo "$OWNING"
        echo "$EMPLOYEE"
        echo "$VENDOR"
        echo "$DATE"
        echo "$PRMAN"
        echo "$PR"
        echo "$SUBJECT"
        echo "$RQID"
        echo "$COST"
        echo "$CO"
        echo "$RQ"
done 
i=`expr $i + 1`
done < test2.csv

My plan here is to display some stuff before the parsing begins, then loop through the first line, echo all the contents (with more formatting yet to come), and then, once I hit the newline, go back to the beginning of the outer loop and start again. As of right now, however, my output looks like this:

<section name="Query0">
IA Supporting Docs
Voucher Number
Invoice Number
PO Number
Project Number
ORG ID














</section>
<section name="Query1">
</section>

I have a lot of blank fields in the file, so the big space is expected. My issue is that it doesn't currently seem to loop back around for the second line. Is this where awk would need to come in? Every line has the same number of "elements" separated by commas, which matches the number of variables I am catching, but is there something special I need to do for it to jump to the next line? The final file will have >100 lines in it, but I am testing with a file of just 2 lines before I go that far.

This is wrong:

while [ i -le 1 ]

Should be:

while [ "$i" -le 1 ]

Why do you have a while loop inside your while loop, though? The 'while read' loop will eat all your lines on the first pass. By the time the second pass of the outer loop happens, they will all be gone.

If you want to read the same file repeatedly, redirect your file into the inner loop, not the outer one.
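Something like this, trimmed down to a few fields just for illustration (a sketch only, assuming your test2.csv):

i=0
while [ "$i" -le 1 ]
do
        echo "<section name=\"Query$i\">"
        while IFS="," read DOC VOUCHER PO PROJECT ORG REST
        do
                echo "$DOC"
                echo "$VOUCHER"
        done < test2.csv
        echo "</section>"
        i=`expr $i + 1`
done

With the redirection on the inner done, the file is reopened and read from the top on every pass of the outer loop.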

I changed it, but I'm still getting the same results. It seems to be the inner loop that is not running again. Probably something I don't understand about the way the while loop is functioning... that's what I get for trying to use Yoda's code without fully comprehending it!

I see what you're saying with your edit. I will try to pare it down to one loop and see if that works. I was just unsure how to tell where one line ends and the next begins, but since each line will have a fixed number of fields, that probably shouldn't be a problem anyway.

---------- Post updated at 02:44 PM ---------- Previous update was at 01:34 PM ----------

I dropped it to only one loop, removing the outer while loop, and it still seems to only read one line. BTW, if I'm supposed to start a new topic for this, sorry.

#!/bin/ksh
i=1
while IFS="," read DOC VOUCHER PO PROJECT ORG CONTRACT CUSTOMERIN CUSTOMERID PROPOSAL OWNING EMPLOYEE VENDOR DATE PRMAN PR SUBJECT RQID COST CO RQ
do
        echo "<section name=\"Query$i\">"
        echo "$DOC"
        echo "$VOUCHER"
        echo "$PO"
        echo "$PROJECT"
        echo "$ORG"
        echo "$CONTRACT"
        echo "$CUSTOMERIN"
        echo "$CUSTOMERID"
        echo "$PROPOSAL"
        echo "$OWNING"
        echo "$EMPLOYEE"
        echo "$VENDOR"
        echo "$DATE"
        echo "$PRMAN"
        echo "$PR"
        echo "$SUBJECT"
        echo "$RQID"
        echo "$COST"
        echo "$CO"
        echo "$RQ"
        i=`expr $i + 1`
        echo "</section>"
done < test2.csv

Shouldn't it go back and keep going until there's nothing left? Or do I have to specifically tell it to go to the next line somehow?

Can you post a few lines from your input file, test2.csv, in code tags?

IA Supporting Docs,Voucher Number,Invoice Number,PO Number,Project Number,ORG ID,,,,,,,,,,,,,,,,
Invoice,Voucher Number,Invoice Number,PO Number,Project Number,ORG ID,,,,,,,,,,,,,,,,

I don't have FTP access to the server yet, so I just took the first two lines from what I'm working on and copy/pasted, creating test2.csv.

This is the output I'm getting, which I just noticed is giving me Query2 as the section name... interesting.

<section name="Query2">
IA Supporting Docs
Voucher Number
Invoice Number
PO Number
Project Number
ORG ID














</section>

I would suggest using awk because it will work for any number of fields in your input file:

awk -F, '
        {
                print "<section name=\"Query" NR "\">"
                for ( i = 1; i <= NF; i++ )
                {
                        if ( $i )
                                print $i
                }
                print "</section>"
        }
' test2.csv
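With the two-line test2.csv posted above, the if ( $i ) test skips the empty fields, so the output should look something like this:

<section name="Query1">
IA Supporting Docs
Voucher Number
Invoice Number
PO Number
Project Number
ORG ID
</section>
<section name="Query2">
Invoice
Voucher Number
Invoice Number
PO Number
Project Number
ORG ID
</section>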

Since it prints so many lines, how can you tell anything's being skipped? Many would scroll right off the screen.

Yoda, that is a beautiful piece of code compared to mine, wow. Plus, I actually understand it! Thank you so much.

And Corona, I'm not sure I understand the question. I could see that the "blank" lines matched up with the number of empty cells in the spreadsheet / empty fields in the CSV file.

I should be able to work from that and get everything I need... hopefully. Thanks again, you can bet I'll be back.

You are printing many lines, and not redirecting to a file. If you are printing more lines than your terminal has lines, the remainder will vanish off the top.
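If you want to be sure nothing is lost, capture the output instead of letting it scroll, for example (yourscript.ksh is just a placeholder for whatever your script is called):

./yourscript.ksh > output.txt
./yourscript.ksh 2>&1 | more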

Ah, yes. I am using xshell, so I can scroll up and down. Good point though.