Selecting awk output depending on grep result

takada · June 18, 2012, 2:57pm

Hi,

I don't script often enough to know how to do this, and I can't seem to find the right example online. I have CSV output from an old, old system (Win2K???), from which I want to extract only certain fields. Initially I came up with something like this:

cat file1 | awk -F '"' '{print $8 " " $12 " " $16}'

The csv file contains some data like this:

<file1>

"pippo","is fat","last","Cruise","first","Tom","UID","1234","more blah"
"monky","looks funky","last","Jones","first","Catherine Zeta","UID","2345","more blah"
"lion","rules savannah","last","Baldwin","first","Alec","UID","3456","more blah"

So the output would be:

Cruise Tom 1234
Jones Catherine Zeta 2345
Baldwin Alec 3456

But I realized later that the csv file may contain lines that do not conform to the above format when a user has more than one UID. The additional UIDs appear on their own lines directly after the line that first introduces that user, like this:

"pippo","is fat","last","Cruise","first","Tom","UID","1234","is 50 years old"
"monky","looks funky","last","Jones","first","Catherine Zeta","UID","2345","is still hot"
"lion","rules savannah","last","Baldwin","first","Alec","UID","3456","his brother sued costner and lost"
"taco","4567","blah","age of rock"
"chili","5678","blah","flopped bit time"
"mojito","tastes awesome","last","Brand","first","Russell","UID","6789","didn't deserve katy"

I am trying to script it so that I will get an output like this:

Cruise Tom 1234
Jones Catherine Zeta 2345
Baldwin Alec 3456
Baldwin Alec 4567
Baldwin Alec 5678
Brand Russell 6789

I would think I can do this with an if statement and a while loop? Users with multiple UIDs can show up anywhere in the file, but each such user appears only once, and that user's additional UIDs follow as one sequential block.

If someone can point me to the right direction, I would greatly appreciate it.

Regards,

Bash Noob...

Corona688 · June 18, 2012, 3:10pm

That is a useless use of cat.

So you want to ignore lines with too few fields?

awk -F '"' 'NF>8 {print $8 " " $12 " " $16}' file1

Ah, you want to carry it forward?

awk -F '"' 'NF>8 {print $8, $12, $16; A=$8; B=$12; next}; { print A, B, $4 }' file1

Scrutinizer · June 18, 2012, 3:10pm

If there are 19 fields (NF==19), store $8 and $12 in variables, say v1 and v2, and print $8, $12, $16. If NF==9, print v1, v2 and $4.

takada · June 18, 2012, 3:36pm

Sorry for not being clear. It is not the number of fields that matters in each line. I added line numbers below so that the lines can be referred to easily. They are not part of the file content.

There are basically two formats. The first contains the user's last and first names and the UID, e.g. lines 1, 2, 3 and 6. The second contains only the UID (the only useful part), e.g. lines 4 and 5. The second format appears when the user on the line above has more than one UID in the system. So in the script's output, I want that user's name repeated with each additional UID rather than omitted (as it is in the original file). Does this make sense?

I also did not make it clear that in lines of the second format, the UID always appears in the same field, $4, e.g. lines 4 and 5. So I want to repeat $8 and $12 from the line directly above (line 3), and print $4 for lines 4 and 5.

If testing whether $4 is a number is not ideal, $6 in the second format (counting fields with the double quote as the separator) always contains the exact same string, blah in this case in lines 4 and 5.

1 "pippo","is fat","last","Cruise","first","Tom","UID","1234","is 50 years old"
2 "monky","looks funky","last","Jones","first","Catherine Zeta","UID","2345","is still hot"
3 "lion","rules savannah","last","Baldwin","first","Alec","UID","3456","his brother sued costner and lost"
4 "taco","4567","blah","age of rock"
5 "chili","5678","blah","flopped bit time"
6 "mojito","tastes awesome","last","Brand","first","Russell","UID","6789","didn't deserve katy"

Scrutinizer · June 18, 2012, 3:44pm

Why does the number of fields not matter? Lines 1, 2, 3 and 6 have 19 fields and lines 4 and 5 have 9 fields; why can that not be used as a criterion?
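
The field counts above can be checked directly. The following is a minimal sketch, assuming two of the sample lines from this thread are saved to a temporary file (the variable names infile and counts are only illustrative). Because the separator is the double quote, every quoted value is surrounded by separator-only fields, which is why a 9-column line yields 19 fields and a 4-column line yields 9. Both counts are greater than 8, so a test such as NF>8 matches the short lines too.

```shell
# Two sample lines copied from the thread; the temp file is only for this demo.
infile=$(mktemp)
cat > "$infile" <<'EOF'
"lion","rules savannah","last","Baldwin","first","Alec","UID","3456","his brother sued costner and lost"
"taco","4567","blah","age of rock"
EOF

# Print the number of fields and the content of $8 for each line.
counts=$(awk -F '"' '{ print NF, $8 }' "$infile")
printf '%s\n' "$counts"
rm -f "$infile"
```

On the short line, $8 is the trailing comment rather than a last name, so that line has to be told apart from the long format before $8 is used.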

takada · June 18, 2012, 3:55pm

OK, I see what you are saying. My only reason was that I don't know whether there will be yet another format with a different number of fields, but I can cross that bridge when I come to it. The only issue with the suggested script is that it grabs fields I don't want for line 5: it takes $8 and $12 from line 4, whereas I want to repeat $8 and $12 from line 3.

Some users may have a dozen UIDs. So lines 4 and 5 could be a dozen lines, all of them belonging to the user in line 3. Then the file continues with users who have only one UID for a while, hits another user with multiple UIDs, and so on.

Thanks for the great advice in such a short time,

Scrutinizer · June 18, 2012, 4:04pm

If I put my suggestion from post #3 into a script, I get this:

awk -F\" 'NF==19{a=$8; b=$12; print a,b,$16} NF==9{print a,b,$4}' infile

which produces:

Cruise Tom 1234
Jones Catherine Zeta 2345
Baldwin Alec 3456
Baldwin Alec 4567
Baldwin Alec 5678
Brand Russell 6789
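
If a third format with yet another field count turns up later, one possible variation is to test field content instead of NF. This is only a sketch under assumptions taken from the sample data: the long format has the literal string UID in $14, and the short format has the constant string (blah in the sample) in $6 when splitting on the double quote. The variable names last, first, infile and result are made up for illustration.

```shell
# Sample data copied from the thread; the temp file is only for this demo.
infile=$(mktemp)
cat > "$infile" <<'EOF'
"pippo","is fat","last","Cruise","first","Tom","UID","1234","is 50 years old"
"monky","looks funky","last","Jones","first","Catherine Zeta","UID","2345","is still hot"
"lion","rules savannah","last","Baldwin","first","Alec","UID","3456","his brother sued costner and lost"
"taco","4567","blah","age of rock"
"chili","5678","blah","flopped bit time"
"mojito","tastes awesome","last","Brand","first","Russell","UID","6789","didn't deserve katy"
EOF

result=$(awk -F '"' '
  # Long format: remember the name, print it with its UID, skip the next rule.
  $14 == "UID"  { last = $8; first = $12; print last, first, $16; next }
  # Short format: reuse the most recently remembered name with the extra UID.
  $6 == "blah"  { print last, first, $4 }
' "$infile")
printf '%s\n' "$result"
rm -f "$infile"
```

Lines that match neither rule are silently skipped, so an unexpected format would show up as missing output rather than as wrong names.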

takada · June 19, 2012, 8:08am

Awesome! I have much to learn. Thank you!