Multiple if statements in AWK

Panri93 · May 19, 2021, 8:52am

Hi guys, I'm new at awk.

I'm trying to perform some modifications in a .csv file. The file looks something like:

"DAY_OF_MONTH", "DAY_OF_WEEK", "OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE","ORIGIN"
1,3,"EV",20366,"ORD"
2,2,"AH",15164,"BAR"

I'm creating a script that performs the following modifications:

DAY_OF_WEEK: In the file it is given as a number (1, 2, 3, etc.). I want to replace this number with the name of the day. For example, 1 = Sunday, 2 = Monday, etc.
ORIGIN: In the file the ORIGIN value is written between double quotes. I need to remove them. For example, "ORD" would become ORD.

I've done the following:

#!/bin/awk -f
BEGIN{FS = ","}
{
if ($2 == 1)sub($2,"Sunday"); print
if ($2 == 2)sub($2,"Monday"); print
}
END{}

However, my problem is that I don't know how to chain multiple conditions so that they all feed a single print. The output should be one new file containing all the modifications. Instead, what I'm getting is the file twice: the first time with 1 replaced by Sunday and the second time with 2 replaced by Monday.

Thanks in advance!

Neo · May 19, 2021, 10:59am

Hi @Panri93

For many of us, the generally accepted approach to a CSV file like yours is to convert it into an array of hashes.

Then, we do simple operations on the array when we want to manipulate or transform the data.

For example, in Ruby, we can use the builtin CSV parser:

Python and PHP also have similar "built-in" CSV parsers, which convert the original CSV file to an array that can easily be processed. For example, in Python we import the csv module as follows:

import csv

As you might imagine, reading, parsing and writing CSV data is very common in 2021. The format has been around for a long time, so most languages have basic tools and libraries for this standard type of CSV processing.

Personally speaking, I use Ruby for this type of CSV task, and I use a Ruby gem called smarter_csv, but that's because I like processing CSV files with Ruby:

https://www.rubydoc.info/gems/smarter_csv/1.2.6

MadeInGermany · May 19, 2021, 11:26am

In awk the if block ends at the first semicolon or newline, unless it is enclosed by curly brackets.
Simply have one final print:

  if ($2 == 1)sub($2,"Sunday")
  if ($2 == 2)sub($2,"Monday")
  print

You can make it a little faster by putting an else before each subsequent if.
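
As a minimal sketch of that else-if chaining, assuming the two sample rows from the first post, something like the following might work. Assigning to $2 directly instead of calling sub is a variation chosen here for illustration, and only three of the seven days are shown.

```shell
# Sample data rows from the question (header omitted for brevity).
data='1,3,"EV",20366,"ORD"
2,2,"AH",15164,"BAR"'

out=$(printf '%s\n' "$data" | awk '
BEGIN { FS = OFS = "," }
{
    # Braces make each branch explicit; "else" skips later tests once one matched.
    if ($2 == 1)      { $2 = "Sunday" }
    else if ($2 == 2) { $2 = "Monday" }
    else if ($2 == 3) { $2 = "Tuesday" }
    print    # a single print, outside every if
}')
printf '%s\n' "$out"
```

Because only one branch can match, awk stops testing as soon as a day is found, and the single print runs once per record.
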
An elegant method is a mapping array:

BEGIN {
  split("Sunday Monday Tuesday ...", day)
  FS=OFS=","
}
{
  $2=day[$2]
  print
}

EDIT: OFS must be set because an assignment to a field causes $0 to be rebuilt, with the fields joined by OFS.
EDIT2: split() splits on FS by default. I use a space-separated string, so FS should be set to comma only after the split() call!
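
One way to sidestep the ordering issue, sketched here under the assumption that days run 1 = Sunday through 7 = Saturday as in the question, is to pass the separator to split() explicitly. The NR > 1 guard and the "in day" check, which leave the header row and unknown values alone, are additions for illustration and not part of the script above.

```shell
# Header plus the two sample rows (header written without spaces here).
data='"DAY_OF_MONTH","DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE","ORIGIN"
1,3,"EV",20366,"ORD"
2,2,"AH",15164,"BAR"'

out=$(printf '%s\n' "$data" | awk '
BEGIN {
    FS = OFS = ","
    # Explicit third argument: split on whitespace regardless of FS.
    split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", day, " ")
}
# Translate only data rows with a known day number; the header passes through.
NR > 1 && ($2 in day) { $2 = day[$2] }
{ print }')
printf '%s\n' "$out"
```

With the separator given explicitly, it no longer matters whether FS is assigned before or after the split() call.
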

RudiC · May 19, 2021, 8:43pm

Has your second question been answered / resolved? If still open, try

awk -F, 'NR>1 {gsub (/"/, _, $5)} 1' OFS="," file

Panri93 · May 20, 2021, 9:35am

Sorry, but I'm having an issue when compiling all the modifications together. Right now the code is like:

#!/bin/awk -f
BEGIN{FS = ","}
{
#Modify DAY_OF_WEEK
if ($2 == 1)sub($2,"Sunday"); 
if ($2 == 2)sub($2,"Monday");
if ($2 == 3)sub($2,"Tuesday");
if ($2 == 4)sub($2,"Wednesday");
if ($2 == 5)sub($2,"Thursday");
if ($2 == 6)sub($2,"Friday");
if ($2 == 7)sub($2,"Saturday");
#Modify ORIGIN
if (NR!=1) substr($5,2,3);
print
}

With this, the modification of DAY_OF_WEEK is performed, but the one on the ORIGIN column is not. Any idea why?

RudiC · May 20, 2021, 2:43pm

Yes. While sub (r,s,t) operates on (and modifies) the target immediately, substr (s,i,n) reads the operand and supplies the result as its return value. So try

$5 = substr($5,2,3)

in your script.
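
A small sketch of the difference, using the first sample row: the return value of substr has to be assigned back to the field. The length($5) - 2 expression is an assumption added here so that codes of any length lose their surrounding quotes; the fixed 2,3 arguments above work only for three-letter codes.

```shell
row='1,3,"EV",20366,"ORD"'

out=$(printf '%s\n' "$row" | awk '
BEGIN { FS = OFS = "," }
{
    # substr() only returns the substring, so store it back into $5.
    # Start at character 2 and drop the last character to strip both quotes.
    $5 = substr($5, 2, length($5) - 2)
    print
}')
printf '%s\n' "$out"    # prints: 1,3,"EV",20366,ORD
```
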
Be aware that your statement that the modification on DAY_OF_WEEK is performed is not quite true. Because sub, when t is omitted, operates on $0 (the entire record), in your second data line the DAY_OF_MONTH field will be replaced by "Monday", because it holds the first match.
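
The pitfall can be reproduced with the second data row from the first post. This is only a sketch to show the behavior, with the if/sub line taken from the question's script:

```shell
row='2,2,"AH",15164,"BAR"'

# Without a target argument, sub() works on $0 and replaces the first "2" it finds,
# which here is DAY_OF_MONTH rather than DAY_OF_WEEK.
wrong=$(printf '%s\n' "$row" | awk -F, '{ if ($2 == 2) sub($2, "Monday"); print }')
printf '%s\n' "$wrong"    # prints: Monday,2,"AH",15164,"BAR"
```
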

Panri93 · May 20, 2021, 5:12pm

Thanks a lot!!!

Panri93 · May 21, 2021, 4:44pm

Oh, I just noticed your last sentence about sub. How could I solve this?

RudiC · May 21, 2021, 4:57pm

Supply $2 for the t parameter to sub.
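
For illustration, a sketch of that fix on the second sample row. OFS is set to a comma here (an assumption carried over from the earlier advice in this thread) because modifying $2 makes awk rebuild the record.

```shell
row='2,2,"AH",15164,"BAR"'

# The third argument restricts sub() to field 2, so DAY_OF_MONTH is left alone.
fixed=$(printf '%s\n' "$row" | awk '
BEGIN { FS = OFS = "," }
{ if ($2 == 2) sub($2, "Monday", $2); print }')
printf '%s\n' "$fixed"    # prints: 2,Monday,"AH",15164,"BAR"
```
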

MadeInGermany · May 21, 2021, 10:08pm

Why sub() ?

BEGIN { FS=OFS="," }
{
#Modify DAY_OF_WEEK
if ($2 == 1) $2="Sunday"
else if ($2 == 2) $2="Monday"
...

Panri93 · May 22, 2021, 10:21am

this also works! thanks!

vgersh99 · May 22, 2021, 2:41pm

why all these nested/convoluted/hard-wiring just to translate "day of week (1..7)" to "locale's full weekday name (e.g., Sunday)"?
I thought @MadeInGermany in this post has made a very reasonable suggestion to use an array of "weekday names"....

vgersh99 · May 25, 2021, 2:34pm

@Panri93,
Any particular reason you cross-posted the same question on SE when a very similar approach had already been suggested by @MadeInGermany in this thread?

Please don't cross-post the same question on multiple forums when you have already been given a very reasonable suggestion here. This type of behavior is highly discouraged and frowned upon.

Please be warned!

MadeInGermany · May 25, 2021, 2:50pm

Maybe my fault - I had a bug that now is corrected.
I let split() use FS and a string with space separators, so FS is to be set to comma after it.
The alternative would be to first set FS to comma, and then split a string with commas.

vgersh99 · May 25, 2021, 2:59pm

@MadeInGermany, maybe, but the approach was exactly what was suggested in SE.
That should have been noted/fixed here, methinks.