Extract file records based on some field conditions

Hello Friends,

I have a file(InputFile.csv) with the following columns(the columns are pipe-delimited):
ColA|ColB|ColC|ColD|ColE|ColF

Now for this file, I have to get those records which fulfil the following condition:
If "ColB" is NOT NULL and "ColD" has values one of the following ('A','B','C') the show the records

If the aforementioned condition is satisfied then I need to have those records only in a separate file.

Please help me in this! Thanks in Advance!!

with warm regards


  1. ABC ↩ī¸Ž

1 Like

Hello kevintse,

Sorry I failed to mentioned that the values I have to check are not 'A','B','C' but string values like 'Asia','Casablanca','Brando'

So the condition should be:
If "ColB" is NOT NULL and "ColD" has values one of the following ('Asia','Casablanca','Brando') the show the records

Sorry for the goof-up.
In this case how will teh awk command be..

Thanks in Advance!!

awk -F"|" 'BEGIN { B["Asia"]=B["Casablanca"]=B["Brando"]=1; }  $2 && ($4 in B) ' InputFile.csv > outfile

If you have many more than 3 criteria (our you may change them) you should consider loading the criteria from another file:

awk -F"|" 'NR==FNR { B[$1]=1; Next }  $2 && ($4 in B) ' CritFile.txt InputFile.csv > outfile
1 Like

Hi,

When I tried the command I received the following error:

$ cat InputFile.csv
ColA|ColB|ColC|ColD|ColE|ColF
1|A|B|Asia|E|F
1||C|Asia|E|F
2|B|C|Casablanca|E|F
3|B|C|Brando|E|F

$ awk -F"|" 'BEGIN { B["Asia"]=B["Casablanca"]=B["Brando"]=1; } $2 && ($4 in B) ' InputFile.csv > outfile
awk: syntax error near line 1
awk: bailing out near line 1

Please help me in this. Thanks in Advance!!

nawk -F\| '$2&&$4~/^(Brando|Casablanca|Asia)$/' InputFile.csv > outfile
1 Like
x=($(sed '=' infile | sed -n 'N;s/\n/ /;p'|sed -n 's/[^ ]|*[^|]*|*[^ ]|*\(Asia\|Casablanca\|Brando\).*/\1/p'|sed 's/[^0-9]\|//g;/^$/d') )
for i in ${x[@]} ; do sed -n "$i p" infile ; done
1|A|B|Asia|E|F
2|B|C|Casablanca|E|F
3|B|C|Brando|E|F

I knew you try to short the code, but there is a bug in it, if $2=0, it is not null :slight_smile:

$ cat infile
ColA|0|ColC|Asia|ColE|ColF

$ nawk -F\| '$2&&$4~/^(Brando|Casablanca|Asia)$/' infile

1 Like

Well spotted.. Thanks

nawk -F\| '$2""&&$4~/^(Brando|Casablanca|Asia)$/' InputFile.csv > outfile

The original requirement was for 2nd field is non-null, lol