awk command to display particular pattern

Hi
I am using awk to print columns 10 and 11, but it is not displaying the required output.
Please let me know how I can parse each line and extract the required fields.

Example: I have below 2 lines in file

seq 49960| Thu Apr 19 10:57:40.726182 2018|Len: 89|GAP for CL/U18 9P-NC (CL90U8)) gap cnt=7337.  msg csn=8815 (16 bit), pf csn=1477 (16 bit)||

seq 49961| Thu Apr 19 10:57:40.727959 2018|Len: 93|DUP seq for CL/X18 8.5P-NC (CL85W8)) n_msgs=2150.  msg csn=863 (16 bit), pf csn=3012 (16 bit)||

I use

cat /tmp/OutFile | egrep "GAP" | awk '{print $10, $11}' | sort| uniq >> ${FILE}

cat /tmp/OutFile | egrep "DUP" | awk '{print $11, $12}' | sort| uniq >> ${FILE}

After awk, I want to get:

1st column                      2nd column
CL/U18 9P-NC                 CL90U8
CL/X18 8.5P-NC              CL85W8

This is difficult if not impossible, as you have spaces as field separators but also contained in desired output values. Do you see a chance to define fields and separators differently?


Well, this might work, taking advantage of the data structure as shown:

awk  '
match ($0, /(GAP|DUP)[^)]*\)\)/)        {TMP1 = substr ($0, RSTART, RLENGTH-2)
                                         n = split (TMP1, T)
                                         TMP2 = sprintf ("%s %s\t%s", T[n-2], T[n-1], substr (T[n], 2))
                                         if (!CNT[TMP2]++) print TMP2
                                        }
' file
CL/U18 9P-NC	CL90U8
CL/X18 8.5P-NC	CL85W8

I tried the command below, but I am not able to remove the leading "(".

cat OutFile | grep GAP | awk -F'|' '{print $4}' |awk -F')' '{print $1}' |awk -F' ' '{print $3,$4,$5}'

Output

CL/U18 16P-NC (CL160U8
CL/N18 5.5P-NC (CL55S8
CL/Q18 9.5P-NC (CL95T8
CL/Q18 6.5P-NC (CL65T8
CL/M18 5.5P-NC (CL55R8
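
The three chained awk calls above can probably be folded into one. The following is only a sketch, assuming the fourth "|"-separated field always has the shape shown in the first post (name, then a parenthesized code); the function name extract_names and the variable sample are made up for illustration. On Solaris, replace awk with nawk or /usr/xpg4/bin/awk, since the old /usr/bin/awk lacks sub().

```shell
# Sample lines copied from the first post; real input would come from the log file.
sample='seq 49960| Thu Apr 19 10:57:40.726182 2018|Len: 89|GAP for CL/U18 9P-NC (CL90U8)) gap cnt=7337.  msg csn=8815 (16 bit), pf csn=1477 (16 bit)||
seq 49961| Thu Apr 19 10:57:40.727959 2018|Len: 93|DUP seq for CL/X18 8.5P-NC (CL85W8)) n_msgs=2150.  msg csn=863 (16 bit), pf csn=3012 (16 bit)||'

# Hypothetical helper: reads log lines on stdin, prints "name<TAB>code".
extract_names() {
    awk -F'|' '
    $4 ~ /^(GAP|DUP) / {
        msg = $4
        sub(/\).*/, "", msg)       # drop everything from the first ")" onward
        n = split(msg, w, " ")     # last word is "(CODE", the two before it are the name
        if (n < 3) next            # skip lines that do not have the assumed shape
        sub(/^\(/, "", w[n])       # strip the leading "(" from the code
        print w[n-2] " " w[n-1] "\t" w[n]
    }'
}

printf '%s\n' "$sample" | extract_names
```

For the two sample lines this prints CL/U18 9P-NC and CL/X18 8.5P-NC, each followed by a tab and its code (CL90U8, CL85W8). Splitting on "|" first keeps the timestamp fields out of the way, so the word positions no longer differ between GAP and DUP lines.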

---------- Post updated at 11:13 AM ---------- Previous update was at 11:07 AM ----------

I am getting the below error

awk  '
>> match ($0, /(GAP|DUP)[^)]*\)\)/)        {TMP1 = substr ($0, RSTART, RLENGTH-2)
>>                                          n = split (TMP1, T)
>>                                          TMP2 = sprintf ("%s %s\t%s", T[n-2], T[n-1], substr (T[n], 2))
>>                                          if (!CNT[TMP2]++) print TMP2
>>                                         }
>> ' FILE
awk: syntax error near line 2
awk: bailing out near line 2

Thanks RudiC, that worked with nawk.

The two commands:

cat /tmp/OutFile | egrep "GAP" | awk '{print $10, $11}' | sort| uniq >> ${FILE}

cat /tmp/OutFile | egrep "DUP" | awk '{print $11, $12}' | sort| uniq >> ${FILE}

clearly can't produce the desired output since you are printing two whitespace-separated input fields when your output contains three whitespace-separated fields. Furthermore, you have two unneeded invocations of cat, egrep, and uniq; one unneeded invocation of awk; and one or two unneeded invocations of sort.

The following is an alternative to the code RudiC suggested that seems to do what you want with one invocation of awk and one invocation of sort:

awk '
/DUP/ { gsub(/[()]/, "", $13)
        out[$11 " " $12 "\t" $13]
}
/GAP/ { gsub(/[()]/, "", $12)
        out[$10 " " $11 "\t" $12]
}
END {   printf("1st column\t2nd column\n")
        for(line in out) 
                print line | "sort"
}' file

which (if file contains one or more copies of the two sample input lines you showed us in post #1) produces the output:

1st column	2nd column
CL/U18 9P-NC	CL90U8
CL/X18 8.5P-NC	CL85W8

which contains a space separating the two input fields that make up the "1st column" output and a tab separating that from the single output field that makes up the "2nd column" output.

If you don't care about the order of lines in the output (and were just using sort | uniq to get rid of duplicate lines of output instead of caring about the order of the output), delete the | "sort" from the print line | "sort" statement in the script and it will run a little bit faster.

If you are trying this on a Solaris/SunOS system, change awk in the script to /usr/xpg4/bin/awk or nawk.
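
If the same script has to run on both Solaris and other systems, the choice of awk can be made at run time. This is a minimal sketch, assuming a POSIX shell (on Solaris that means ksh or /usr/xpg4/bin/sh rather than the old /bin/sh); the function name pick_awk and the variable AWK are made up for illustration.

```shell
# Hypothetical helper: print the name of an awk that supports gsub(), match(), etc.
pick_awk() {
    if [ -x /usr/xpg4/bin/awk ]; then
        echo /usr/xpg4/bin/awk          # POSIX awk on Solaris
    elif command -v nawk >/dev/null 2>&1; then
        echo nawk                       # "new awk", also fine on Solaris
    else
        echo awk                        # elsewhere the default awk is usually adequate
    fi
}

AWK=$(pick_awk)

# gsub() is one of the functions the old Solaris /usr/bin/awk does not provide.
echo '(CL90U8))' | "$AWK" '{ gsub(/[()]/, ""); print }'
```

With a suitable awk the last command prints CL90U8. Either script in this thread can then be started with "$AWK" in place of the literal awk.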

It isn't clear to me whether you actually want those headings or not, but I included them since that is what you said you wanted. I assume that it is obvious that you can remove the printf statement from my script above if you don't want the heading line in your output.

Thanks, Don, for the detailed explanation.
I am getting the following error

awk: syntax error near line 2
awk: illegal statement near line 2
awk: syntax error near line 5
awk: illegal statement near line 5

---------- Post updated at 12:26 PM ---------- Previous update was at 12:22 PM ----------

Hi RudiC,
I see the command stopped producing the desired output.

File has

seq 751287| Thu May 03 15:21:57.526253 2018|Len: 77|GAP Ric IOVVA.Z Flag2 14 Sym IOBVA-I msg SEQ_NO:2628 db:2626 diff 2 process msg||
seq 751288| Thu May 03 15:21:57.542730 2018|Len: 77|GAP Ric IOGNS.B Flag2 17 Sym IOGNS-J msg SEQ_NO:1375 db:1373 diff 2 process msg||
seq 751289| Thu May 03 15:21:57.551653 2018|Len: 78|GAP Ric IMRGN.OQ Flag2 12 Sym IMGGN-O msg SEQ_NO:2845 db:2843 diff 2 process msg||

But when I execute the nawk command below, it returns nothing.

System:xxxx% nawk  'match ($0, /(GAP)[^)]*\)\)/) {TMP1 = substr ($0, RSTART, RLENGTH-2); n = split (TMP1, T); TMP2 = sprintf ("%s %s\t\t%s", T[n-2], T[n-1], substr (T[n], 2)); if (!CNT[TMP2]++) print TMP2};' /tmp/File
System:xxxx%

I tried removing the "))" from the pattern, but it again returns a blank line:

 nawk  'match ($0, /(GAP)*/) {TMP1 = substr ($0, RSTART, RLENGTH-2); n = split (TMP1, T); TMP2 = sprintf ("%s %s\t\t%s", T[n-2], T[n-1], substr (T[n], 2)); if (!CNT[TMP2]++) print TMP2};' /tmp/File
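
Both results follow from how match() treats these patterns. The sketch below just prints the return value, RSTART and RLENGTH for each pattern against one of the new lines; the variable names line, strict and loose are made up for illustration, and awk should be nawk on Solaris.

```shell
line='seq 751287| Thu May 03 15:21:57.526253 2018|Len: 77|GAP Ric IOVVA.Z Flag2 14 Sym IOBVA-I msg SEQ_NO:2628 db:2626 diff 2 process msg||'

# Pattern from the first attempt: it requires "))" after GAP, which this line lacks.
strict=$(printf '%s\n' "$line" |
    awk '{ r = match($0, /(GAP)[^)]*\)\)/); print r, RSTART, RLENGTH }')

# Pattern from the second attempt: (GAP)* can match zero characters at position 1.
loose=$(printf '%s\n' "$line" |
    awk '{ r = match($0, /(GAP)*/); print r, RSTART, RLENGTH }')

echo "strict: $strict"   # 0 0 -1  (no match, so the action never runs)
echo "loose:  $loose"    # 1 1 0   (empty match at the start of the line)
```

In the second case match() succeeds with a length of 0, so substr($0, RSTART, RLENGTH-2) is empty, split() returns n = 0, and T[n-2], T[n-1] and T[n] are all empty. The printed line therefore contains only the literal space and tabs from the sprintf format.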

See post#2.
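
Since the new lines carry no parentheses at all, a position-independent approach may be easier to maintain. The thread does not say which values are wanted from this format, so the following is only a sketch under the assumption that they are the words following the keywords Ric and Sym; the function name extract_ric_sym and the variable sample2 are made up for illustration, and awk should be nawk on Solaris.

```shell
# The three lines of the new format, copied from the post above.
sample2='seq 751287| Thu May 03 15:21:57.526253 2018|Len: 77|GAP Ric IOVVA.Z Flag2 14 Sym IOBVA-I msg SEQ_NO:2628 db:2626 diff 2 process msg||
seq 751288| Thu May 03 15:21:57.542730 2018|Len: 77|GAP Ric IOGNS.B Flag2 17 Sym IOGNS-J msg SEQ_NO:1375 db:1373 diff 2 process msg||
seq 751289| Thu May 03 15:21:57.551653 2018|Len: 78|GAP Ric IMRGN.OQ Flag2 12 Sym IMGGN-O msg SEQ_NO:2845 db:2843 diff 2 process msg||'

# Hypothetical helper: prints "Ric value<TAB>Sym value" for each GAP line on stdin.
extract_ric_sym() {
    awk -F'|' '
    $4 ~ /^GAP Ric / {
        n = split($4, w, " ")
        ric = sym = ""
        for (i = 1; i < n; i++) {          # look for a keyword, take the word after it
            if (w[i] == "Ric") ric = w[i + 1]
            if (w[i] == "Sym") sym = w[i + 1]
        }
        if (ric != "" && sym != "") print ric "\t" sym
    }'
}

printf '%s\n' "$sample2" | extract_ric_sym
```

For the three lines above this prints IOVVA.Z, IOGNS.B and IMRGN.OQ, each followed by a tab and the matching Sym value (IOBVA-I, IOGNS-J, IMGGN-O). Looking up values by keyword avoids counting fields, which is what broke when the message layout changed.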

What operating system are you using?

Exactly what command did you run that produced the diagnostics:

awk: syntax error near line 2
awk: illegal statement near line 2
awk: syntax error near line 5
awk: illegal statement near line 5

???


Thank you Don, I need to use nawk.