Script help

penfold · March 2, 2005, 8:17am

I have two files, gmrd.txt and core_extract.txt

I need awk to look at the first column of each file, and if a value in the first column of core_extract.txt also appears in the first column of gmrd.txt, it should append that line to a file

here is what i have so far...currently it replaces the line found with "TEST"

any help appreciated

awk '
BEGIN {
FS = OFS = ";"
while (getline < "core_extract.txt" > 0)
arr[$1] = 1
}
$1 in arr {
$0 = "TEST"
}
{print}
' gmrd.txt

i'm using the Korn shell (ksh) on an AIX system

cheers

vgersh99 · March 2, 2005, 8:49am

awk '
  BEGIN {                                        
     FS = OFS = ";"   
  }

  FNR==NR { arr[$1]=$0; next }
  1
  $1 in arr { print arr[$1] }
  ' gmrd.txt core_extract.txt

penfold · March 2, 2005, 8:54am

thanks for that, vg!

however, when I run that script the lines from core_extract.txt still show up in the output - is there any way of seeing just the relevant records from gmrd.txt?

again many thanks!

vgersh99 · March 2, 2005, 9:00am

from the original posting:

NOW I think you want this:

awk '
  BEGIN {                                        
     FS = OFS = ";"   
  }

  FNR==NR { arr[$1]=$0; next }
  $1 in arr { print arr[$1] }
  ' gmrd.txt core_extract.txt

penfold · March 2, 2005, 9:04am

vg, thanks for the assistance with this - it works great!

just to get to grips with this - i'm assuming the 1 is a command that appends output to the specified file?

vgersh99 · March 2, 2005, 9:12am

1 is awk's shorthand for "true". When a pattern evaluates to true and has no action attached, awk simply prints the current record. In your case it prints every record/line from core_extract.txt.

The more explicit/equivalent way to do this would be:

{print $0}
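
To make that concrete, here is a minimal sketch. The sample data is made up for the illustration and is not taken from your files. All three programs should print the same lines:

```shell
# Hypothetical sample input, invented for this illustration
sample='a;1
b;2'

# A bare true pattern, an action with no pattern, and an explicit print $0
out_one=$(printf '%s\n' "$sample" | awk '1')
out_print=$(printf '%s\n' "$sample" | awk '{ print }')
out_print0=$(printf '%s\n' "$sample" | awk '{ print $0 }')

printf '%s\n' "$out_one"
```

So 1 does not append anything to a file; it only selects every record for the default print action.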

penfold · March 2, 2005, 9:33am

vg, if i could tap your knowledge one more time...

how would I modify the above script to use the 4th field in gmrd.txt and the 1st field in core_extract.txt?

vgersh99 · March 2, 2005, 9:47am

awk '
  BEGIN {                                        
     FS = OFS = ";"   
  }

  FNR==NR { arr[$4]=$0; next }
  $1 in arr { print arr[$1] }
  ' gmrd.txt core_extract.txt

penfold · March 2, 2005, 10:12am

using the code above on the following two files:

core.txt:

5069211;00;123
NOASCOMGR;03;143
US36204YA782;00;156
XS0147500028;00;132

gmrd.txt:

BAY GR;03;1344;432
235649;03;2563;598
291802;00;2563;598
979217;00;2563;598
235649;03;2563;598
A0ABVB;00;2563;598
235649;03;2563;598

and the following code:

awk '
BEGIN {
FS = OFS = ";"
}

FNR==NR { arr[$2]=$0; next }
$2 in arr { print arr[$1] }
' gmrd.txt core.txt

this results in null output - even though the first file contains a '03'

is this right?

vgersh99 · March 2, 2005, 10:21am

 $2 in arr { print arr[$2] }
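
For what it's worth, a rough sketch of why the earlier version looked empty: the pattern tested $2, but the action looked up arr[$1], and nothing was ever stored under a first-field value from core.txt, so each print emits an empty string. The data below is a cut-down copy of the sample files, and the variable names (tmp, wrong, right) are only for illustration:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'BAY GR;03;1344;432' 'A0ABVB;00;2563;598' > "$tmp/gmrd.txt"
printf '%s\n' '5069211;00;123' 'NOASCOMGR;03;143' > "$tmp/core.txt"

# Wrong key in the action: arr[$1] was never set, so only empty lines come out
wrong=$(awk -F';' 'FNR==NR { arr[$2]=$0; next } $2 in arr { print arr[$1] }' "$tmp/gmrd.txt" "$tmp/core.txt")

# Same key in the pattern and in the action
right=$(awk -F';' 'FNR==NR { arr[$2]=$0; next } $2 in arr { print arr[$2] }' "$tmp/gmrd.txt" "$tmp/core.txt")

printf '%s\n' "$right"
rm -rf "$tmp"
```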

penfold · March 2, 2005, 10:23am

can i award vg a medal for being brainy!

thanks for all your help

penfold · March 2, 2005, 10:35am

after having tried this and based on the above two text files the output that is produced is:

A0ABVB;00;2563;598
235649;03;2563;598
A0ABVB;00;2563;598
A0ABVB;00;2563;598

what am i doing wrong? What it should produce is the records in gmrd.txt that have the same second field as in core.txt

vgersh99 · March 2, 2005, 10:54am

seems like the SECOND field in BOTH files is NOT UNIQUE.

what gets printed out is the LAST row from gmrd for a given SECOND field.

seems like having ONLY the second field is NOT enough to identify records UNIQUELY. You might consider either using a DIFFERENT field and/or using a combination of fields from both files to do your "lookup".
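
A sketch that reproduces this with the two sample files posted above (the temporary directory and the variable name last_rows are only for the illustration). Each assignment to arr[$2] replaces the earlier one, so only the last gmrd row per key is left by the time core.txt is read:

```shell
tmp=$(mktemp -d)
cat > "$tmp/gmrd.txt" <<'EOF'
BAY GR;03;1344;432
235649;03;2563;598
291802;00;2563;598
979217;00;2563;598
235649;03;2563;598
A0ABVB;00;2563;598
235649;03;2563;598
EOF
cat > "$tmp/core.txt" <<'EOF'
5069211;00;123
NOASCOMGR;03;143
US36204YA782;00;156
XS0147500028;00;132
EOF

# arr[$2] is overwritten on every gmrd row, so it ends up holding
# the last row seen for "03" and the last row seen for "00"
last_rows=$(awk -F';' 'FNR==NR { arr[$2]=$0; next } $2 in arr { print arr[$2] }' "$tmp/gmrd.txt" "$tmp/core.txt")

printf '%s\n' "$last_rows"
rm -rf "$tmp"
```

That gives one line per core.txt row, each being the last gmrd row stored for that row's second field, which is the output you posted.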

penfold · March 2, 2005, 11:04am

is there any way to overcome this? The only field common to both is the second field. Would I have to create a variable out of the second fields in core.txt and then check gmrd.txt against that?

vgersh99 · March 2, 2005, 11:17am

ok, how about if you give a desired output based on the sample input files you provided.

penfold · March 2, 2005, 11:26am

gmrd.txt:

;;BAY GR;03;1344;432
;;BAY GR;04;4321;221
;;235649;03;2563;598
;;235649;03;2563;345
;;291802;00;2563;598
;;979217;00;2563;598
;;235649;03;2563;598
;;A0ABVB;00;2563;598
;;235649;03;2563;598

core.txt:

;;BAY GR;00;123
;;NOASCOMGR;03;143
;;US36204YA782;00;156
;;A0ABVB;00;132

The script should compare field three of core.txt with field three of gmrd.txt and print only those gmrd.txt lines whose field three also appears in core.txt, so the output would be:

output.txt:
;;BAY GR;03;1344;432
;;BAY GR;04;4321;221
;;A0ABVB;00;2563;598

vgersh99 · March 2, 2005, 11:35am

awk '
  BEGIN {
  FS=OFS=";"
  }

  FNR==NR { arr[$3]= ($3 in arr) ? arr[$3] "\n" $0 : $0; next }
  $3 in arr { print arr[$3] }' gmrd.txt core.txt

penfold · March 2, 2005, 11:45am

this comes out with the following output:

./temp4.sh[8]: awk^JBEGIN {^JFS=OFS=";"^J}^J^JFNR==NR { arr[$3]= ($3 in arr) ? arr[$3] "\n" $0 : $0; next }^J$3 in arr { print arr[$3] }^Jgmrd.txt: not found

could this be to do with the flavour of UNIX I'm using? (AIX)

vgersh99 · March 2, 2005, 11:51am

I've only posted the relevant AWK portion - the invocation sequence stayed the same.

I've reposted the entire "thing" now.

Ygor · March 2, 2005, 8:19pm

awk -F\; 'FNR==NR{arr[$3]=1;next};$3 in arr' core.txt gmrd.txt

;;BAY GR;03;1344;432
;;BAY GR;04;4321;221
;;A0ABVB;00;2563;598
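
Since the original question asked for the matching lines to end up in a file, here is a minimal sketch of the same idea with redirection added. The name output.txt comes from the earlier post; appending with >> is an assumption about what was wanted, and the array name seen and the temporary directory are only for the illustration:

```shell
tmp=$(mktemp -d)
cat > "$tmp/gmrd.txt" <<'EOF'
;;BAY GR;03;1344;432
;;BAY GR;04;4321;221
;;235649;03;2563;598
;;235649;03;2563;345
;;291802;00;2563;598
;;979217;00;2563;598
;;235649;03;2563;598
;;A0ABVB;00;2563;598
;;235649;03;2563;598
EOF
cat > "$tmp/core.txt" <<'EOF'
;;BAY GR;00;123
;;NOASCOMGR;03;143
;;US36204YA782;00;156
;;A0ABVB;00;132
EOF

# Read core.txt first to collect the keys, then print the gmrd.txt lines
# whose third field is one of those keys; >> appends instead of overwriting
awk -F';' 'FNR==NR { seen[$3]=1; next } $3 in seen' "$tmp/core.txt" "$tmp/gmrd.txt" >> "$tmp/output.txt"

matched=$(cat "$tmp/output.txt")
printf '%s\n' "$matched"
rm -rf "$tmp"
```

Reading the key file first means nothing from gmrd.txt has to be stored, so duplicate keys in gmrd.txt are not a problem.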