awk script: print line number n of another file

kpg · February 10, 2010, 10:26am

Hi,

I wrote an awk script to analyse file A.
I call the script with files A and B. File A has lines like:

000000033100001
000000036100001
000000039100001

The first 9 characters are interpreted as a line number;
for each line number found I want to output this line number of file B.
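A minimal sketch of how one fileA record splits into the three fixed-width fields that the script below reads with substr (9-digit line number, 3-digit result code, 3-digit record id). The wrapper name decode_record is hypothetical and only there for illustration.

```shell
# Hypothetical helper: decode one fixed-width fileA record.
# Layout assumed from the substr calls in the script below:
#   chars 1-9 = line number, 10-12 = result code, 13-15 = record id
decode_record() {
    printf '%s\n' "$1" | awk '{
        line_nr  = substr($0, 1, 9) + 0   # awk string positions start at 1; +0 drops leading zeros
        res_code = substr($0, 10, 3)
        rec_id   = substr($0, 13, 3)
        print line_nr, res_code, rec_id
    }'
}

decode_record 000000033100001
```

For the first sample record this prints the line number 33, result code 100 and record id 001.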

After searching around, my idea was to use sed to output the line with that number, calling sed via system() inside awk. But the script below does not work: the call to sed fails. Maybe the line_nr variable just isn't passed correctly.
Can someone help me write the call to sed correctly, or solve the problem inside awk?

regards
kp

awk 'BEGIN {
    printf("line %5s result code %12s record_id\n","","")}
{
    line_nr=substr($1,0,9);
    res_code=substr($1,10,3);
    rec_id=substr($1,13,3);
    printf("%09i  ", line_nr);
    printf("%03i %-20s ", res_code, rescode[res_code+0]);
    printf("%03i %-40s\n", rec_id, recid[rec_id+0]); # +0 needed to convert string rec_id to number
    system("sed -n \"line_nr p\" $2");
}' $1

---------- Post updated at 10:26 AM ---------- Previous update was at 10:15 AM ----------

OK, shame on me for finding this too late. Variables are passed to the system call by first building the command string (sysstring) with sprintf; the second file I passed to awk via "-v sourcefile=$2":

 sysstring=sprintf("sed -n \"%i p\" %s", line_nr, sourcefile);
 system(sysstring);

So the modified awk script works. The question remains: is it possible without a system call?
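A complete sketch of the system()-based version described in this post, assuming the fileA layout from the first post. The function name print_lines_via_sed is hypothetical; substr starts at position 1 here rather than 0, and fflush() is added so that awk's own buffered printf output appears before the line sed prints.

```shell
# Hypothetical wrapper; usage: print_lines_via_sed fileA fileB
# One sed process is started per fileA record, and each one rereads
# fileB from the top, which is why this approach is slow on large files.
print_lines_via_sed() {
    awk -v sourcefile="$2" '{
        line_nr = substr($1, 1, 9) + 0           # first 9 characters as a number
        printf("%09d  ", line_nr)                 # zero-padded line number prefix
        fflush()                                  # flush awk output before sed writes
        sysstring = sprintf("sed -n \"%dp\" \"%s\"", line_nr, sourcefile)
        system(sysstring)                         # sed prints line line_nr of fileB
    }' "$1"
}
```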

ahmad.diab · February 10, 2010, 10:33am

It is simpler if you use an associative array, as below:

Note: on Solaris, use nawk instead of awk.

awk 'NR==FNR{a[ int( substr($0,1,9) )] ; next}(FNR in a)' fileA fileB

BR

kpg · February 16, 2010, 8:28am

Thank you! This is indeed very cool awk. I am still struggling to understand how it works, how to use it, and how to add additional code.

How is the performance of this code? My files have about 2.5 million lines and are about 2 GB in size.
Thank you

ahmad.diab · February 16, 2010, 8:40am

It will handle any number of lines without a problem; test it. I think awk's only limitation is on the number of columns, which depends on your system (I am not sure about this).
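On the performance question: with the associative-array approach, only the line numbers from fileA are held in memory (one array entry per fileA line), while fileB is read one line at a time, so a 2 GB fileB is never loaded whole. If fileB is much longer than the highest line number wanted, awk can also stop reading it early. The sketch below adds that early exit; it is an addition not taken from the thread, and print_matching_lines and maxline are hypothetical names.

```shell
# Hypothetical wrapper; usage: print_matching_lines fileA fileB
print_matching_lines() {
    awk '
        NR == FNR {                          # first file (fileA)
            n = substr($0, 1, 9) + 0
            want[n] = 1                      # remember the wanted line number
            if (n > maxline) maxline = n     # track the highest one
            next
        }
        FNR > maxline { exit }               # nothing more to find: stop reading fileB
        (FNR in want)                        # pattern with no action: print the line
    ' "$1" "$2"
}
```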

BR

kpg · February 19, 2010, 8:47am

ahmad.diab, I am sorry, but I do not understand how this code works. I need to modify it:

awk 'BEGIN { ...set some variables
}

{ # main
# set variables based on substrings of $1, e.g 
line_nr=int(substr($1,0,9));

# print something
# and then want awk to print the "matching" line in fileB
}'

I tried many ways of integrating this with your code, with no luck; I ended up with syntax errors.
My approach with system calls works, but yours performs a lot better.
Any help appreciated.

ahmad.diab · March 10, 2010, 6:56am


Maybe you mean the following.
First of all, substr positions start at 1, not 0; I think this was your mistake. After your modification, see below:

gawk 'BEGIN{} NR==FNR{line_num=int( substr($0,1,9) ) ; a[line_num] ; next}(FNR in a)' fileA fileB

You can use nawk or /usr/xpg4/bin/awk on Solaris, or gawk on other systems.

;);)

kpg · March 10, 2010, 8:24am

Thank you for your reply. I'll try to explain what I want to do:

fileA is:
000000011111000
000000329100001
000000330100001

The first nine characters (positions 1 to 9, as awk's substr counts them) indicate the line number to look for in fileB.
fileB is:
dataline1
dataline2
...
dataline32
dataline33
...

For each line in fileA, the output of my awk script should be:

a prettified line from fileA, e.g. print "error on line" line_num
the prettified matching line from fileB (dataline line_num)

I just do not understand how to modify your code to achieve this.

1 gawk '
2 BEGIN{} 
3 NR==FNR{
4   line_num=int( substr($0,1,9) ) ; 
5   a[line_num] ;
6   next}
7  (FNR in a)
8 ' fileA fileB

At line 2 I can insert my initialization code.
#3 builds the array?
#4 sets the line_num variable
#5 no idea what this does
#6 a search said "next means nothing else is done with this line from the first file"
#7 no idea
Maybe you can answer my questions, please?
Is it possible to have awk print mixed output: lines from fileA and the matching lines from fileB?

Thank you very much!

ahmad.diab · March 10, 2010, 8:33am

What is your desired output? What do you want the code to show on screen?

For example: error on line# 12 dataline12.

kpg · March 10, 2010, 10:54am

Exactly as you wrote.

Below is a sample output of my old awk script (which calls sed via system() and is slow).

The output consists of 2*n lines.

Odd lines come from fileA. Each odd output line has three columns, calculated from the contents of a fileA line (e.g. substrings, array lookups); the first column is line_num, computed as int( substr($0,1,9) ).
Even lines are indented with " -> " and show line number line_num of fileB.

errors on lines
line       result code              record_id
000000011  111 DEC Format Error     000 Create account
 -> 00012345678D42104310|L 5912345678|L|DEFAULT 12345678D42104310 ACT 20100306031545 00000000000000000000000010020100306031545 +0000000000000000020100306031545978
000000329  100 DEC Internal Error   001 updateAccount
 -> 00012345678D5406073820|L 5912345678|L|DEFAULT 12345678D5406073820 ACT 20100306031555 00000000000000000000010000020100306031555 +0000000000000000020100306031555978

Thank you!

ahmad.diab · March 10, 2010, 11:10am

Test the following:

gawk 'BEGIN{} NR==FNR{line_num=int( substr($0,1,9) ) ; a[line_num] ; next}(FNR in a){printf "Error on line number %09d  %s\n",FNR,$0}' fileA fileB
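A sketch closer to the two-line layout kpg described: instead of creating an empty array element, the prettified fileA text is stored in the array, keyed by line number, and printed just before the matching fileB line. The description lookups (rescode[], recid[]) from the original script are never defined in the thread, so this sketch prints only the raw codes; report and info are hypothetical names.

```shell
# Hypothetical wrapper; usage: report fileA fileB
report() {
    awk '
    BEGIN {
        print "errors on lines"               # header, printed once before any input
    }
    NR == FNR {                               # fileA: build the prettified line
        line_num = substr($0, 1, 9) + 0
        # raw result code and record id only; description lookups omitted
        info[line_num] = sprintf("%09d  %s  %s", line_num, substr($0, 10, 3), substr($0, 13, 3))
        next
    }
    (FNR in info) {                           # fileB: matching line found
        print info[FNR]                       # odd line: from fileA
        print " -> " $0                       # even line: from fileB, indented
    }
    ' "$1" "$2"
}
```

Because the pairs are printed while fileB is being read, they come out in ascending line-number order rather than in fileA order, and if fileA lists the same line number twice, only the last entry is kept.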

;);)

kpg · March 12, 2010, 9:00am

Hi,

Thank you! You helped me so much. :rolleyes: This code works as intended.

But when I tested it and added some line breaks here and there to improve readability, the script didn't stop when fileA was read.

Question:
Why do line breaks make a difference?
Where exactly is the content of fileB printed, i.e. where is the print statement? And can it be made explicit and modified?

ahmad.diab · March 12, 2010, 10:22am

Line breaks do not make a difference when the code is written in the correct format.
What did you change in the code? Post it and I will see where your problem is.

kpg · April 22, 2010, 10:06am

Hi ahmad.diab,

As you wrote, "line breaks do not make a difference when the code is written in the correct format". Not wanting to waste too much time, I escaped _all_ the line breaks I had added for readability in "my" awk script with a backslash. This works.

Thank you very much for your help, and sorry for answering so late. You helped me a lot.

ahmad.diab · April 22, 2010, 12:35pm

You are welcome. :D:D:D