Hi,
I have two files, each with a different record separator: one uses a pipe (|) and the other a semicolon (;).
How do you deal with this in awk?
Any help appreciated
Specifically, I need to change RS to ; when the following statement operates on the second file (assets.dat):
awk -F\| 'FNR==NR{arr[$3]=1;next};$30 in arr' output.txt assets.dat
Thanks for your response. Sorry, yes, I meant the record separator. From reading the man pages I'm aware you can change it within an awk statement; it's just that I can't find examples of it. If it is possible, could someone show me how I'd do it in relation to the above awk statement?
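For what it's worth, awk also accepts variable assignments between the file names on its command line, and each one is applied just before awk opens the next file. A minimal sketch, assuming it's really the field separator (FS) that differs between the two files: it uses the sample records posted further down the thread (the assets.dat one cut to six fields) and keys on field 3, because that's where 7128778 sits in both samples. The temp directory is just scaffolding.

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory for the sample files

# Sample records from the thread (the assets.dat one cut to six fields).
printf '%s\n' 'COEC2372323|EC2372323|7128778| |BE0117013319|381666|180617' > output.txt
printf '%s\n' 'BANKS;;7128778;;02;861542' > assets.dat

# -F sets FS to "|" for output.txt; the FS=';' operand is applied after
# output.txt is exhausted and before assets.dat is opened.
awk -F'|' 'FNR == NR { arr[$3] = 1; next }   # first file: remember field 3
           $3 in arr                         # second file: print matching records
          ' output.txt FS=';' assets.dat
```

The same trick works for RS, but with these two samples it's the field separator that changes.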
awk -F\| 'FNR==NR{arr[$3]=1;RS=";";next};$30 in arr' output.txt assets.dat
But this still never gets executed for the second file.
Is that because of the FNR==NR test?
What I need to do is compare two files: the third field of the first file (field separator "|") and the 2nd field of the second file (field separator ";").
Thanks
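Right. FNR is the record number in the current file; NR is the cumulative record number across all files. So FNR==NR is only true while awk is reading the first file, which is why that block never runs for assets.dat.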
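A quick way to see the difference is to print both counters over two small files. This is just an illustration with two throwaway files, first.txt and second.txt, made up for the purpose:

```shell
cd "$(mktemp -d)" || exit 1
printf 'a\nb\n' > first.txt    # two records
printf 'c\n' > second.txt      # one record

# NR keeps counting across files; FNR restarts at 1 for each new file.
awk '{ print FILENAME, NR, FNR, (FNR == NR ? "first" : "later") }' first.txt second.txt
# first.txt 1 1 first
# first.txt 2 2 first
# second.txt 3 1 later
```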
Are these one-line files? If so, this is acquiring that HW smell... Also, if these are one-line files, awk is not my weapon of choice...
This is the format of output.txt:
COEC2372323|EC2372323|7128778| |BE0117013319|381666|180617
And this is the format of assets.dat:
BANKS;;7128778;;02;861542;03;B01ZJL7;;;06;EQ0010004100001000;11;IE0000197834;;;;;;;;;;;;;;;;;;;09;901773;;;;;;;;;;;;;Y;;;;EUR;EUR;;;;;;;;;;;;;;;EUR;;;;;;;;;;;;;;;;;;;;;;;S;;;;;;;;;;;;;;;;;;;;;;;;;B01ZJL7;;Y;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Both these files are hundreds of records long - possibly thousands.
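When field positions are in doubt, it helps to let awk number them. A small sketch using the sample records above (the assets.dat one shortened to eight fields): under each file's own separator it shows 7128778 landing in field 3, while field 2 of assets.dat is empty.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'COEC2372323|EC2372323|7128778| |BE0117013319|381666|180617' > output.txt
printf '%s\n' 'BANKS;;7128778;;02;861542;03;B01ZJL7' > assets.dat   # shortened

# Number the first three fields of each sample record under its own separator.
awk -F'|' '{ for (i = 1; i <= 3; i++) print FILENAME, i "=" $i }' output.txt
awk -F';' '{ for (i = 1; i <= 3; i++) print FILENAME, i "=" $i }' assets.dat
# output.txt 1=COEC2372323
# output.txt 2=EC2372323
# output.txt 3=7128778
# assets.dat 1=BANKS
# assets.dat 2=
# assets.dat 3=7128778
```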
The fields that need comparing are the ones holding 7128778 in both sample lines. If the third field of output.txt doesn't match any occurrence of field three in assets.dat, then that record needs to be removed from output.txt.
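Since the goal is to trim output.txt, one option is to turn the FNR==NR idiom around: read assets.dat first with FS=";" and remember its field-3 values, then switch to FS="|" for output.txt and print only the records whose field 3 was seen. A sketch, assuming the key is field 3 in both files (as in the samples) and adding a made-up second output.txt record (key 9999999) so that something gets dropped. It writes to a temporary output.new and only replaces output.txt if awk succeeds.

```shell
cd "$(mktemp -d)" || exit 1
# The first record is from the thread; the second is invented, with a field 3
# (9999999) that does not occur in assets.dat.
printf '%s\n' \
  'COEC2372323|EC2372323|7128778| |BE0117013319|381666|180617' \
  'XXEC0000000|EC0000000|9999999| |BE0000000000|000000|000000' > output.txt
printf '%s\n' 'BANKS;;7128778;;02;861542' > assets.dat

# Pass 1, assets.dat with FS=";": remember every field-3 value.
# Pass 2, output.txt with FS="|": print records whose field 3 was seen.
awk 'FNR == NR { keep[$3] = 1; next }
     $3 in keep' FS=';' assets.dat FS='|' output.txt > output.new &&
  mv output.new output.txt

cat output.txt   # only the COEC2372323 record is left
```

One caveat with FNR==NR: if assets.dat is ever empty, the first rule swallows output.txt instead, and the mv would leave you with an empty output.txt, so it's worth checking that assets.dat is non-empty first.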
Yikes! Okay, here's one approach. I'm assuming you're showing us one line (record) from either file.
In your BEGIN section, gather all the field 3 values from the first file into an array (arr[$3]=1), since from your description you aren't interested in any other fields. At the end of your BEGIN, change FS=";". Then the second file is the one processed using all the other patterns. The pattern I think you want is "arr[$2]". If all you want to do is send all those lines to stdout, you don't even need an action, since "print the matching line" is the default action for any pattern. So the outline goes something like this (not tested):
awk -F\| 'BEGIN { while ((getline < "output.txt") > 0) arr[$3] = 1; FS = ";" } arr[$2]' assets.dat
As one-liners go, that's a little longer than I like, so I'd put it in a script file with proper indenting. As part of a sh/ksh/zsh/bash script, you can indent away between the apostrophes without creating a separate file.
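Here is roughly what the indented version might look like inside a sh script. It's a sketch: match_assets is a hypothetical function name, and the sample files are a cut-down copy of the thread's data plus one invented non-matching record. It keys on $3 because that's where 7128778 sits in the sample assets.dat line; the reply used $2, following the question's wording. The loop compares getline's result with 0 because getline returns -1 on error, and a bare while(getline ...) would spin forever if output.txt couldn't be opened.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'COEC2372323|EC2372323|7128778| |BE0117013319|381666|180617' > output.txt
printf '%s\n' 'BANKS;;7128778;;02;861542' 'OTHER;;1234567;;02;000000' > assets.dat

# Hypothetical wrapper name; the awk program is the one-liner from the reply, indented.
match_assets() {
    awk -F'|' '
        BEGIN {
            # getline returns 1 per record, 0 at end of file and -1 on error,
            # so compare with 0 to avoid looping forever on a missing file.
            while ((getline < "output.txt") > 0)
                arr[$3] = 1          # FS is still "|" here, from -F
            close("output.txt")
            FS = ";"                 # applies to assets.dat, which is read next
        }
        $3 in arr                    # field 3 in the sample data; the reply used $2
    ' assets.dat
}

match_assets   # prints the BANKS record only
```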
Re: awk versions
Some awks are field-count limited and will do undesirable things on lines with more than 99 fields. Also, some awks don't handle arrays of thousands of things very well. Perl or Ruby (or Python? or ???) hashes might give you a performance boost. (Of course, awk arrays are really hashes anyhow, so your seven digit indexes aren't creating arrays of millions of null entries.)
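To find out how wide your records actually are, and whether a field-count limit could bite, you can ask awk for the largest NF. A sketch, assuming a generated stand-in for assets.dat (one record of 150 empty ";"-separated fields) rather than the real file; on an awk with a low field limit, this command may itself complain, which answers the question too.

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical stand-in: 149 semicolons, i.e. one record with 150 empty fields.
awk 'BEGIN { s = ""; for (i = 1; i < 150; i++) s = s ";"; print s }' > assets.dat

# Report the widest record (0 for an empty file); above 99 may trip an old awk.
awk -F';' 'NF > max { max = NF } END { print max + 0 }' assets.dat   # prints 150
```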