Looking into file!

rahul303 · October 3, 2007, 1:31pm

Hi All,

I have a lookup (or mapping) file with two columns of data, for example:

LookupFile.txt:
col1 col2
6589 7879
8787 0909
4343 4576

The file to be processed has the following format:

to_be_processed_file.txt:
col1 col2 col3
676787989800996589 65656576687878
788757657675768787 88787878756446
323247656876984343 42341242542345

In row 1 the first column ends in 6589, so I need to append the corresponding value from the mapping file, i.e. 7879. Similarly, 8787 should get 0909 appended.
The new value should fill the space in col2.
After processing, the file should look like this:

col1 col2 col3
6767879898009965897879 65656576687878
7887576576757687870909 88787878756446
3232476568769843434576 42341242542345

Kindly help!

vgersh99 · October 3, 2007, 1:47pm

Are all the values in column 1 of 'LookupFile.txt' the same length (4 characters), or can they be of different lengths?

vgersh99 · October 3, 2007, 2:02pm

Assuming a 4-character lookup key in 'LookupFile.txt':
nawk -f rahul.awk LookupFile.txt to_be_processed_file.txt

rahul.awk:

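FNR==NR {f1[$1]=$2; next}           # first file: build the key -> value map
{
  s=substr($1, length($1)-3)        # last 4 characters of field 1
  if ( s in f1)
     $1 = $1 f1[s]                  # append the mapped value
}
1                                   # print every line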
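For anyone wanting to try this end to end, here is a minimal sketch that recreates the two sample files in a temporary directory and runs the same FNR==NR program inline. It assumes the files carry no `col1 col2` header lines (a header would itself be loaded as a lookup entry) and that a POSIX `awk` is an acceptable stand-in for `nawk`.

```shell
#!/bin/sh
# Sketch: suffix field 1 with the value mapped to its last 4 characters.
# Assumes header-less files; the sample data is from the original post.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/LookupFile.txt" <<'EOF'
6589 7879
8787 0909
4343 4576
EOF

cat > "$tmp/to_be_processed_file.txt" <<'EOF'
676787989800996589 65656576687878
788757657675768787 88787878756446
323247656876984343 42341242542345
EOF

result=$(awk '
  FNR == NR { f1[$1] = $2; next }       # first file: key -> value
  {
    s = substr($1, length($1) - 3)      # last 4 characters of field 1
    if (s in f1)
      $1 = $1 f1[s]                     # append the mapped value
  }
  1                                     # print every record
' "$tmp/LookupFile.txt" "$tmp/to_be_processed_file.txt")

printf '%s\n' "$result"
```

The three output lines should match the desired output shown in the question.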

rahul303 · October 3, 2007, 3:40pm

The lookup key is 10 characters long and does not vary. Will this work then?

vgersh99 · October 3, 2007, 3:43pm

FNR==NR {f1[$1]=$2; next}
{
  s=substr($1, length($1)-9)        # last 10 characters of field 1
  if ( s in f1)
     $1 = $1 f1[s]
}
1
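If the key length may change again, one option is to pass it in with awk's standard `-v` assignment rather than editing the offset. A sketch, where `keylen` is a hypothetical variable name and the input lines are made-up examples built from lookup values that appear later in this thread:

```shell
#!/bin/sh
# Sketch: same suffix lookup, but the key length comes from -v keylen=N.
# keylen is a hypothetical name; the data below is illustrative only.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '4657628719 1231231234' '4456654789 6788769890' > "$tmp/lookup"
printf '%s\n' 'AB4657628719 x' 'CD4456654789 y' 'EF0000000000 z' > "$tmp/input"

result=$(awk -v keylen=10 '
  FNR == NR { f1[$1] = $2; next }
  {
    s = substr($1, length($1) - keylen + 1)   # last keylen characters
    if (s in f1)
      $1 = $1 f1[s]
  }
  1
' "$tmp/lookup" "$tmp/input")

printf '%s\n' "$result"
```

With `keylen=4` the offset works out to `length($1)-3`, the same as the first version.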

rahul303 · October 3, 2007, 4:15pm

I tried getting characters 28 to 37 from the column with:
s= substr($1,28,10)
but it didn't work.
I need to replace characters 28 to 37 in the first column with the value from the mapping file.
Please help!
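For reference: awk's `substr(s, m, n)` returns at most `n` characters of `s` starting at 1-based position `m`, and an empty string when `m` is past the end of `s`. In the original sample the first field is only 18 characters long, so positions 28 to 37 don't exist there; that may be why the call seemed not to work, though the real data could differ. A small demonstration:

```shell
#!/bin/sh
# Sketch: how substr() positions map onto a field (1-based start position).
result=$(echo '676787989800996589 65656576687878' | awk '{
  print length($1)                  # number of characters in field 1
  print "[" substr($1, 15, 4) "]"   # characters 15..18
  print "[" substr($1, 28, 10) "]"  # starts past the end: empty string
}')
printf '%s\n' "$result"
```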

vgersh99 · October 3, 2007, 4:35pm

This is a different requirement, and there is no sample input or desired output. Replacing is not the same as 'suffixing'. Please provide both.

drl · October 3, 2007, 5:12pm

Hi.

This script creates another script. The new script uses sed, so it is quite fast, and the entire process is data-driven by the lookup file:

#!/usr/bin/env sh

# @(#) s1       Demonstrate creation of sed script with awk.

set -o nounset
echo

## Use local command version for the commands in this demonstration.

echo "(Versions used in this script displayed with local utility "version")"
version bash awk sed

echo

echo " Input file of lookup tokens:"
cat -tv data1

./a1 data1 > script
chmod +x script

echo
echo " This is the created sed script:"
cat -n script

echo
echo " Results of running script:"
./script data2

exit 0

You'll notice that s1 calls an "a1" script. It's written in awk, because awk makes handling text fields easy. However, the quoting is a bit complicated:

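#!/usr/bin/awk -f

# @(#) a1       Demonstrate creation of sed script from lookup file.

# Print "sed \", then one  -e 's/KEY /KEYVALUE /' \  line per lookup entry,
# and finally the (hard-coded) input file name.
BEGIN { print "sed \\" }
        { print "-e 's/" $1 " /" $1$2 " /' \\" }
END     { print " data2" }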

When s1 runs, it then produces:

% ./s1

(Versions used in this script displayed with local utility version)
GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
GNU Awk 3.1.4
GNU sed version 4.1.2

 Input file of lookup tokens:
6589 7879
8787 0909
4343 4576

 This is the created sed script:
     1  sed \
     2  -e 's/6589 /65897879 /' \
     3  -e 's/8787 /87870909 /' \
     4  -e 's/4343 /43434576 /' \
     5   data2

 Results of running script:
6767879898009965897879 65656576687878
7887576576757687870909 88787878756446
3232476568769843434576 42341242542345
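If the shell quoting in a1 is the sticking point, a variation (not drl's method) is to have awk write a plain sed command file and run it with `sed -f`, so no shell script is generated at all. A sketch, where `lookup.sed` is a hypothetical file name. Like the original, each `s///` is unanchored, so a key followed by a space anywhere in a line would match, and keys containing regex metacharacters would need escaping.

```shell
#!/bin/sh
# Sketch: build a sed command file from the lookup file, then apply it with sed -f.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '6589 7879' '8787 0909' '4343 4576' > "$tmp/data1"
printf '%s\n' '676787989800996589 65656576687878' \
              '788757657675768787 88787878756446' \
              '323247656876984343 42341242542345' > "$tmp/data2"

# One substitution per lookup line, e.g.  s/6589 /65897879 /
awk '{ print "s/" $1 " /" $1 $2 " /" }' "$tmp/data1" > "$tmp/lookup.sed"

result=$(sed -f "$tmp/lookup.sed" "$tmp/data2")
printf '%s\n' "$result"
```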

Best wishes ... cheers, drl

tomas · October 3, 2007, 6:15pm

I haven't seen anything like this before. Very cool. I will need to play with this.

summer_cherry · October 4, 2007, 1:08am

Hi,

input:

a:
6589 7879
8787 0909
4343 4576

b:
67678798980099 6589 65656576687878
78875765767576 8787 88787878756446
32324765687698 4343 42341242542345

output:

67678798980099 6589 7879 65656576687878
78875765767576 8787 0909 88787878756446
32324765687698 4343 4576 42341242542345

code:

awk '{
  if (NF == 2)                  # lookup file a: two fields per line
    a[$1] = $0                  # store the whole "key value" line
  else                          # data file b
    print $1 " " a[$2] " " $3
}' a b

vgersh99 · October 4, 2007, 1:39am

summer_cherry,
this is not the expected input format for file 'b'; reread the OP.

Plus, there's a better way to distinguish between two input files than relying on a different number of fields (which only happens to work in this particular example): take advantage of the NR and FNR values when processing two input files. There are plenty of examples in previous awk-related threads.
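For comparison, here is summer_cherry's join rewritten with the FNR==NR test vgersh99 mentions. While awk reads the first file, the per-file record number FNR equals the overall count NR, so the table is loaded there and every later line is treated as data whatever its field count. One known caveat: if the first file is empty, FNR==NR is also true for the whole second file. A sketch using the files a and b from summer_cherry's post:

```shell
#!/bin/sh
# Sketch: look up field 2 of file b in the table from file a, using FNR == NR.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '6589 7879' '8787 0909' '4343 4576' > "$tmp/a"
printf '%s\n' '67678798980099 6589 65656576687878' \
              '78875765767576 8787 88787878756446' \
              '32324765687698 4343 42341242542345' > "$tmp/b"

result=$(awk '
  FNR == NR { map[$1] = $2; next }   # first file (a): key -> value
  { print $1, $2, map[$2], $3 }      # second file (b): insert the mapped value
' "$tmp/a" "$tmp/b")
printf '%s\n' "$result"
```

The output should be the same as summer_cherry's.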

rahul303 · October 4, 2007, 8:30am

This time file_to_be_processed has values like this:

1US146576287192498004994 0 0 0
1US144566547890498004994 0 0 0
1US123443212330498004994 0 0 0

lookup_file:
4657628719 1231231234
4456654789 6788769890
2344321233 2345678900

Now I need to replace the 10-character value at positions 5 to 14 of each line in file_to_be_processed (e.g. 4657628719 in the first line) with the corresponding value from lookup_file.

Please help.

vgersh99 · October 4, 2007, 9:04am

BEGIN {
   _start=5       # position of the key's first character in field 1
   _length=10     # key length
}
FNR==NR {f1[$1]=$2; next}
{
  s=substr($1, _start, _length)
  if ( s in f1)
     $1 = substr($1, 1, _start-1) f1[s] substr($1,_start+_length)
}
1
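A minimal sketch of running that program on the three sample lines above, with the files recreated in a temporary directory (assuming neither file has a header line):

```shell
#!/bin/sh
# Sketch: replace the 10 characters at positions 5..14 of field 1
# with the value mapped to them in the lookup file.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '4657628719 1231231234' '4456654789 6788769890' \
              '2344321233 2345678900' > "$tmp/lookup_file"
printf '%s\n' '1US146576287192498004994 0 0 0' \
              '1US144566547890498004994 0 0 0' \
              '1US123443212330498004994 0 0 0' > "$tmp/file_to_be_processed"

result=$(awk '
  BEGIN { _start = 5; _length = 10 }
  FNR == NR { f1[$1] = $2; next }
  {
    s = substr($1, _start, _length)
    if (s in f1)
      $1 = substr($1, 1, _start - 1) f1[s] substr($1, _start + _length)
  }
  1
' "$tmp/lookup_file" "$tmp/file_to_be_processed")
printf '%s\n' "$result"
```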

rahul303 · October 4, 2007, 9:36am

It's working perfectly!
Tons and tons of thanks...
But one small issue came up.

I am losing the original blank spaces in the final output.
So if the values in the main file look like this:
1US6786868976897[space][space][space]0[space][space]0
1US6786868976897[space][space][space]0[space][space]0
1US6786868976897[space][space][space]0[space][space]0
My output after processing comes out like this:
1US6786868976897[space]0[space]0
1US6786868976897[space]0[space]0
1US6786868976897[space]0[space]0
So multiple spaces are being collapsed into one.
How can I solve this?

Also, I need to save the output in a separate file. Kindly suggest!
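On the spacing question: assigning to any field (here `$1`) makes awk rebuild the whole record by joining the fields with OFS, which defaults to a single space. One way around it, assuming every line starts directly with the first field (no leading blanks), is to apply the same substr() logic to `$0`: positions 5 to 14 of the line are then the same as positions 5 to 14 of `$1`, and because no field is assigned, the record keeps its original spacing. Saving the result is plain shell redirection; `output.txt` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: same fixed-position replacement, but done on $0 so that runs of
# spaces survive; the result is redirected into a separate file.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '4657628719 1231231234' > "$tmp/lookup_file"
printf '%s\n' '1US146576287192498004994   0  0' > "$tmp/file_to_be_processed"

# Field assignment rebuilds $0 with OFS, collapsing the runs of spaces:
echo 'a   b  c' | awk '{ $1 = $1; print }'     # prints: a b c

awk '
  BEGIN { _start = 5; _length = 10 }
  FNR == NR { f1[$1] = $2; next }
  {
    s = substr($0, _start, _length)        # assumes no leading whitespace
    if (s in f1)
      $0 = substr($0, 1, _start - 1) f1[s] substr($0, _start + _length)
  }
  1
' "$tmp/lookup_file" "$tmp/file_to_be_processed" > "$tmp/output.txt"

result=$(cat "$tmp/output.txt")
printf '%s\n' "$result"
```

On the real files that would be something like `awk -f fix.awk lookup_file file_to_be_processed > output.txt`, with fix.awk (hypothetical name) holding the program above.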