Manipulating two files

rinku11 · December 5, 2006, 8:55am

Hi Friends,
I'd like to describe my problem with an example.
I have two files as below:

file1.txt
---------
abcd.....1234......XY
abcd.....1235......XX
abcd.................
abcd...231236..1111YX
abcd...241236..1112YY
abcd...241237......YY
abce.....1235......YY

file2.txt
---------
1235,9998
1237,9999

I need the final file to be like below:

final_f.txt
---------
abcd.....1234......XY
abcd.....1235..9998XX
abcd.................
abcd...231236..1111YX
abcd...241236..1112YY
abcd...241237..9999YY
abce.....1235..9998YY

NOTE: a dot (.) stands for one space.
To be precise, file1.txt is a fixed-length file with no field separator; each record is 21 characters long. file2.txt has just two comma-separated fields. I need to go through file1.txt and, whenever characters 10-13 of a record hold a value, look that value up in the first field of file2.txt. If it is found, characters 16-19 of the record should be replaced with the second field from file2.txt. Note that the final file must keep the same record length (21) as the first file.

I hope this can be done with AWK, but as I am new to it, any help is appreciated. I need this urgently.
Thanks in advance,

Regards,
rin**

Perderabo · December 9, 2006, 7:47am

This ksh script seems to work with your posted data:

#! /usr/bin/ksh

exec < file1.txt
integer saved limit
saved=0
limit=1000

#
#  read a line and break it into fields
while IFS="" read line ; do
        tmp=${line#?????????}
        field1=${line%"$tmp"}
        line="$tmp"
        tmp=${line#????}
        field2=${line%"$tmp"}
        line="$tmp"
        tmp=${line#??}
        field3=${line%"$tmp"}
        line="$tmp"
        tmp=${line#????}
        field4=${line%"$tmp"}
        field5="$tmp"

#
#  If field2 is numeric we will use it to search for a new field4

        if [[ $field2 == +([0-9]) ]] ; then

#
#  See if we previously saved the data for this field2

                eval data=\${XX${field2}:-NOT_THERE}
                if [[ $data != NOT_THERE ]] ; then
                        field4="$data"
                else

#
#  See if we can find field2 in the second file

                        if data=$(grep "^${field2}," file2.txt) ; then
                                data=${data##*,}
                                echo "found data = $data" >&2
                                field4="$data"

#
#  Save the first $limit records we find in memory to avoid re-examining the file each time

                                if ((saved<limit)) ; then
                                        ((saved=saved+1))
                                        eval XX${field2}=\${data}
                                fi
                        fi
                fi
        fi
        echo "${field1}${field2}${field3}${field4}${field5}"
done
exit 0

I don't believe that I have ever used a statement like:
if data=$(grep "^${field2}," file2.txt) ; then
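before. It is a cool technique: the assignment returns grep's exit status, so the if tests whether a match was found and captures the matching line at the same time.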

ghostdog74 · December 9, 2006, 10:30am

Python alternative, if you have Python installed:

#!/usr/bin/env python3
f2data = {}  # lookup table: key -> replacement value
for f2 in open("file2.txt"):
    if not f2.strip():
        continue  # skip blank lines
    fone, ftwo = f2.strip().split(",")  # e.g. "1235", "9998"
    f2data[fone] = ftwo
for f1 in open("file1.txt"):
    f1 = f1.rstrip("\n")  # drop only the newline; trailing spaces are data
    key = f1[9:13]  # characters 10-13
    if key in f2data:
        # keep characters 1-15, replace 16-19, keep 20-21
        print(f1[0:15] + f2data[key] + f1[19:])
    else:
        print(f1)

output:

#/home/test> python3 test.py
abcd.....1234......XY
abcd.....1235..9998XX
abcd.................
abcd...231236..1111YX
abcd...241236..1112YY
abcd...241237..9999YY
abce.....1235..9998YY
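
If the values in file2.txt might not always be exactly four characters wide, the per-record step can be pulled into a small helper that keeps every record at 21 characters. This is only a sketch under that assumption: the names patch_record, KEY and VALUE are made up for illustration, and right-justifying and truncating the value is one possible policy, not something the original post asks for.

```python
KEY = slice(9, 13)     # characters 10-13 (1-based) hold the lookup key
VALUE = slice(15, 19)  # characters 16-19 (1-based) receive the new value


def patch_record(line, lookup):
    """Return line with characters 16-19 replaced when its key is in lookup."""
    key = line[KEY]
    if key not in lookup:
        return line
    # Assumption: right-justify and truncate to 4 characters so the
    # record length never changes.
    value = lookup[key].rjust(4)[:4]
    return line[:VALUE.start] + value + line[VALUE.stop:]


lookup = {"1235": "9998", "1237": "9999"}
record = "abcd     1235      XX"
print(patch_record(record, lookup))
```

Records whose key is missing from the lookup table come back unchanged, so the helper can be applied to every line of file1.txt in a loop.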

Ygor · December 10, 2006, 5:28am

Or...

awk 'BEGIN{
             FS = ","
             # load file2.txt into a lookup table: key -> replacement value
             while ((getline < "file2.txt") > 0)
                     arr[$1] = $2
     }
     {
             key = substr($0, 10, 4)
             if (key in arr)
                     print substr($0, 1, 15) arr[key] substr($0, 20, 2)
             else
                     print $0
     }
     ' file1.txt
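
For anyone comparing the awk and Python versions: awk's substr() counts from 1 and takes a length, while Python slices count from 0 and take an end index. A rough sketch of the mapping follows; awk_substr is a hypothetical helper written for this post, and it only mimics awk for start values of 1 or more.

```python
def awk_substr(s, start, length):
    """Mimic awk's substr(s, start, length) for start >= 1 (1-based start)."""
    return s[start - 1:start - 1 + length]


record = "abcd   241237      YY"
key = awk_substr(record, 10, 4)    # same as record[9:13]
head = awk_substr(record, 1, 15)   # same as record[0:15]
tail = awk_substr(record, 20, 2)   # same as record[19:21]
print(repr(key), repr(head + "9999" + tail))
```

So substr($0, 10, 4) picks characters 10-13, and the two pieces kept around the replacement are characters 1-15 and 20-21, which together with a four-character value add up to 21 again.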

marlonus999 · December 11, 2006, 2:47am

You're awesome guys...