Improve script and get new output file

Gents,

Using the following script, I got the changes I wanted in the output file (spread_2611.x01.new), taking the complete file (spread_2611.x01) as input.

Can you please have a look at my script and improve it? :b:

Also, I would like to get an additional output selecting only the records where the changes were done, please.

Here is my script, and attached is the data to run it.

#!/bin/bash
             read -p "First : " fsw
             read -p "Last  : " lsw

file="datatochange.txt"

touch $file

        for swath in $(seq $fsw $lsw)
              do 
awk '{\
	ori_line=substr($5,1,5);\
	ori_point=substr($5,6,5);\
	off_line=substr($1,2,5);\
	off_point=substr($1,7,5);\
	printf ("spread_'${swath}'.x01 %5d.00   %5d.00 %5d.00   %5d.00\n",ori_line,ori_point,off_line,off_point)}' $swath"offb1-Sx.sps" >> $file

done

awk '{F=$1;a[F];e=(F~/x01$/)?"  ":z;s=$2"  "$3;p=$2"  "$3;r=$4"  "$5;
print "s/" s "/" r "/" >> F".sed"
}END{
for(i in a){print "sed -f "i".sed "i " >>"i".new">>"changeindex"}
}' datatochange.txt

sh changeindex

awk '{print $1}' datatochange.txt | uniq > datatochange1.txt 

awk '{a[$1]++ ; print $1}' datatochange1.txt | while read i
mv "$i" "$i.old" && mv "$i.new" "$i"
rm -f "$i.sed"
done

Later I will use this script to change many files at the same time.

Thanks for your help

I'm having a hard time reading and trying to understand your snippet. What problems do you encounter that need improvement? Real errors? Performance problems?
One error I found by just looking: there's a do missing in the last while loop.
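Just to illustrate what I mean (untested, and I dropped the unused a[$1]++ while at it), that last loop needs a do, roughly:

awk '{print $1}' datatochange1.txt | while read i
do
        mv "$i" "$i.old" && mv "$i.new" "$i"       # swap original and changed file
        rm -f "$i.sed"                             # clean up the generated sed script
done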

1 Like

Dear RudiC
Well, it works fine. When I say improve,
I thought you could modify it or find another way to produce the same output faster or more efficiently, since you say my script is difficult to read :).

Also, I am missing an output for only the records where the changes were done. I can get it with grep using the input and output files, but maybe there is another way.

Thanks for your help

Well, may I doubt "it works fine"? Parts of it, maybe, but that "do" is still missing, which will certainly lead to an error message.
Every programmer has his/her own style, and none of those is THE style, but - please - adopt some indentation habits that structure your code, making it easier to read and understand.
For your own clarity, and out of sheer courtesy to us, do the first step yourself and tidy that thing up. If correctly structured and nicely presented, optimization opportunities will lend themselves to implementation...

1 Like

Dear RudiC
Thanks for all your advice and help.
I am not a programmer; I am just learning. I am now reading Pro Bash Programming to learn bash properly.

I will try to optimize it.

Regards and thanks again

---------- Post updated 06-13-15 at 01:23 PM ---------- Previous update was 06-12-15 at 09:42 PM ----------

Dear RudiC

Here is my last update; I hope it is clearer now. :slight_smile:

#!/bin/bash
 
             read -p "First: " fxl

             read -p "Last : " lxl

#------------------------------------------------------------------------------------------------------
file="datatochange.txt"
touch $file
#------------------------------------------------------------------------------------------------------

        for valueNB in $(seq $fxl $lxl)
	do
        printf " |----------->> Processing value $valueNB \n"
 
	
## create Dbase ---
awk '{\
	ori_line=substr($5,1,5);\
	ori_point=substr($5,6,5);\
	off_line=substr($1,2,5);\
	off_point=substr($1,7,5);\
	printf ("'${sw_spread}' %5d.00   %5d.00 %5d.00   %5d.00\n",ori_line,ori_point,off_line,off_point)}' ${sw_offset} > $file

## create file with vps to be replaced && generate new files with changes done ----
	awk '{F=$1;a[F]?"   ":s=$2"  "$3;r=$4"  "$5;x=1;y=3;
	print "s/" s x "/" r y "/" >> F".sed"
	}END{
	for(i in a){print "sed -f "i".sed "i " > "i".new">"changefile"}
	}' $file
	sh changefile
	mv "$i" "$i.ori" && mv "$i.new" "$i"

## create file with only vps replaced && Concatenated QC files: Spread and Vps2Cancell ----	

	off_spread=$(mktemp)

	awk '{print substr($0,23,18)}' $i.sed > $off_spread
	grep -hFf $off_spread $i >> QC_spread_DB_$fxl"_"$lxl.x01
	awk 'BEGIN {OFS= ","}{print $6,"S,"substr($5,1,5),substr($5,6,5)}' $xl_guia >> QC_vps2cancell_DB_$fxl"_"$lxl.csv

## Deleting files
	rm -f "$i.sed"	*change*
	done

Could you kindly give me another idea to change the code in red, to make it faster? It works well, but it is very slow when I run the script on many files.

Thanks for your help

A good step in the right direction! Some comments:
Once you have adopted a style, you should stay consistent (which doesn't mean you can't evolve over time). Above, in "create Dbase", you have "continuation backslashes"; in the other awk block you don't (as they obviously aren't needed). You should stick to one single syntax.
And this }END{ is absolutely horrible to read. awk uses
pattern {action}
pairs, cf. man awk. You could reflect that in your style. Why don't you look around in here and find a neat style that you like and can adopt and adapt?
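For instance, a tiny (untested) illustration of the pattern { action } layout, with spread_1.x01 used just as an example file:

awk '
FNR == 1        {print "--- " FILENAME " ---"}            # pattern: first line of each input file
/^X/            {xlines++}                                # pattern: lines starting with X
END             {print xlines " lines starting with X"}   # pattern: after all input is read
' spread_1.x01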

---------- Post updated at 23:13 ---------- Previous update was at 23:02 ----------

Regarding your performance issue: Having a loop run awk to produce a shell script full of sed commands, which is then executed by sh, smells like it will be sloooow.
And - your script has changed from post #1 to post #5. Are you sure it's still working?

There certainly are ways to improve, but I don't think I'll be working on THAT script. Explain verbosely and exactly what you need to be done, post sample input and output files, and someone in here might come up with a good solution for you.

2 Likes

Dear RudiC
Thanks for your comments; the script works fine but is slow.
I will attach examples of input and output files; they are big, so I can't post them here.
Thanks again.

Don't post huge files; post small, representative sample input files and the corresponding, desired output files.

It looks like your script is creating lots of files that aren't really needed (and some of them are removed at the end of the script). Which file(s) do you really want?

1 Like

Dear Don Cragun and RudiC.

Here are the samples of the files I have

file1

X     0       01   54589.00  20593.001 1793 18681  54764.00  20626.00  20776.001
X     0       01   54589.00  20601.001 1877 19561  54764.00  20626.00  20784.001
X     0       01   54589.00  20609.001 1961 20441  54764.00  20626.00  20792.001
X     0       01   54589.00  20617.001 2045 21321  54764.00  20626.00  20800.001
X     0       01   54589.00  20625.001 2129 22201  54764.00  20626.00  20808.001
X     0       01   54589.00  20633.001 2213 23081  54764.00  20626.00  20816.001
X     0       01   54589.00  20641.001 2297 23961  54764.00  20626.00  20824.001
X     0       01   54589.00  20649.001 2381 24841  54764.00  20626.00  20832.001
X     0       01   54589.00  20657.001 2465 25721  54764.00  20626.00  20840.001
X     0       01   54589.00  20665.001 2549 26601  54764.00  20626.00  20848.001
X     0       01   54589.00  20673.001 2633 27481  54764.00  20626.00  20856.001
X     0       01   54589.00  20681.001 2717 28361  54764.00  20626.00  20864.001
X     0       01   54589.00  20689.001 2801 29241  54764.00  20626.00  20872.001
X     0       01   54589.00  20697.001 2885 30121  54764.00  20626.00  20880.001
X     0       01   54589.00  20705.001 2969 31001  54764.00  20626.00  20888.001
X     0       01   54589.00  20713.001 3053 31881  54764.00  20626.00  20896.001
X     0       01   54589.00  20721.001 3137 32761  54764.00  20626.00  20904.001
X     0       01   54589.00  20729.001 3221 33641  54764.00  20626.00  20912.001
X     0       01   54589.00  20737.001 3305 34521  54764.00  20626.00  20920.001
X     0       01   54589.00  20745.001 3389 35401  54764.00  20626.00  20928.001
X     0       01   54589.00  20753.001 3473 36281  54764.00  20626.00  20936.001

file2

S5458920749 483857.9   2558238.5                           8.9  5458920745 2611 000 Ex
S5458920717 483161.4   2557837.0                           8.8  5458920721 2611 000 Ex
S5458920639 481469.5   2556860.3                           6.9  5458920641 2611 000 Ex
S5458920611 480873.0   2556515.5                           1.7  5458920593 2611 000 Ex
S5458920615 480952.0   2556561.0                           1.7  5458920609 2611 000 Ex
S5458920613 480914.2   2556539.4                           1.7  5458920601 2611 000 Ex

desired output

X     0       01   54589.00  20611.003 1793 18681  54764.00  20626.00  20776.001
X     0       01   54589.00  20613.003 1877 19561  54764.00  20626.00  20784.001
X     0       01   54589.00  20615.003 1961 20441  54764.00  20626.00  20792.001
X     0       01   54589.00  20617.001 2045 21321  54764.00  20626.00  20800.001
X     0       01   54589.00  20625.001 2129 22201  54764.00  20626.00  20808.001
X     0       01   54589.00  20633.001 2213 23081  54764.00  20626.00  20816.001
X     0       01   54589.00  20639.003 2297 23961  54764.00  20626.00  20824.001
X     0       01   54589.00  20649.001 2381 24841  54764.00  20626.00  20832.001
X     0       01   54589.00  20657.001 2465 25721  54764.00  20626.00  20840.001
X     0       01   54589.00  20665.001 2549 26601  54764.00  20626.00  20848.001
X     0       01   54589.00  20673.001 2633 27481  54764.00  20626.00  20856.001
X     0       01   54589.00  20681.001 2717 28361  54764.00  20626.00  20864.001
X     0       01   54589.00  20689.001 2801 29241  54764.00  20626.00  20872.001
X     0       01   54589.00  20697.001 2885 30121  54764.00  20626.00  20880.001
X     0       01   54589.00  20705.001 2969 31001  54764.00  20626.00  20888.001
X     0       01   54589.00  20713.001 3053 31881  54764.00  20626.00  20896.001
X     0       01   54589.00  20717.003 3137 32761  54764.00  20626.00  20904.001
X     0       01   54589.00  20729.001 3221 33641  54764.00  20626.00  20912.001
X     0       01   54589.00  20737.001 3305 34521  54764.00  20626.00  20920.001
X     0       01   54589.00  20749.003 3389 35401  54764.00  20626.00  20928.001
X     0       01   54589.00  20753.001 3473 36281  54764.00  20626.00  20936.001

Using file2.txt, I need to replace the matching strings in file1.txt, for example:
file2.txt

S5458920611 480873.0   2556515.5                           1.7  5458920593 2611 000 Ex

Take columns 65 to 74 of this line (= 5458920593), search for that value in file1.txt in columns 20 to 24 plus 30 to 34, and replace it with the value in columns 2 to 11 of file2.txt (5458920611).
file1.txt

X     0       01   54589.00  20593.001 1793 18681  54764.00  20626.00  20776.001

Output file

X     0       01   54589.00  20611.003 1793 18681  54764.00  20626.00  20776.001 

Also, when a change is done, column 38 needs to be replaced with the value 3 instead of 1.

The changes need to produce a new file, keeping the original file as it is.
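Just to illustrate which pieces are involved (only a check, not part of the script): for the example line of file2.txt, the value to search for is field 5 and the replacement comes from field 1 without the leading S, i.e.

awk '{print $5, substr($1,2)}' file2.txt

prints, for the line above,

5458920593 5458920611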

Appreciate your help.

And how does this fit the problem in post #1?

Dear RudiC.

This is the reason why I did this part of the script:
## create Dbase ---

awk '{\
	ori_line=substr($5,1,5);\
	ori_point=substr($5,6,5);\
	off_line=substr($1,2,5);\
	off_point=substr($1,7,5);\
	printf ("file1.txt %5d.00   %5d.00 %5d.00   %5d.00\n",ori_line,ori_point,off_line,off_point)}' file2.txt > $file

to get the same format as file1.txt and to match the values from file2.txt.

With not inconsiderable guesswork, I tried to reconstruct the underlying file structure for four swaths (a spread file plus a change file for each):

spread_1.x01
spread_2.x01
spread_3.x01
spread_4.x01
1offb1-Sx.sps
2offb1-Sx.sps
3offb1-Sx.sps
4offb1-Sx.sps

Is that close to reality? If yes, this bash script might do what you need

#!/bin/bash
read -p "from: " fsw
read -p "to  : " lsw

awk '
FNR==1          {delete CHGARR 
                 swath=FILENAME
                 sub (/^.*_/,  "", swath)
                 sub (/\..*$/, "", swath)
                 while (1 == getline LINE < (swath"offb1-Sx.sps"))
                        {split (LINE, TMP)
                         CHGARR[TMP[5]]=TMP[1]
                        }
                }

                {IX = (int($4) int($5))}

IX in CHGARR    {sub ($4, sprintf("%8.2f",  substr(CHGARR[IX], 2, 5)))
                 sub ($5, sprintf("%8.2f3", substr(CHGARR[IX], 7, 5)))
                }

                {print $0 > FILENAME".new"}

' $(eval echo spread_{$fsw..$lsw}.x01)

Please be aware that {$fsw..$lsw} is a bashism (which is why the eval is needed) and does not necessarily exist in other shells.

If fsw and lsw are single digit numbers only, you could simplify the file selection to spread_[$fsw-$lsw].x01
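For example, with fsw=1 and lsw=4, and assuming the four spread files above exist in the current directory, both forms expand to the same list:

$ fsw=1; lsw=4
$ eval echo spread_{$fsw..$lsw}.x01
spread_1.x01 spread_2.x01 spread_3.x01 spread_4.x01
$ echo spread_[$fsw-$lsw].x01
spread_1.x01 spread_2.x01 spread_3.x01 spread_4.x01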

Results: with the change file 1offb1-Sx.sps (file2)

S5458920611 480873.0   2556515.5                           1.7  5458920593 2611 000 Ex

, the line from data file spread_1.x01 (file1)

X     0       01   54589.00  20593.001 1793 18681  54764.00  20626.00  20776.001

becomes spread_1.x01.new

X     0       01   54589.00  20611.003 1793 18681  54764.00  20626.00  20776.001

Give it a shot and report back.

1 Like

Dear RudiC.

You are really fantastic and professional; the script works perfectly and very fast.
My script was doing the same process in 2 minutes, while with your script the job ends in 15 seconds.
I am very grateful for your help. Thanks for your time and support.
I do not understand many things in the script, such as split (LINE, TMP), but I will try to figure them out slowly.
If you have time, could you kindly explain the procedure of the script?
Thank you very much again.

man awk is your friend.
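For example, split() just chops a line into pieces stored in an array; a quick (untested) illustration with a made-up line:

echo "S5458920611 480873.0 2556515.5" | awk '{n = split($0, TMP); print n, TMP[1], TMP[2]}'

prints

3 S5458920611 480873.0

i.e. three fields, stored in TMP[1], TMP[2], TMP[3]. In the script, TMP[5] (the old value) and TMP[1] (the new value) of each .sps line are what end up in CHGARR.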

Please be aware that a detailed and precise specification with input and output samples and directory/file structure would have saved quite some time and effort.

2 Likes

Thanks a lot RudiC.
Is there a way to include in the script another output file, merging all the FILENAME".new" files?
I am trying to do this now :slight_smile:

Regards

---------- Post updated at 08:18 AM ---------- Previous update was at 07:38 AM ----------

I got the merged file :)... thanks :):b:

Please share your solution here for others to benefit...

---------- Post updated at 15:26 ---------- Previous update was at 15:24 ----------

Still need the commented/explained version?

1 Like

Dear RudiC.

Maybe the way I got the file is not too good, but I got it using a for loop: cat every file and merge them into a single one. :slight_smile:
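Roughly like this (simplified; the merged file name is just what I picked):

for f in $(eval echo spread_{$fsw..$lsw}.x01.new)
do
        cat "$f" >> all_spreads_$fsw"_"$lsw.new
done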

Yes, if you can, please provide some comments about the script's procedure. This script is amazing. Thanks.

Why don't you use - in lieu of FILENAME".new", which will change with every input file - ONE constant file name, which would then collect all the output? Or just print to stdout and redirect that into your result file of choice?
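E.g. (untested; out_all.new is just an example name), the end of the script could become

                {print $0 > "out_all.new"}                              # ONE constant file collects all output

' $(eval echo spread_{$fsw..$lsw}.x01)

or, printing to stdout and redirecting outside awk,

                {print}

' $(eval echo spread_{$fsw..$lsw}.x01) > out_all.new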

---------- Post updated at 20:21 ---------- Previous update was at 19:47 ----------

Commented version of script:

awk '
FNR==1          {delete CHGARR                                          # 1. line in file: read in next change file; start with clean CHGARR
                 swath=FILENAME                                         # get sequence number from filename
                 sub (/^.*_/,  "", swath)
                 sub (/\..*$/, "", swath)
                 while (1 == getline LINE < (swath"offb1-Sx.sps"))      # read single line from .sps file,   
                        {split (LINE, TMP)                              # split into TMP array, and
                         CHGARR[TMP[5]]=TMP[1]                          # assign to CHGARR for later comparison
                        }
                }

                {IX = (int($4) int($5))}                                # create index value from 4th and 5th field

IX in CHGARR    {sub ($4, sprintf("%8.2f",  substr(CHGARR[IX], 2, 5)))  # if IX found in CHGARR, replace $4 and $5
                 sub ($5, sprintf("%8.2f3", substr(CHGARR[IX], 7, 5)))  # with resp. substr from CHGARR, FMT: 99999.99    
                }                                                       # add "3" to $5

                {print $0 > FILENAME".new"}                             # print to input filename extended by "new"
                                                                        # could be stdout!
' $(eval echo spread_{$fsw..$lsw}.x01)                                  # generate sequence of file names 
1 Like

Thanks a lot for the explanation. AWK is amazing, and so are you.

Dear RudiC,

Sorry to ask a small question.

As the off (offb1-Sx.sps) files are located in another folder, I have tried to read them, but I can't :frowning:

I tried to add this to your script:

offsetDbDir=/temp/
#------------------------------------------------------------------------------------------------------
awk '
FNR==1          {delete CHGARR 
                 swath=FILENAME
                 sub (/^.*_/,  "", swath)
                 sub (/\..*$/, "", swath)
                 while (1 == getline LINE < (find $offsetDbDir/ -type f -iname $swath"offb1-Sx.sps"))
                       {split (LINE, TMP)
                         CHGARR[TMP[5]]=TMP[1]
                        }
                }

                {IX = (int($4) int($5))}

IX in CHGARR    {sub ($4, sprintf("%8.2f",  substr(CHGARR[IX], 2, 5)))
                 sub ($5, sprintf("%8.2f3", substr(CHGARR[IX], 7, 5)))
                }

                {print $0 > FILENAME".new"}' $(eval echo spread_{$fsw..$lsw}.x01)
I got an error: "division by zero attempted".
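Maybe awk is parsing the find command as an arithmetic expression (hence the division by zero)? I guess the folder has to be passed into awk instead, perhaps with -v, something like this (an untested guess on my side):

offsetDbDir=/temp/
#------------------------------------------------------------------------------------------------------
awk -v dir="$offsetDbDir" '
FNR==1          {delete CHGARR 
                 swath=FILENAME
                 sub (/^.*_/,  "", swath)
                 sub (/\..*$/, "", swath)
                 while (1 == getline LINE < (dir swath "offb1-Sx.sps"))   # dir passed in with -v; it already ends in "/", so plain string concatenation
                        {split (LINE, TMP)
                         CHGARR[TMP[5]]=TMP[1]
                        }
                }

                {IX = (int($4) int($5))}

IX in CHGARR    {sub ($4, sprintf("%8.2f",  substr(CHGARR[IX], 2, 5)))
                 sub ($5, sprintf("%8.2f3", substr(CHGARR[IX], 7, 5)))
                }

                {print $0 > FILENAME".new"}' $(eval echo spread_{$fsw..$lsw}.x01)

Is that the right direction?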