RJG
June 7, 2016, 6:20am
1
I have a fixed-length file a.txt that looks like this:
a@ a00 a00000 a00 a000000 a00 a0000 a0000 a00000000 a01
a@ a1 a2 a11 a22 a12 a13 a44 a45 a54 a65 a76 a77
a@ a1 a3 a6 a7 a9 a8 a2 a7 a8 a8 a9 a0
b@ b00 b00000 b00 b000000 b00 b0000 b0000 b00000000 b01
b@ b1 b2 b11 b22 b12 b13 b44 b45 b54 b65 b76 b77
b@ b1 b3 b6 b7 b9 b8 b2 b7 b8 b8 b9 b0
c@ c00 c00000 c00 c000000 c00 c0000 c0000 c00000000 c01
c@ c1 c2 c11 c22 c12 c13 c44 c45 c54 c65 c76 c77
c@ c1 c3 c6 c7 c9 c8 c2 c7 c8 c8 c9 c0
d@ d00 d00000 d00 d000000 d00 d0000 d0000 d00000000 d01
d@ d1 d2 d11 d22 d12 d13 d44 d45 d54 d65 d76 d77
d@ d1 d3 d6 d7 d9 d8 d2 d7 d8 d8 d9 d0
In this file, every 3 consecutive lines form a segment with the same prefix (a@, b@, c@, d@).
If the 2nd column of the 2nd row of any such segment contains '(', then I have to append 'new' after the 5th column
of the 1st line of that segment, in such a way that the word following the appended word stays in its previous position.
For example, if the c segment looks like this [2nd column of 2nd row contains '(']
c@ c00 c00000 c00 c000000 c00 c0000 c0000 c00000000 c01
c@ c1 c( c11 c22 c12 c13 c44 c45 c54 c65 c76 c77
c@ c1 c3 c6 c7 c9 c8 c2 c7 c8 c8 c9 c0
Then the 1st line of that segment should look like this, with 'new' appended to the 5th column:
c@ c00 c00000 c00 c000000new c00 c0000 c0000 c00000000 c01
So the new file should look like this:
a@ a00 a00000 a00 a000000 a00 a0000 a0000 a00000000 a01
a@ a1 a2 a11 a22 a12 a13 a44 a45 a54 a65 a76 a77
a@ a1 a3 a6 a7 a9 a8 a2 a7 a8 a8 a9 a0
b@ b00 b00000 b00 b000000 b00 b0000 b0000 b00000000 b01
b@ b1 b2 b11 b22 b12 b13 b44 b45 b54 b65 b76 b77
b@ b1 b3 b6 b7 b9 b8 b2 b7 b8 b8 b9 b0
c@ c00 c00000 c00 c000000new c00 c0000 c0000 c00000000 c01
c@ c1 c( c11 c22 c12 c13 c44 c45 c54 c65 c76 c77
c@ c1 c3 c6 c7 c9 c8 c2 c7 c8 c8 c9 c0
d@ d00 d00000 d00 d000000 d00 d0000 d0000 d00000000 d01
d@ d1 d2 d11 d22 d12 d13 d44 d45 d54 d65 d76 d77
d@ d1 d3 d6 d7 d9 d8 d2 d7 d8 d8 d9 d0
Please let me know if I can make this change with a Unix bash shell script.
This assumes the file fits in memory.
A less memory-intensive version is also possible, reading the file in blocks of 3 lines.
$ cat tmp/tmp.pl
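#!/usr/bin/perl
use strict;
use warnings;
# Read the whole file into memory, one array element per line.
open (my $test,'<','tmp/tmp.dat') or die "Cannot open tmp/tmp.dat: $!";
my @lines=readline($test);
close($test);
# Walk the file in segments of three lines.
for (my $line=0;$line<$#lines;$line+=3){
    my @second=split/ /,$lines[$line + 1];
    # Does the third space-separated field of the second line contain "("?
    if ($second[2]=~/\(/){
        my @first=split/ /,$lines[$line];
        $first[4].='new' ;   # append to the fifth field of the first line
        $lines[$line]=join(" ",@first);
    }
    for my $inc (0..2){
        print $lines[$line + $inc]
    }
}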
$ cat tmp/tmp.dat
a@ a00 a00000 a00 a000000 a00 a0000 a0000 a00000000 a01
a@ a1 a2 a11 a22 a12 a13 a44 a45 a54 a65 a76 a77
a@ a1 a3 a6 a7 a9 a8 a2 a7 a8 a8 a9 a0
b@ b00 b00000 b00 b000000 b00 b0000 b0000 b00000000 b01
b@ b1 b2 b11 b22 b12 b13 b44 b45 b54 b65 b76 b77
b@ b1 b3 b6 b7 b9 b8 b2 b7 b8 b8 b9 b0
c@ c00 c00000 c00 c000000 c00 c0000 c0000 c00000000 c01
c@ c1 c( c11 c22 c12 c13 c44 c45 c54 c65 c76 c77
c@ c1 c3 c6 c7 c9 c8 c2 c7 c8 c8 c9 c0
d@ d00 d00000 d00 d000000 d00 d0000 d0000 d00000000 d01
d@ d1 d2 d11 d22 d12 d13 d44 d45 d54 d65 d76 d77
d@ d1 d3 d6 d7 d9 d8 d2 d7 d8 d8 d9 d0
$ perl tmp/tmp.pl
a@ a00 a00000 a00 a000000 a00 a0000 a0000 a00000000 a01
a@ a1 a2 a11 a22 a12 a13 a44 a45 a54 a65 a76 a77
a@ a1 a3 a6 a7 a9 a8 a2 a7 a8 a8 a9 a0
b@ b00 b00000 b00 b000000 b00 b0000 b0000 b00000000 b01
b@ b1 b2 b11 b22 b12 b13 b44 b45 b54 b65 b76 b77
b@ b1 b3 b6 b7 b9 b8 b2 b7 b8 b8 b9 b0
c@ c00 c00000 c00 c000000new c00 c0000 c0000 c00000000 c01
c@ c1 c( c11 c22 c12 c13 c44 c45 c54 c65 c76 c77
c@ c1 c3 c6 c7 c9 c8 c2 c7 c8 c8 c9 c0
d@ d00 d00000 d00 d000000 d00 d0000 d0000 d00000000 d01
d@ d1 d2 d11 d22 d12 d13 d44 d45 d54 d65 d76 d77
d@ d1 d3 d6 d7 d9 d8 d2 d7 d8 d8 d9 d0
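The block-of-three variant mentioned above could be sketched in bash roughly as follows. This is only a sketch under some assumptions: process_segments is a hypothetical helper name, the input arrives on stdin, fields are separated by whitespace, and a trailing incomplete segment is silently dropped. Like the Perl script, it appends 'new' after the fifth field and lets the rest of the line shift to the right; it does not try to keep the following columns in place.

```shell
#!/bin/bash
# Sketch only: process_segments is a hypothetical helper, not code from this thread.
# Reads stdin three lines at a time, so only one segment is held in memory.
process_segments() {
    local l1 l2 l3
    local -a f2
    # Group 1 captures the first five whitespace-separated fields, group 3 the rest.
    local re='^([[:space:]]*([^[:space:]]+[[:space:]]+){4}[^[:space:]]+)(.*)$'
    while IFS= read -r l1 && IFS= read -r l2 && IFS= read -r l3; do
        # Split the second line into fields to inspect its third field.
        read -r -a f2 <<< "$l2"
        if [[ ${f2[2]} == *"("* && $l1 =~ $re ]]; then
            # Append "new" right after the fifth field of the first line.
            l1="${BASH_REMATCH[1]}new${BASH_REMATCH[3]}"
        fi
        printf '%s\n' "$l1" "$l2" "$l3"
    done
}
```

With the function defined, something like process_segments < tmp/tmp.dat would stream the file segment by segment.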
RJG
June 7, 2016, 7:32am
3
Thank you for your reply.
In the above code, the newly appended word 'c000000new' shifts the next word 'c00' from its position.
But the next word should not move from its previous position.
Can I use the same logic in a bash script? I have to use bash in my coding.
RudiC
June 7, 2016, 9:05am
4
Why do you "have to use bash in ... (your) coding"? Who says so?
RJG
June 7, 2016, 9:17am
5
Sorry, I don't know Perl, so...
Thanks
Similar logic in awk (gensub requires GNU awk)
$ awk -F' ' '
{count[$1]++;
if (count[$1]==1) order[++n]=$1;   # remember the order in which segments appear
if ((count[$1]==2) && ($3~/\(/)){
# append "new" after the fifth field of the stored first line
record[$1]=gensub(/^([^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+)(.*)/,"\\1new\\2",1,record[$1]);
}
record[$1]=(count[$1]==1 ? $0 : record[$1] "\n" $0);
}
END{for (i=1;i<=n;i++){print record[order[i]];}}' tmp/tmp.dat
a@ a00 a00000 a00 a000000 a00 a0000 a0000 a00000000 a01
a@ a1 a2 a11 a22 a12 a13 a44 a45 a54 a65 a76 a77
a@ a1 a3 a6 a7 a9 a8 a2 a7 a8 a8 a9 a0
b@ b00 b00000 b00 b000000 b00 b0000 b0000 b00000000 b01
b@ b1 b2 b11 b22 b12 b13 b44 b45 b54 b65 b76 b77
b@ b1 b3 b6 b7 b9 b8 b2 b7 b8 b8 b9 b0
c@ c00 c00000 c00 c000000new c00 c0000 c0000 c00000000 c01
c@ c1 c( c11 c22 c12 c13 c44 c45 c54 c65 c76 c77
c@ c1 c3 c6 c7 c9 c8 c2 c7 c8 c8 c9 c0
d@ d00 d00000 d00 d000000 d00 d0000 d0000 d00000000 d01
d@ d1 d2 d11 d22 d12 d13 d44 d45 d54 d65 d76 d77
d@ d1 d3 d6 d7 d9 d8 d2 d7 d8 d8 d9 d0
RudiC
June 7, 2016, 11:50am
7
To keep the formatting of the original file, try this (it assumes field 5 is followed by at least three padding spaces, which "new" then overwrites):
awk 'NR%3==1 {T = $0; P = $5; getline; if (index ($3, "(")) sub (P "   ", P "new", T); print T;}1' file
a@ a00 a00000 a00 a000000 a00 a0000 a0000 a00000000 a01
a@ a1 a2 a11 a22 a12 a13 a44 a45 a54 a65 a76 a77
a@ a1 a3 a6 a7 a9 a8 a2 a7 a8 a8 a9 a0
b@ b00 b00000 b00 b000000 b00 b0000 b0000 b00000000 b01
b@ b1 b2 b11 b22 b12 b13 b44 b45 b54 b65 b76 b77
b@ b1 b3 b6 b7 b9 b8 b2 b7 b8 b8 b9 b0
c@ c00 c00000 c00 c000000new c00 c0000 c0000 c00000000 c01
c@ c1 c( c11 c22 c12 c13 c44 c45 c54 c65 c76 c77
c@ c1 c3 c6 c7 c9 c8 c2 c7 c8 c8 c9 c0
d@ d00 d00000 d00 d000000 d00 d0000 d0000 d00000000 d01
d@ d1 d2 d11 d22 d12 d13 d44 d45 d54 d65 d76 d77
d@ d1 d3 d6 d7 d9 d8 d2 d7 d8 d8 d9 d0
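#!/bin/bash
# Process the file in segments of three lines; count is the position within a segment.
count=1; tarray1=(); tarray2=()
while IFS= read -r line
do
    if [[ "$count" = "1" ]]; then
        # First line: hold it back until the second line has been checked.
        ((count++)); oline1="$line"; tarray1=($oline1)
    elif [[ "$count" = "2" ]]; then
        ((count++)); oline2="$line"; tarray2=($oline2)
        if [[ "${tarray2[2]}" =~ "(" ]]; then
            # Assumption: three padding spaces follow field 5, so "new" overwrites them in place.
            oline1=$(printf '%s\n' "$oline1" | sed "s/${tarray1[4]}   /${tarray1[4]}new/")
        fi
        printf "%s\n%s\n" "$oline1" "$oline2"
    else
        echo "$line"
        count=1
    fi
done <inputfile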
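The sed substitution in the script above matches field 5 together with a fixed number of following spaces, so it only keeps the columns aligned when the padding is exactly what the pattern expects. A minimal sketch of a more forgiving variant is shown below. It is an illustration only: append_fixed is a hypothetical helper name, fields are assumed to be padded with plain spaces, and at least one separating space is always kept. It overwrites as many padding spaces after field 5 as the suffix needs; if there is not enough padding, the rest of the line shifts to the right.

```shell
#!/bin/bash
# Sketch only: append_fixed is a hypothetical helper, not code from this thread.
# Usage: append_fixed LINE SUFFIX
# Appends SUFFIX to the fifth field of LINE, overwriting padding spaces where possible.
append_fixed() {
    local line="$1" suffix="$2"
    # Group 1: first five fields, group 3: spaces after field 5, group 4: rest of line.
    local re='^([[:space:]]*([^[:space:]]+[[:space:]]+){4}[^[:space:]]+)( *)(.*)$'
    if [[ $line =~ $re ]]; then
        local head="${BASH_REMATCH[1]}" pad="${BASH_REMATCH[3]}" tail="${BASH_REMATCH[4]}"
        # Overwrite at most length(suffix) spaces, but leave one space as separator.
        local take=${#suffix}
        if (( take > ${#pad} - 1 )); then take=$(( ${#pad} - 1 )); fi
        if (( take < 0 )); then take=0; fi
        line="${head}${suffix}${pad:take}${tail}"
    fi
    printf '%s\n' "$line"
}
```

In the loop above, the sed line could then be replaced by something like oline1=$(append_fixed "$oline1" new).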