How to manipulate string in line?

baris35 · October 1, 2018, 4:52pm

Hello,
I searched on Google but do not know where to start.
I am on Ubuntu 18.04 (Bionic), and MainFile has about 25K lines.

MainFile:

Test,AAEE9FED3, GGBBDD DD AA X2d Moscow
112233445566aaBBccPPdddEE
Test,AAEE9FED3, GG33DD s00022 Leningrad
11298932566aaBBccPPdddEE
Test,AAEE9FED3, 33VVDD sdsds333 Belgorod
11090aBBccPPdSDSDEw00
Test,AAEE9FED3, QQTT11 00DD2 Astrakhan
112233445566aaBBccPPdddEE
Test,SDFEE3D3, SDPL31 00DD2 Buryatiya
112233445566aaBBccPPdddEE
..
..
..

ComparisonFile:

Moscow
Leningrad
Astrakhan

I wish to convert MainFile into the format below:

Test,Moscow: AAEE9FED3, GGBBDD DD AA X2d
112233445566aaBBccPPdddEE
Test,Leningrad: AAEE9FED3, GG33DD s00022
11298932566aaBBccPPdddEE
Test,Astrakhan: AAEE9FED3, QQTT11 00DD2 
112233445566aaBBccPPdddEE

I thought that the algorithm should be like this:

1) read a line from ComparisonFile,
2) search for " $line" (a space followed by that line) in MainFile,
3) if it is found, cut " $line" from the MainFile line, but only when "$line" comes after the last space of that line,
4) put $line right after the "Test," phrase.

A bit complicated.

I'd appreciate your help

Many thanks
Boris

vgersh99 · October 1, 2018, 5:01pm

any idea how to do it purely in awk?

baris35 · October 1, 2018, 5:05pm

Dear Vgersh99,
My idea:
While reading the comparison file, grep each matching line and write it to a new file. But then, when I paste the two files together, I think it will fail. I do not like awk because I do not understand it and would be unable to adapt it to my future needs. I suppose I need to learn how awk works.

Please do not reply right away. I am trying to learn awk. Just let the baby crawl on the ground for 24h. Many thanks

Kind regards
Boris

vgersh99 · October 1, 2018, 5:29pm

Sure thing.
Here's my idea with awk:

Read your ComparisonFile into an array indexed by $0 (hint: FNR==NR).
For each odd line in MainFile whose last field is in that array, replace the first , with itself followed by the last field of the line and : . Output the line and set a flag.
If the flag is set and you're on an even line, output the line and reset the flag.
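
To illustrate only the FNR==NR hint (not the whole solution): NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while awk is reading the first file. The sketch below is a minimal demo under assumptions: the temporary sample files, the file name Sample and the array name seen are made up for illustration.

```shell
#!/bin/sh
# Demo of the FNR==NR idiom. Sample files and the array name "seen" are assumptions.
tmp=$(mktemp -d)
printf '%s\n' Moscow Leningrad Astrakhan > "$tmp/ComparisonFile"
printf '%s\n' 'a b Moscow' 'c d Belgorod' 'e f Astrakhan' > "$tmp/Sample"

# First file (FNR==NR): store each whole line as an array key, then skip to the next record.
# Second file: print only the lines whose last field is one of the stored keys.
result=$(awk 'FNR==NR { seen[$0]; next } $NF in seen' "$tmp/ComparisonFile" "$tmp/Sample")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Only the lines ending in Moscow and Astrakhan are printed; the Belgorod line is dropped because Belgorod was never stored as a key.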

rdrtx1 · October 1, 2018, 5:45pm

start with:

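while IFS= read -r line
do
   last_word=${line##* }                     # text after the last space
   # -x whole-line match, -F fixed string, -i ignore case
   if grep -qixF -- "$last_word" ComparisonFile
   then
      line=${line/Test,/Test,$last_word: }   # put the name right after "Test,"
      echo "${line% *}"                      # drop the trailing name
   else
      echo "${line}"
   fi
done < MainFile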

vgersh99 · October 1, 2018, 5:54pm

I get this - not exactly what the OP was after....

Test,Moscow: AAEE9FED3, GGBBDD DD AA X2d Moscow
112233445566aaBBccPPdddEE1
Test,Leningrad: AAEE9FED3, GG33DD s00022 Leningrad
11298932566aaBBccPPdddEE2
Test,AAEE9FED3, 33VVDD sdsds333 Belgorod
11090aBBccPPdSDSDEw00
Test,Astrakhan: AAEE9FED3, QQTT11 00DD2 Astrakhan
112233445566aaBBccPPdddEE3
Test,SDFEE3D3, SDPL31 00DD2 Buryatiya
112233445566aaBBccPPdddEE

RudiC · October 1, 2018, 7:04pm

@rdrtx1: even after the revision, it still doesn't seem to suppress the records NOT in ComparisonFile. And it greps ComparisonFile 25k times ... that might become lengthy.
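
One possible way to address both remarks while staying in the shell is sketched here. It is an illustration, not rdrtx1's code, and it assumes that every record is exactly two lines (header, then data) and that names match case-sensitively. ComparisonFile is read once into a newline-delimited string, so no grep is run per record, and records whose last word is not in the list are skipped. The sample data and the variable names names, head and body are made up for the demo.

```shell
#!/bin/sh
# Sketch only: small sample files stand in for the real 25K-line MainFile.
tmp=$(mktemp -d)
printf '%s\n' Moscow Leningrad Astrakhan > "$tmp/ComparisonFile"
cat > "$tmp/MainFile" <<'EOF'
Test,AAEE9FED3, GGBBDD DD AA X2d Moscow
112233445566aaBBccPPdddEE
Test,AAEE9FED3, 33VVDD sdsds333 Belgorod
11090aBBccPPdSDSDEw00
Test,AAEE9FED3, QQTT11 00DD2 Astrakhan
112233445566aaBBccPPdddEE
EOF

nl='
'
# Read ComparisonFile once; surround the list with newlines so whole names can be matched.
names=$nl$(cat "$tmp/ComparisonFile")$nl

result=$(
   # Read MainFile as two-line records: header line, then data line.
   while IFS= read -r head && IFS= read -r body; do
      last_word=${head##* }
      case $names in
         *"$nl$last_word$nl"*)
            head=${head% *}                                      # drop the trailing name
            printf '%s\n' "${head%%,*},$last_word: ${head#*,}"   # insert it after the first comma
            printf '%s\n' "$body"
            ;;
      esac                                                       # unmatched records are skipped
   done < "$tmp/MainFile"
)
printf '%s\n' "$result"

rm -rf "$tmp"
```

A shell loop over 25K lines is still likely to be slower than a single awk pass, so this is mainly useful for seeing the logic step by step.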

baris35 · October 2, 2018, 4:50pm

Hello Vgersh99,

Regarding your first remark, I did not understand what happens when we read ComparisonFile into an array, but here is my attempt:

awk 'NR==FNR{a[$0];}{if($0 in a)print $0}' ComparisonFile > ReadArray

I do not understand how to use the ReadArray file in this case.

Next, read the last column in MainFile and, if it is the same as $1 (the line in ComparisonFile), replace , with ,$1: :

While read line_ComparisonFile && read -r line_MainFile <&3; do
L=$(awk '{print $NF}' $line_MainFile)
awk '$line_ReadArray==$L '{gsub(/,/,"$line_ReadArray:")}' {f=1} f'
done < ComparisonFile 3<MainFile

This is what I have learnt so far.

PS: Since I could not work the even/odd test into it, I do not know how the flag will know when to turn on and off.

awk 'NR%2==0' MainFile > even
awk 'NR%2==1' MainFile > odd

Many thanks
Boris

vgersh99 · October 2, 2018, 5:04pm

here's the code demonstrating the algorithm described in my previous comment - line by line:

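awk '
   # first file (ComparisonFile): remember every name as an array key
   FNR==NR{f2[$1];next}
   # odd MainFile line ending in a known name: move the name after the first comma, set the flag
   FNR%2 { if ($NF in f2) {sub(",","&" $NF ": ",$1);$NF=""; f=1; print;next} }
   # flag is set: print the following (even) line and clear the flag
   f {print $0; f--}' ComparisonFile MainFile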

this can be further "simplified":

awk '
  FNR==NR{ f2[$1];next }
  FNR%2 { if ($NF in f2) {sub(",","&" $NF ": ",$1);$NF=""; f=1; print;next} } 
  f&&f--' ComparisonFile MainFile
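
A note on why f&&f-- can stand in for the explicit f {print $0; f--} rule: an awk pattern with no action prints the record whenever the pattern is true, and f-- yields the value of f before decrementing it. When f is 0, the && short-circuits and nothing is printed or decremented. The sketch below shows this on made-up input, where a line reading START plays the role of the matching odd line; the input and the variable names one and two are assumptions for the demo.

```shell
#!/bin/sh
# Made-up input: START sets the flag, the remaining lines are payload.

# f=1: exactly one line after START is printed.
one=$(printf '%s\n' a START b c | awk '/START/ { f=1; next } f&&f--')

# f=2: the same pattern prints the two lines after START.
two=$(printf '%s\n' a START b c d | awk '/START/ { f=2; next } f&&f--')

printf 'with f=1:\n%s\n' "$one"
printf 'with f=2:\n%s\n' "$two"
```

With f=1 only b is printed; with f=2 both b and c are printed, so the same idiom extends to records that span more than two lines.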