Hello,
I searched on Google but do not know where to start.
I am on Ubuntu 18.04 (Bionic), and MainFile consists of 25K lines.
MainFile:
Test,AAEE9FED3, GGBBDD DD AA X2d Moscow
112233445566aaBBccPPdddEE
Test,AAEE9FED3, GG33DD s00022 Leningrad
11298932566aaBBccPPdddEE
Test,AAEE9FED3, 33VVDD sdsds333 Belgorod
11090aBBccPPdSDSDEw00
Test,AAEE9FED3, QQTT11 00DD2 Astrakhan
112233445566aaBBccPPdddEE
Test,SDFEE3D3, SDPL31 00DD2 Buryatiya
112233445566aaBBccPPdddEE
..
..
..
ComparisonFile:
Moscow
Leningrad
Astrakhan
I wish to convert MainFile into the format below:
Test,Moscow: AAEE9FED3, GGBBDD DD AA X2d
112233445566aaBBccPPdddEE
Test,Leningrad: AAEE9FED3, GG33DD s00022
11298932566aaBBccPPdddEE
Test,Astrakhan: AAEE9FED3, QQTT11 00DD2
112233445566aaBBccPPdddEE
I thought that the algorithm should be like this:
1) read a line from the comparison file,
2) search for space_$line in MainFile,
3) if it is found, cut space_$line from MainFile (only when "$line" comes after the last space on the MainFile line),
4) put $line right after the "Test," phrase.
A bit complicated.
I'd appreciate your help
Many thanks
Boris
any idea how to do it purely in awk?
Dear Vgersh99,
My idea: while read over the comparison file, grep each matching line > create a new file. But then when I paste the two files together, it will fail, I think. I do not like awk, as I do not understand it and would be unable to edit it for my future needs. I suppose I need to learn how things work with the awk command.
Please do not reply promptly. I am trying to learn the awk command. Just let the baby crawl on the ground for 24h. Many thanks
Kind regards
Boris
Sure thing.
Here's my idea with awk:
read your ComparisonFile into an array indexed by $0 (hint: FNR==NR)
for each odd line in MainFile whose last field is in that array, substitute the first "," by itself appended with the last field on the line followed by ":". Set a flag.
If the flag is set and you're on an even line, output the line and reset the flag.
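The FNR==NR hint can be tried on its own before tackling the substitution. The following is only a minimal sketch: the temporary directory, the function name last_field_report and the array name names are made up for illustration, and the sample data is a shortened copy of the files in the first post. It loads ComparisonFile into an array and then reports, for each odd line of MainFile, whether its last field is in that array.

```shell
# Recreate small sample files in a scratch directory (illustrative data only).
tmp=$(mktemp -d)
printf '%s\n' Moscow Leningrad Astrakhan > "$tmp/ComparisonFile"
printf '%s\n' \
  'Test,AAEE9FED3, GGBBDD DD AA X2d Moscow' \
  '112233445566aaBBccPPdddEE' \
  'Test,AAEE9FED3, 33VVDD sdsds333 Belgorod' \
  '11090aBBccPPdSDSDEw00' > "$tmp/MainFile"

# NR counts lines across all input files, FNR restarts at 1 for each file,
# so FNR==NR is true only while awk is reading the first file.
last_field_report() {
  awk '
    FNR==NR { names[$0]; next }   # first file: store each line as an array index
    FNR%2   { print $NF, (($NF in names) ? "listed" : "not listed") }   # odd lines of second file
  ' "$1" "$2"
}

last_field_report "$tmp/ComparisonFile" "$tmp/MainFile"
# prints:
# Moscow listed
# Belgorod not listed
```

The array is never written to a file; it lives in awk's memory for the whole run, which is why both files are given to a single awk invocation.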
rdrtx1
October 1, 2018, 5:45pm
start with:
while read line
do
last_word=${line##* }
word=0
grep -q -i "^$last_word$" ComparisonFile && word=1
if [ $word = 1 ]
then
line=${line/Test,/Test,$last_word: }
echo "${line% *}"
else
echo "${line}"
fi
done < MainFile
I get this - not exactly what the OP was after....
Test,Moscow: AAEE9FED3, GGBBDD DD AA X2d Moscow
112233445566aaBBccPPdddEE1
Test,Leningrad: AAEE9FED3, GG33DD s00022 Leningrad
11298932566aaBBccPPdddEE2
Test,AAEE9FED3, 33VVDD sdsds333 Belgorod
11090aBBccPPdSDSDEw00
Test,Astrakhan: AAEE9FED3, QQTT11 00DD2 Astrakhan
112233445566aaBBccPPdddEE3
Test,SDFEE3D3, SDPL31 00DD2 Buryatiya
112233445566aaBBccPPdddEE
RudiC
October 1, 2018, 7:04pm
@rdrtx1: it still doesn't seem to suppress the records NOT in ComparisonFile, after the revision. And it greps the ComparisonFile 25k times ... might become lengthy.
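Both points can be addressed while staying in plain shell. What follows is a sketch under assumptions, not a tested replacement for the loop above: the function name convert_pairs and the scratch files are hypothetical, MainFile is assumed to consist strictly of header/data line pairs, and the match is case-sensitive (unlike grep -i). It reads ComparisonFile once into a variable, tests membership with case instead of calling grep for every line, and skips pairs whose last word is not listed.

```shell
# Recreate small sample files in a scratch directory (illustrative data only).
tmp=$(mktemp -d)
printf '%s\n' Moscow Leningrad Astrakhan > "$tmp/ComparisonFile"
printf '%s\n' \
  'Test,AAEE9FED3, GGBBDD DD AA X2d Moscow' \
  '112233445566aaBBccPPdddEE' \
  'Test,AAEE9FED3, 33VVDD sdsds333 Belgorod' \
  '11090aBBccPPdSDSDEw00' \
  'Test,AAEE9FED3, QQTT11 00DD2 Astrakhan' \
  '112233445566aaBBccPPdddEE' > "$tmp/MainFile"

nl='
'

# $1 = comparison file, $2 = main file
convert_pairs() {
  names="$nl$(cat "$1")$nl"            # name list, read once, newline-delimited
  while IFS= read -r header && IFS= read -r data
  do
    last_word=${header##* }            # text after the last space
    case $names in
      *"$nl$last_word$nl"*)            # whole-line match against the name list
        rest=${header% *}              # header without its last word
        printf '%s\n' "Test,$last_word: ${rest#Test,}"
        printf '%s\n' "$data"
        ;;
    esac                               # no match: the pair is not printed
  done < "$2"
}

convert_pairs "$tmp/ComparisonFile" "$tmp/MainFile"
```

On the sample data this prints the Moscow and Astrakhan pairs in the requested format and drops the Belgorod pair. For 25K lines a single awk process is still likely to be faster than a shell read loop.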
Hello Vgersh99,
Regarding your remark 1, I did not understand what happens when we read the ComparisonFile into an array, but here you are:
awk 'NR==FNR{a[$0];}{if($0 in a)print $0}' ComparisonFile > ReadArray
I do not understand how to use the ReadArray file in this case.
Now, read the last column in MainFile and, if it is the same as $1 (the line in ComparisonFile), replace "," by ",$1:":
While read line_ComparisonFile && read -r line_MainFile <&3; do
L=$(awk '{print $NF}' $line_MainFile)
awk '$line_ReadArray==$L '{gsub(/,/,"$line_ReadArray:")}' {f=1} f'
done < ComparisonFile 3<MainFile
This is what I have learnt so far.
PS: As I could not add the even/odd test to it, I do not know how the flag will know when it should be turned on or off.
awk 'NR%2==0' MainFile > even
awk 'NR%2==1' MainFile > odd
Many thanks
Boris
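Here's the code demonstrating the algorithm described in my previous comment, line by line:
awk '
# first file (ComparisonFile): store each name as an array index
FNR==NR{f2[$1];next}
# odd line of MainFile whose last field is a stored name:
# insert "name: " after the first comma, blank the last field, set the flag
FNR%2 { if ($NF in f2) {sub(",","&" $NF ": ",$1);$NF=""; f=1; print;next} }
# even line right after a match: print it and clear the flag
f {print $0; f--}' ComparisonFile MainFile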
This can be further "simplified":
awk '
FNR==NR{ f2[$1];next }
FNR%2 { if ($NF in f2) {sub(",","&" $NF ": ",$1);$NF=""; f=1; print;next} }
f&&f--' ComparisonFile MainFile
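Blanking the last field with $NF="" makes awk rebuild the record, which leaves a trailing space on the converted lines. If that matters, here is a possible variant, offered only as a sketch: the function name reformat_main, the variable names keep and name, and the scratch files are made up for illustration, and the names in ComparisonFile are assumed not to contain & or backslashes (both are special in a sub() replacement). It removes the last word with sub() on the whole line instead.

```shell
# Recreate small sample files in a scratch directory (illustrative data only).
tmp=$(mktemp -d)
printf '%s\n' Moscow Leningrad Astrakhan > "$tmp/ComparisonFile"
printf '%s\n' \
  'Test,AAEE9FED3, GGBBDD DD AA X2d Moscow' \
  '112233445566aaBBccPPdddEE' \
  'Test,AAEE9FED3, 33VVDD sdsds333 Belgorod' \
  '11090aBBccPPdSDSDEw00' \
  'Test,AAEE9FED3, QQTT11 00DD2 Astrakhan' \
  '112233445566aaBBccPPdddEE' > "$tmp/MainFile"

# $1 = comparison file, $2 = main file
reformat_main() {
  awk '
    FNR==NR { names[$1]; next }      # first file: remember each name
    FNR%2 {                          # odd lines of the second file
      keep = ($NF in names)          # flag: is the last field a known name?
      if (keep) {
        name = $NF                   # save it before the line is edited
        sub(/ +[^ ]+$/, "")          # drop the last word and the blanks before it
        sub(/,/, "," name ": ")      # put "name: " after the first comma
        print
      }
      next
    }
    keep                             # even line: print only if the odd line before it matched
  ' "$1" "$2"
}

reformat_main "$tmp/ComparisonFile" "$tmp/MainFile"
# prints:
# Test,Moscow: AAEE9FED3, GGBBDD DD AA X2d
# 112233445566aaBBccPPdddEE
# Test,Astrakhan: AAEE9FED3, QQTT11 00DD2
# 112233445566aaBBccPPdddEE
```

Here the flag is recomputed on every odd line, so it never needs to be reset by hand.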