Grep string causes extra spaces

Hello,
I have an XML file, and my aim is to take each line of a keywords file and search for that string in the XML file.
When a keyword is found in the XML file, I expect the script to go to the previous line of the XML file and grab the value between two strings. It almost works, apart from one error.

tab separated keywords.txt

test1 qqq98
test35 sss32
test26 Rsiw

1.xml file

  <id="229954e70d6b702f8d570b4be11af181">
    <display-name>test44 lgi3d</display-name>
  <id="229954e70d6b702f8d51331cbe11af181">
    <display-name>test35 kkld</display-name>
  <id="2223230did3s2Qafevrgvve1cbe11af181">
    <display-name>test26 Rsiw</display-name>

expected output:

test1 qqq98 id=""
test35 sss32 id=""
test26 Rsiw id="2223230did3s2Qafevrgvve1cbe11af181"

My script:

while read COL1 COL2 && read -r line <&3; do
A=$(grep -B1 "$COL1.*$COL2" 1.xml | grep -v "display-name" | sed -e 's/<id=\"\(.*\)\">/\1/' )
#A=$(grep -B1 "$COL1.*$COL2" 1.xml | grep -v "display-name" | grep -o -P '(?<=<id=\").*(?=\">)')
echo "$COL1 $COL2 id=\"$A\""
done < keywords.txt 3<1.xml

This gives:

test1 qqq98 id=""
test35 sss32 id=""
test26 Rsiw id="  2223230did3s2Qafevrgvve1cbe11af181"

I wonder why there are two spaces before the value of $A in the console output.

Thank you
Boris

It doesn't print "extra" spaces; those are the two leading spaces of the "id" line, which your sed command does not remove. Try piping through

sed -e 's/^ *<id=\"\(.*\)\">/\1/'

instead, i.e. include the spaces from the start of the line in the match.
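
To see the difference in isolation, here is a small sketch that runs both sed expressions on one "id" line copied from the sample 1.xml. The variable names line, before and after are only for illustration and are not part of the original script.

```shell
#!/bin/sh
# One "id" line with its two leading spaces, as in the sample 1.xml.
line='  <id="2223230did3s2Qafevrgvve1cbe11af181">'

# Original expression: the match starts at "<id", so the indentation
# in front of it is left in the output.
before=$(printf '%s\n' "$line" | sed -e 's/<id="\(.*\)">/\1/')

# Anchored expression: "^ *" makes the leading spaces part of the match,
# so they are replaced together with the tag.
after=$(printf '%s\n' "$line" | sed -e 's/^ *<id="\(.*\)">/\1/')

# Brackets make any leading whitespace visible.
printf 'before=[%s]\nafter=[%s]\n' "$before" "$after"
```

The first result keeps the two leading spaces; the second does not.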

How about (be aware there's NO test1 in your data samples)

awk -F"[<>]" '
NR == FNR       {T[$0]
                 next
                }
/<id/           {TMP = $2
                 next
                }
                {print $3, ($3 in T)?TMP:"id=\"\""}
' keywords.txt 1.xml

test44 lgi3d id=""
test35 kkld id=""
test26 Rsiw id="2223230did3s2Qafevrgvve1cbe11af181"

Aside: why do you read line <&3 and then don't use it?
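
On that aside: the extra read on fd 3 is never used, and it would also end the loop early if 1.xml had fewer lines than keywords.txt. A minimal sketch of the loop without it, using the anchored sed expression, might look like this. It assumes a grep that supports -B (GNU or BSD); the scratch directory and the result variable are only there to make the example self-contained.

```shell
#!/bin/sh
# Recreate the sample files from the thread in a scratch directory.
dir=$(mktemp -d)
printf 'test1\tqqq98\ntest35\tsss32\ntest26\tRsiw\n' > "$dir/keywords.txt"
cat > "$dir/1.xml" <<'EOF'
  <id="229954e70d6b702f8d570b4be11af181">
    <display-name>test44 lgi3d</display-name>
  <id="229954e70d6b702f8d51331cbe11af181">
    <display-name>test35 kkld</display-name>
  <id="2223230did3s2Qafevrgvve1cbe11af181">
    <display-name>test26 Rsiw</display-name>
EOF

# Only keywords.txt drives the loop; 1.xml is read by grep, not by "read".
result=$(
  while read -r COL1 COL2; do
    # Take the line before the match, drop the display-name line itself,
    # then strip the indentation and the <id="..."> wrapper.
    A=$(grep -B1 "$COL1.*$COL2" "$dir/1.xml" | grep -v "display-name" |
        sed -e 's/^ *<id="\(.*\)">/\1/')
    echo "$COL1 $COL2 id=\"$A\""
  done < "$dir/keywords.txt"
)
printf '%s\n' "$result"
rm -rf "$dir"
```

With the sample data this prints the three lines from the expected output, without the leading spaces.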

Thank you, Rudic,
This one also works as expected.

Kind regards
Boris

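while read -r key; do
        # Read 1.xml bottom-up, so each "id" line comes right after its display-name line
        while read -r line; do
                if [[ $line =~ $key ]]; then
                        # The next line is the "id" line; split it on double quotes
                        IFS=\" read -r a id z
                        break
                fi
        done < <(tac 1.xml)
        echo $key id=\"$id\"
        unset id
done < keywords.txt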
awk -F ">|<" '
NR == FNR       {tmp=$2; getline; T[$3] = tmp; next
                }
                {print  $0, ($0 in T)?T[$0]:"id=\"\""
                }
' 1.xml keywords.txt

--- Post updated at 20:34 ---

awk -F '[<>"]' '
NR == FNR       {tmp=$3; getline; T[$3] = tmp; next
                }
                {print  $0, "id=\"" T[$0] "\""
                }
' 1.xml keywords.txt

Thank you all,
I will test your code as well and keep you posted.

Kind regards
Boris