Regex Expression Replace

dr46014 · July 24, 2017, 2:21pm

I have an XML file that contains a tag like this:

<wd:address_line_1>1234 Street</wd:address_line_1>

I want to replace the value "1234 Street" with "Test Data". Different people have different address lines, and I want to replace each of them with a fixed value to mask the file. I tried using sed with a regex, but I mostly ran into trouble with characters like < and </. I need some help with this.
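About the trouble with < and </: in a sed basic regex, < is an ordinary character and needs no escaping. It is the / in </ that ends the default s/.../.../ command early. A minimal sketch of the two usual workarounds, assuming each element sits on one line and contains no nested tags (the function names mask_escaped and mask_pipe are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helpers: mask the text of <wd:address_line_1> elements.
# "<" is literal in a sed basic regex; only the "/" in "</" needs care.

# Variant 1: keep the default "/" delimiter and escape the slash as "\/".
mask_escaped() {
    sed 's/<wd:address_line_1>[^<]*<\/wd:address_line_1>/<wd:address_line_1>Test Data<\/wd:address_line_1>/'
}

# Variant 2: use "|" as the delimiter so "</" can be written as-is.
mask_pipe() {
    sed 's|<wd:address_line_1>[^<]*</wd:address_line_1>|<wd:address_line_1>Test Data</wd:address_line_1>|'
}

printf '%s\n' '<wd:address_line_1>1234 Street</wd:address_line_1>' | mask_pipe
```

[^<]* stops at the first <, so the match cannot run past the closing tag.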

jim_mcnamara · July 24, 2017, 2:36pm

Is the address line always as you describe, so that the two "keys" are:
one at the start of the line,
the other always at the end of the line, followed by a \n,
and the address lines are always on a single line?

If so,

awk ' /^wd:address_line_/  {sub("\>.*\<", ">Test data<") }
        {print} '  somefile.xml

dr46014 · July 24, 2017, 3:59pm

awk: cmd. line:1: warning: escape sequence `\>' treated as plain `>'
awk: cmd. line:1: warning: escape sequence `\<' treated as plain `<'

Scrutinizer · July 24, 2017, 4:38pm

In this case these are just warnings, but the backslashes should not be there*, so remove them. The real problem here is with

/^wd:address_line_/

which should be changed to

/^<wd:address_line_/

because each line starts with <, so a pattern anchored at ^wd: never matches.
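Putting jim_mcnamara's command together with the corrections above (backslashes removed, < added to the pattern), a sketch might look like this. It assumes the element starts in column 1 and fits on one line; mask_address_awk is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: jim_mcnamara's awk with Scrutinizer's corrections.
# The pattern includes the leading "<"; the regex uses no backslashes.
mask_address_awk() {
    awk '/^<wd:address_line_/ { sub(/>.*</, ">Test data<") }
         { print }'
}

printf '%s\n' '<wd:address_line_1>1234 Street</wd:address_line_1>' \
              '<wd:city>Springfield</wd:city>' | mask_address_awk
```

Because the pattern is a prefix match, it also masks wd:address_line_2 and any other tag whose name starts with wd:address_line_.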

--
An alternative approach would be:

awk '/^wd:address_line_/{$2="Test data"}1' RS=\< ORS=\< FS=\> OFS=\> file.xml

--
*Note: In GNU awk \< and \> have a special meaning (left and right word boundary).
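How the RS/FS alternative works, as far as I can tell: with RS set to <, every tag starts a new record, and with FS set to > the tag name lands in $1 and the text after the tag in $2. Since ORS is written after every record, including the last one, the output picks up one stray < at the very end. A sketch that avoids this by printing < before every record except the first (mask_records is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: record-based masking without a trailing "<".
# RS="<" makes each tag its own record; FS=">" splits "tagname>text"
# into $1 (tag name) and $2 (text up to the next "<").
mask_records() {
    awk 'BEGIN { RS = "<"; FS = OFS = ">" }
         /^wd:address_line_/ { $2 = "Test data" }
         # Record 1 is the empty text before the first "<"; put "<" back
         # in front of every later record instead of appending ORS.
         { printf "%s%s", (NR > 1 ? "<" : ""), $0 }'
}

printf '%s\n' '<root>' \
              '<wd:address_line_1>1234 Street</wd:address_line_1>' \
              '</root>' | mask_records
```

The parentheses around NR > 1 matter: inside a printf argument list, an unparenthesized > would be read as output redirection.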

dr46014 · July 25, 2017, 10:31am

awk '/^wd:address_line_/{$2="Test data"}1' RS=\< ORS=\< FS=\> OFS=\> file.xml

I am doing the same thing for national_id, but both national_id and national_id_type are getting replaced. I need only one of them to be replaced.

Corona688 · July 25, 2017, 11:09am

Use code tags for code please.

```text
stuff
```

MadeInGermany · July 25, 2017, 1:07pm

An attempt with sed

sed '
s|\(<wd:address_line_1\)>.*<|\1>Test data<|
s|\(<wd:national_id\)>.*<|\1>Test data2<|
' file.xml
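Why this sed version leaves national_id_type alone: the > right after national_id in the pattern cannot match the _ in <wd:national_id_type>. One caveat with .*<: it is greedy, so if several elements share a line it runs to the last < on that line and swallows everything in between. A variant using [^<]*, which stops at the first <, is sketched below (mask_sed is a hypothetical name; it still assumes these tags carry no attributes):

```shell
#!/bin/sh
# Hypothetical helper: like the sed commands above, but [^<]* stops at
# the first "<", so several elements on one line are masked separately.
# The ">" right after the tag name keeps <wd:national_id_type> from matching.
mask_sed() {
    sed '
s|\(<wd:address_line_1\)>[^<]*<|\1>Test data<|g
s|\(<wd:national_id\)>[^<]*<|\1>Test data2<|g
'
}

printf '%s\n' '<wd:national_id>12345</wd:national_id><wd:national_id_type>SSN</wd:national_id_type>' | mask_sed
```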

Scrutinizer · July 26, 2017, 4:08am

Try:

awk '$1=="wd:address_line_1"{$2="Test data"}1' RS=\< ORS=\< FS=\> OFS=\> file.xml
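The difference from the earlier /^wd:.../ pattern, as I read it: a regex anchored only at the start matches any record whose tag name merely begins with that text, so /^wd:national_id/ also selects wd:national_id_type. Comparing $1 with == requires the complete tag name. A small sketch that prints which test selects which record (show_matches is a hypothetical name; a tag with attributes, e.g. <wd:national_id wd:type=...>, would fail the exact test because $1 would include the attribute text):

```shell
#!/bin/sh
# Hypothetical helper: for each record (RS="<", FS=">"), report which test selects it.
#   regex: /^wd:national_id/       any tag name that starts with wd:national_id
#   exact: $1 == "wd:national_id"  only that complete tag name
show_matches() {
    awk 'BEGIN { RS = "<"; FS = ">" }
         /^wd:national_id/      { print "regex: " $1 }
         $1 == "wd:national_id" { print "exact: " $1 }'
}

printf '%s' '<wd:national_id>12345</wd:national_id><wd:national_id_type>SSN</wd:national_id_type>' | show_matches
```

So using $1=="wd:national_id" in place of the regex should leave national_id_type untouched.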