Read file excluding XML in it

chetan.c April 9, 2012, 10:42am 1

Hi,

I have a file like the one below. I want all the content on a single line, excluding the XML. How can I proceed?

 
[xxx]
t=21
y=23
 
[jjj]
rg=xyz
.....
<xmlstarts>
.
.
<xmlends>
 
[ppp]
lk=99
lo=09

Thanks,
Chetan.C

---------- Post updated at 09:42 AM ---------- Previous update was at 09:39 AM ----------

The output I'm expecting is like this: [xxx]t=21y=23[jjj]rg=xyz[ppp]lk=99lo=09

balajesuri April 9, 2012, 11:05am 2

sed '/<xmlstarts>/,/<xmlends>/d' inputfile | tr -d '\n'
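
For anyone trying the one-liner above, here is a small self-contained sketch that runs it against sample data modeled on the first post. The function name flatten_without_xml and the temporary file are only for illustration, and the sample leaves out the blank lines and the "....." placeholder from the original file. The sed range deletes every line from the one matching <xmlstarts> through the next one matching <xmlends>, inclusive, and tr then removes the newlines.

```shell
#!/bin/sh
# Hypothetical helper name; it only wraps the one-liner from the reply above.
flatten_without_xml() {
    # Drop the <xmlstarts>..<xmlends> range (inclusive), then join all lines.
    sed '/<xmlstarts>/,/<xmlends>/d' "$1" | tr -d '\n'
}

# Sample input modeled on the question (blank lines and "....." omitted).
sample=$(mktemp)
printf '%s\n' '[xxx]' 't=21' 'y=23' '[jjj]' 'rg=xyz' \
    '<xmlstarts>' '.' '.' '<xmlends>' '[ppp]' 'lk=99' 'lo=09' > "$sample"

result=$(flatten_without_xml "$sample")
echo "$result"

rm -f "$sample"
```

Note that lines containing only a space, as in the original file, would leave that space in the joined output.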

chetan.c April 10, 2012, 4:02am 3

Hi,
Thanks for the response. It is working as expected. The script now looks like this:

#!/bin/bash
cd /Extracted
for i in *.txt
do
echo `sed '/</,/>/d' "$i" | tr -d '\n'`"^"$i
echo -e '\n'
done

Since I need the filename in front of each line, I have used a for loop.
But the script runs very slowly. There are around 900,000 rows in total across 9,000 files. Is there a better way than the script above to improve the performance?
Thanks,
Chetan.C
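
One possible direction, offered as a sketch rather than a tested fix: the loop above starts a subshell, a sed, and a tr for every file, so with 9,000 files the process start-up cost probably accounts for much of the run time. A single awk process can read all the files in one go. The sketch below assumes the XML block is delimited by the <xmlstarts> and <xmlends> markers from the first post (not the shorter /</,/>/ patterns used in the script), and the function name flatten_all and the sample files are made up for the demonstration. It prints one line per file in the form content^filename and does not reproduce the extra blank lines that echo -e '\n' adds.

```shell
#!/bin/sh
# Hypothetical helper: one awk process handles every file passed to it.
flatten_all() {
    awk '
        FNR == 1 {                              # first line of a new file
            if (NR > 1) printf "^%s\n", prev    # finish the previous file
            prev = FILENAME
            skip = 0
        }
        /<xmlstarts>/ { skip = 1 }              # entering the XML block
        !skip         { printf "%s", $0 }       # keep non-XML lines, no newline
        /<xmlends>/   { skip = 0 }              # leaving the XML block
        END { if (NR > 0) printf "^%s\n", prev }
    ' "$@"
}

# Small demonstration with two made-up files.
dir=$(mktemp -d)
printf '%s\n' '[xxx]' 't=21' '<xmlstarts>' '<a>1</a>' '<xmlends>' \
    '[ppp]' 'lk=99' > "$dir/a.txt"
printf '%s\n' '[jjj]' 'rg=xyz' > "$dir/b.txt"

out=$(cd "$dir" && flatten_all *.txt)
echo "$out"

rm -rf "$dir"
```

Empty files never trigger the FNR == 1 rule, so they produce no output line in this sketch.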

---------- Post updated 04-10-12 at 01:50 AM ---------- Previous update was 04-09-12 at 10:38 AM ----------

Any suggestions?

---------- Post updated at 03:02 AM ---------- Previous update was at 01:50 AM ----------

Is there a Perl option I could use for this?

chetan.c May 1, 2012, 7:44am 4

Hi,

I wanted to know if I can use awk for this, as I'm using echo to print the filename for each file and this is making the script very slow.

Thanks,
Chetan.c

---------- Post updated at 06:44 AM ---------- Previous update was at 02:31 AM ----------

Hi,

After some modifications, the script now looks like this:

awk -v ORS="" 'FNR==1 {printf("\n"FILENAME"\n")} ;{ sub(/<.*/,"");print }' *.txt

But this code prints a lot of blank lines and I do not know why.
Can somebody please let me know whether this code is right?

Thanks,
Chetan.C
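
A few things can be said about the one-liner above with some confidence, though they may not explain every blank line. The printf writes a newline before each filename, so the output always begins with an empty line. FILENAME is used as the printf format string, which would misbehave for a name containing %. And sub(/<.*/,"") only trims from the first < to the end of that same line, so XML content lines without a < are still printed. The sketch below changes only the separator handling and keeps the rest as posted. The wrapper name names_and_content and the sample files are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around a lightly modified version of the one-liner.
names_and_content() {
    awk -v ORS="" '
        FNR == 1 {
            if (NR > 1) printf "\n"         # end the previous file
            printf "%s\n", FILENAME         # filename as data, not as a format
        }
        { sub(/<.*/, ""); print }           # unchanged from the post
        END { if (NR > 0) printf "\n" }     # terminate the last line
    ' "$@"
}

# Small demonstration with two made-up files.
dir=$(mktemp -d)
printf '%s\n' '[xxx]' 't=21' '<tag>x</tag>' > "$dir/a.txt"
printf '%s\n' '[ppp]' 'lk=99' > "$dir/b.txt"

out=$(cd "$dir" && names_and_content *.txt)
echo "$out"

rm -rf "$dir"
```

If the files were created on Windows, carriage returns at the end of each line could also distort the output. That is only a guess, since the post does not say where the files come from.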