sasi_u
1
To split the files
Hi,
I have an XML file that contains multiple XML headers, so I want to split it into multiple files.
Test.xml
---------
<?xml version="UTF_8">
<emp: ....>
<name>a</name>
<age>10</age>
</emp>
<?xml version="UTF_8">
<emp: ....>
<name>b</name>
<age>10</age>
</emp>
<?xml version="UTF_8">
<emp: ....>
<name>c</name>
<age>10</age>
</emp>
I want to split test.xml into 3 files (one per XML document), like below:
test1.xml
---------
<?xml version="UTF_8">
<emp: ....>
<name>a</name>
<age>10</age>
</emp>
test2.xml
---------
<?xml version="UTF_8">
<emp: ....>
<name>b</name>
<age>10</age>
</emp>
test3.xml
---------
<?xml version="UTF_8">
<emp: ....>
<name>c</name>
<age>10</age>
</emp>
I tried with awk but could not get it to work.
Please help.
Thanks,
What exactly have you tried?
sasi_u
3
awk '/<?xml version="UTF_8">/{n++}{print > f n}' f=test test.xml
I even modified the XML to include BEGINXML and ENDXML markers around each document and tried the command below:
awk '/BEGINXML/{f="doc."++d} f{print > f} /ENDXML/{close f; f=""}' test.xml
It's not working.
Please suggest a fix.
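For reference, here is a minimal sketch of what the first attempt was probably aiming for. Two things in it are fragile: in awk's extended regular expressions an unescaped ? makes the preceding < optional instead of matching a literal question mark, and the result of print > f n without parentheses differs between awk implementations. In the second attempt, close needs parentheses, as in close(f). The split_demo directory and the test prefix below are only illustrative, and the sample input is rebuilt from the question.

```shell
# Work in a scratch directory (hypothetical name) so nothing is overwritten.
mkdir -p split_demo && cd split_demo || exit 1

# Recreate the sample input from the question.
for n in a b c; do
    printf '%s\n' '<?xml version="UTF_8">' '<emp: ....>' \
        "<name>$n</name>" '<age>10</age>' '</emp>'
done > test.xml

# Start a new output file at every XML declaration.
awk -v f=test '
    /<\?xml / {                     # \? matches a literal question mark
        if (out != "") close(out)   # release the previous output file
        out = f (++n) ".xml"        # build the name once, in a variable
    }
    out != "" { print > out }       # a plain variable needs no parentheses
' test.xml

ls test[0-9]*.xml
```

Building the file name in a variable first sidesteps the parenthesisation question entirely, and closing each file as soon as the next one starts keeps the number of open files at one.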
Hi,
try:
awk '/xml/{c++}{print > ("file" c ".xml")}' file
HTH
Chris
aigles
5
Try adapting the following awk script:
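awk '
# On the first line of the input, split FILENAME into path, name and extension
FNR==1 {
    path = namex = FILENAME;
    sub(/^.*\//, "", namex);      # namex: file name without directory
    sub(namex "$", "", path );    # path: directory part, with trailing slash
    name = ext = namex;
    sub(/\.[^.]*$/, "", name);    # name: file name without extension
    sub("^" name, "", ext );      # ext: extension, including the dot
}
# Each XML declaration starts a new output file
/<\?xml / {
    if (out) close(out);
    out = path name (++file) ext ;
    print "Spliting to " out " ...";
}
# Copy everything from the declaration through the closing </emp> tag
/<\?xml /,/<\/emp>/ {
    print $0 > out
}
' sasi.xml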
Input file (sasi.xml)
$ cat sasi.xml
<?xml version="UTF_8">
<emp: ....>
<name>a</name>
<age>10</age>
</emp>
<?xml version="UTF_8">
<emp: ....>
<name>b</name>
<age>10</age>
</emp>
<?xml version="UTF_8">
<emp: ....>
<name>c</name>
<age>10</age>
</emp>
$ ./sasi.sh
Spliting to sasi1.xml ...
Spliting to sasi2.xml ...
Spliting to sasi3.xml ...
$ more -999 sasi[0-9].xml
::::::::::::::
sasi1.xml
::::::::::::::
<?xml version="UTF_8">
<emp: ....>
<name>a</name>
<age>10</age>
</emp>
::::::::::::::
sasi2.xml
::::::::::::::
<?xml version="UTF_8">
<emp: ....>
<name>b</name>
<age>10</age>
</emp>
::::::::::::::
sasi3.xml
::::::::::::::
<?xml version="UTF_8">
<emp: ....>
<name>c</name>
<age>10</age>
</emp>
$
Jean-Pierre.
By the way, your XML declaration is invalid:
<?xml version="UTF_8">
The version should be either 1.0 or 1.1. There is no valid XML version called "UTF_8"; UTF-8 is a character encoding scheme. The declaration must also end with ?> rather than >.
The following is probably what you want:
<?xml version="1.0" encoding="UTF-8"?>
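If the input cannot be regenerated, the declaration could be rewritten before (or after) splitting. The following is only a sketch: it assumes every declaration sits on its own line exactly as in the sample, and decl_demo.xml and fixed.xml are made-up file names.

```shell
# A shortened sample using the declaration exactly as it appears in the question.
printf '%s\n' '<?xml version="UTF_8">' '<emp: ....>' '</emp>' > decl_demo.xml

# In sed's basic regular expressions "?" is an ordinary character,
# so the declaration can be matched literally.
sed 's|^<?xml version="UTF_8">$|<?xml version="1.0" encoding="UTF-8"?>|' \
    decl_demo.xml > fixed.xml

head -n 1 fixed.xml
```

Anchoring the pattern with ^ and $ keeps the substitution from touching any other line that happens to contain similar text.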
# cat Test.xml
<?xml version="UTF_8">
<emp: ....>
<name>a</name>
<age>10</age>
</emp>
<?xml version="UTF_8">
<emp: ....>
<name>b</name>
<age>10</age>
</emp>
<?xml version="UTF_8">
<emp: ....>
<name>c</name>
<age>10</age>
</emp>
# ./justdoit Test.xml
1. Splitted File Name -> "test1.xml"
<?xml version="UTF_8">
<emp: ....>
<name>a</name>
<age>10</age>
</emp>
2. Splitted File Name -> "test2.xml"
<?xml version="UTF_8">
<emp: ....>
<name>b</name>
<age>10</age>
</emp>
3. Splitted File Name -> "test3.xml"
<?xml version="UTF_8">
<emp: ....>
<name>c</name>
<age>10</age>
</emp>
# cat justdoit
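#!/bin/bash
# Total number of lines that belong to <?xml ... </emp> blocks
totalcnt=$(sed -n '/<?xml/,/emp>/p' "$1" | sed -n '$=')
# Lines in the first block; every block is assumed to have the same length
mycnt=$(sed -n '1,/emp>/p' "$1" | sed -n '$=')
count=$(( totalcnt / mycnt ))
first=1; endof=$mycnt; in=1
while [ $(( count -= 1 )) -gt -1 ]
do
    # Write the current block to testN.xml, the name reported below
    sed -n "${first},${endof}p" "$1" > "test${in}.xml"
    echo -e "\n$in. Splitted File Name -> \"test${in}.xml\""; cat "test${in}.xml"
    first=$(( first + mycnt ))
    endof=$(( endof + mycnt ))
    in=$(( in + 1 ))
done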
Regards
ygemici
kurumi
8
linux$ csplit file '/^<emp/4' "{*}"
aigles
9
Thanks kurumi, I learned something today.
Another csplit approach:
$ cat sasi.xml
<?xml version="UTF_8">
<emp: ....>
<name>a</name>
<age>10</age>
</emp>
<?xml version="UTF_8">
<emp: ....>
<name>b</name>
<age>10</age>
</emp>
<?xml version="UTF_8">
<emp: ....>
<name>c</name>
<age>10</age>
</emp>
$ csplit -f sasi -b _%d.xml -z sasi.xml '/<\/emp>/1' '{*}'
73
73
73
$ ls -l sasi_*.xml
-rw-r--r-- 1 Jean-Pierre Aucun 73 2010-07-25 12:18 sasi_0.xml
-rw-r--r-- 1 Jean-Pierre Aucun 73 2010-07-25 12:18 sasi_1.xml
-rw-r--r-- 1 Jean-Pierre Aucun 73 2010-07-25 12:18 sasi_2.xml
$ head sasi*_*.xml
==> sasi_0.xml <==
<?xml version="UTF_8">
<emp: ....>
<name>a</name>
<age>10</age>
</emp>
==> sasi_1.xml <==
<?xml version="UTF_8">
<emp: ....>
<name>b</name>
<age>10</age>
</emp>
==> sasi_2.xml <==
<?xml version="UTF_8">
<emp: ....>
<name>c</name>
<age>10</age>
</emp>
$
awk 'BEGIN{a=0;i=0}
/<\?xml/ {a=1}
/<\/emp>/ {print > ("File" i ".xml");i++;a=0}
{if (a==1) print > ("File" i ".xml")}' urfile
sasi_u
11
Hi All,
Thanks a lot for helping me in this.
aigles, thanks very much.
The csplit command has limitations in this case. For example, if there are other lines between </emp> and the next <?xml version="UTF_8">, you will get the wrong output.
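To illustrate that point, here is a sketch of an awk variant that opens a new file at each declaration and stops writing after </emp>, so stray lines between documents are dropped rather than attached to a neighbouring file. The input with a "junk line", the stray_demo directory and the part prefix are all invented for the example.

```shell
# Scratch directory (hypothetical name) for the demonstration.
mkdir -p stray_demo && cd stray_demo || exit 1

# Two documents with an unrelated line between them.
printf '%s\n' \
    '<?xml version="UTF_8">' '<emp: ....>' '<name>a</name>' '</emp>' \
    'junk line' \
    '<?xml version="UTF_8">' '<emp: ....>' '<name>b</name>' '</emp>' > in.xml

awk '
    /<\?xml / {                    # a declaration starts a new document
        if (out != "") close(out)
        out = "part" (++n) ".xml"
        inside = 1
    }
    inside { print > out }         # copy only lines inside a document
    /<\/emp>/ { inside = 0 }       # the closing tag ends the document
' in.xml

cat part2.xml
```

Whether the stray lines should be discarded, as here, or collected somewhere else depends on what they are; the flag variable makes either choice a one-line change.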