Generating an XML file with information about the files in a directory

Hi all,

I have to generate an XML file containing information about the files in a directory.

Suppose I have these files:

file1.xml (datafile)
file2.xml (datafile)
file3.xml (metafile)

Now I need to generate an XML file in this format:
<?xml version="1.0" encoding="UTF-8"?>
<AuditFile Version="2.0">
<UnitOfWork UnitSequenceNr="1" FileCount="3" ArchiveID="106B">
<DataFile>
<FileName>file1</FileName>
<FileSize>10357</FileSize>
</DataFile>
<DataFile>
<FileName>file2</FileName>
<FileSize>19850</FileSize>
</DataFile>
<MetaFile>
<FileName>file3</FileName>
<FileSize>3430</FileSize>
</MetaFile>
</UnitOfWork>
</AuditFile>

You can do something like this:

ls -l *.xml 2>/dev/null | 
awk '
{
   sub(/\.[^.]*$/, "", $NF);  # Removes extension
   FileCount++;
   FileName[FileCount] = $NF;
   FileSize[FileCount] = $5;
}
END {
   print  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
   print  "<AuditFile Version=\"2.0\">";
   printf "<UnitOfWork UnitSequenceNr=\"1\" FileCount=\"%d\" ArchiveID=\"106B\">\n",FileCount

   for (f=1; f<=FileCount; f++) {
      print  "<DataFile>";
      printf "<FileName>%s</FileName>\n", FileName[f];
      printf "<FileSize>%d</FileSize>\n", FileSize[f];
      print  "</DataFile>";
   }

   print  "</UnitOfWork>";
   print  "</AuditFile>"
}
'

This solution outputs only DataFile elements because I don't know how to differentiate a DataFile from a MetaFile.
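If the metafile names are known in advance, one possible way to tag them is to pass the list to awk and look each name up. This is only a sketch under that assumption: the function name gen_audit_xml and its argument (a space-separated list of base names to treat as MetaFile) are hypothetical and not part of the original requirement.

```shell
# gen_audit_xml is a hypothetical helper. Its first argument is a
# space-separated list of base names (no extension) to tag as MetaFile.
gen_audit_xml() {
   ls -l *.xml 2>/dev/null |
   awk -v metafiles="$1" '
   BEGIN {
      # Turn the list into a lookup table
      n = split(metafiles, list, " ");
      for (i = 1; i <= n; i++) IsMeta[list[i]] = 1;
   }
   {
      sub(/\.[^.]*$/, "", $NF);  # Removes extension
      FileCount++;
      FileName[FileCount] = $NF;
      FileSize[FileCount] = $5;
   }
   END {
      print  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
      print  "<AuditFile Version=\"2.0\">";
      printf "<UnitOfWork UnitSequenceNr=\"1\" FileCount=\"%d\" ArchiveID=\"106B\">\n", FileCount;

      for (f = 1; f <= FileCount; f++) {
         # Names found in the lookup table become MetaFile, the rest DataFile
         tag = (FileName[f] in IsMeta) ? "MetaFile" : "DataFile";
         printf "<%s>\n", tag;
         printf "<FileName>%s</FileName>\n", FileName[f];
         printf "<FileSize>%d</FileSize>\n", FileSize[f];
         printf "</%s>\n", tag;
      }

      print  "</UnitOfWork>";
      print  "</AuditFile>"
   }
   '
}

# Example call: treat file3 as the only metafile
gen_audit_xml "file3"
```

Like the script above, this relies on the file name being the last field of ls -l, so it assumes names without spaces.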

Jean-Pierre.

What does /dev/null mean in
"ls -l *.xml 2>/dev/null"?

---------- Post updated at 11:35 AM ---------- Previous update was at 11:21 AM ----------

Can't we concatenate two different files together?

2>/dev/null : Redirects error messages to the null device, which discards them.

$ ls -l test.xml
test.xml not found
$ ls -l test.xml 2>/dev/null
$ 

To concatenate file1 and file2 into file3 :

$ ls file?
file1  file2
$ cat file1
Datas from file1
$ cat file2
Datas from file2
$ cat file1 file2 > file3
$ ls file?
file1  file2  file3
$ cat file3
Datas from file1
Datas from file2
$

Jean-Pierre.

Hi all,

Since I have close to 300k files in my database, ls -l *.xml is failing.

I have file names like file1 to file300000...

Can I do this job in batches of 1000, using '?' instead of '*' in the ls command?

The following new version of the script will work in your case:

ls -l  2>/dev/null | 
awk '
/\.xml$/ {
   sub(/\.[^.]*$/, "", $NF);  # Removes extension
   FileCount++;
   FileName[FileCount] = $NF;
   FileSize[FileCount] = $5;
}
END {
   print  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
   print  "<AuditFile Version=\"2.0\">";
   printf "<UnitOfWork UnitSequenceNr=\"1\" FileCount=\"%d\" ArchiveID=\"106B\">\n",FileCount

   for (f=1; f<=FileCount; f++) {
      print  "<DataFile>";
      printf "<FileName>%s</FileName>\n", FileName[f];
      printf "<FileSize>%d</FileSize>\n", FileSize[f];
      print  "</DataFile>";
   }

   print  "</UnitOfWork>";
   print  "</AuditFile>"
}
'
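A note on why this version copes with a large directory. With ls -l *.xml the shell expands the wildcard into one argument per file before ls even starts, and with about 300k names that list probably exceeds the system limit on argument length (the exact error was not posted, so this is an assumption). A plain ls -l gets no file arguments, reads the directory itself, and the .xml filter is applied by awk on the piped output. A minimal sketch of the same idea, where count_xml is a hypothetical helper name:

```shell
# count_xml is a hypothetical helper: it counts the .xml entries of the
# current directory without expanding any wildcard on the command line.
count_xml() {
   ls 2>/dev/null |
   awk '
   /\.xml$/ { n++ }          # the filter runs inside awk, not in the shell
   END      { print n + 0 }  # n + 0 prints 0 when nothing matched
   '
}

# Upper limit, in bytes, for the arguments passed to a new process
getconf ARG_MAX

count_xml
```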

Jean-Pierre.

Hi Jean-Pierre,

The above code includes all the existing XML files in the output file.

I want only those XML files whose names match a specific pattern.

I have 300k files with a fixed pattern, from
(Name.Version.1.xml)
to
(Name.Version.300000.xml)

and a few files in other formats.

Modify the selection pattern to meet your requirement:

ls -l  2>/dev/null | 
awk '
/\.xml$/ {

For example, if you want to process all Name.Version.<number>.xml files:

ls -l  2>/dev/null | 
awk '
/Name\.Version\.[0-9]+\.xml$/ {

Jean-Pierre.

Great help...

But how did this code manage so many files? Can you please explain a bit?

It would be really helpful.

---------- Post updated at 10:05 AM ---------- Previous update was at 09:50 AM ----------

In the line

/Name\.Version\.[0-9]+\.xml$/

Can we include a variable defined in the script and passed in to awk?

---------- Post updated at 10:31 AM ---------- Previous update was at 10:05 AM ----------

ls -l 2>/dev/null |
awk -v var="$mydate" '
/Name\.Version\.[0-9]+\.xml$/ {

I want to use "var" in the search pattern.
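One way to do this, as a minimal sketch: the text between slashes is a fixed regex, so awk never looks for variables inside it. A pattern can instead be built as a string and applied with the ~ operator. The sketch assumes that the value of mydate replaces the "Version" part of the file name and contains only letters and digits. Both points are guesses, since the post does not say where the variable belongs. The helper name list_matching and the sample date are hypothetical.

```shell
# list_matching is a hypothetical helper. Its first argument is the text
# to embed in the pattern (assumed to hold no regex special characters).
list_matching() {
   ls -l 2>/dev/null |
   awk -v var="$1" '
   BEGIN {
      # In a string, each backslash is doubled so that the regex sees \.
      pattern = "Name\\." var "\\.[0-9]+\\.xml$";
   }
   $0 ~ pattern { print $NF }   # dynamic match instead of /.../
   '
}

mydate=20240101   # hypothetical value
list_matching "$mydate"
```

The body of the original script can be used in place of the print statement. Only the selection condition changes.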