I need to accomplish the following task.
In a folder I have files like these:
name1.txt
name1.pdf
name2.txt
name2.pdf
etc...
I want to scan this folder, match files with the same name (name1.txt with name1.pdf, name2.txt with name2.pdf) and create files name1.xml and name2.xml based on them, i.e. an .xml file with this structure:
<file>
<text>...</text>
<pdfcontent>...</pdfcontent>
</file>
where the <text> element must contain the contents of nameX.txt, and the <pdfcontent> element must contain the binary content of nameX.pdf (encoded as base64 or something similar).
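Since the question leaves the encoding open ("base64 or something similar"), here is a minimal sketch of why base64 is a safe choice, assuming GNU coreutils `base64` is installed (the replies below use `openssl base64`, which emits the same standard alphabet but wraps lines at 64 characters instead of 76). The helper name `b64_roundtrip_ok` is hypothetical: it encodes a file, decodes the result and compares it byte for byte with the original.

```bash
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if encoding FILE as base64 and
# decoding it again yields exactly the same bytes as FILE.
b64_roundtrip_ok() {
    local file=$1
    # Encode, decode, then compare the decoded stream (stdin, "-")
    # with the original file without printing anything.
    base64 < "$file" | base64 -d | cmp -s - "$file"
}
```

For example, `b64_roundtrip_ok name1.pdf && echo "name1.pdf survives base64"`. Because the encoded text contains only letters, digits, `+`, `/`, `=` and newlines, it can go between the `<pdfcontent>` tags without any XML escaping.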
Okay: find lists all .pdf and .txt files in the current directory; awk splits each path on dots and prints the second field, which turns ./filename.txt into /filename; then sort -u removes any duplicates from that list. From there the while loop reads the base names one at a time, and "./${BASE}".* expands to .//filename.* (the double slash is harmless), matching only that group of files, which the inner for loop walks through in turn. The output for each group is redirected into the new .xml file.
find . -maxdepth 1 -type f \( -name '*.txt' -o -name '*.pdf' \) |
awk -v FS="." '{ print $2 }' | sort -u |
while read -r BASE
do
  ( echo "<file>"
    for FILE in "./${BASE}".*
    do
      case "$FILE" in
        *.pdf)
          printf "<pdfcontent>"
          openssl base64 < "$FILE"
          printf "</pdfcontent>\n"
          ;;
        *.txt)
          printf "<text>"
          cat "$FILE"
          printf "</text>\n"
          ;;
        *) # Do nothing for files of the wrong type, e.g. the .xml being written
          ;;
      esac
    done
    echo "</file>" ) > "./${BASE}".xml
done
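One caveat about the find/awk pipeline: awk -v FS="." prints only the text between the first and second dot, so every file sharing that first component lands in the same group and the output is named after it (my.report.txt and my.report.pdf produce my.xml, merged with any other my.*.txt or my.*.pdf). A sketch, assuming bash, that derives the base name with parameter expansion instead, stripping only the last extension; `list_bases` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: print each distinct base name (file name minus its
# last extension) of the .txt and .pdf files in directory DIR, sorted.
list_bases() {
    local dir=${1:-.} f name
    for f in "$dir"/*.txt "$dir"/*.pdf; do
        [ -f "$f" ] || continue        # skip a glob pattern that matched nothing
        name=${f##*/}                  # drop the directory part
        printf '%s\n' "${name%.*}"     # drop only the last ".ext"
    done | sort -u
}
```

Each line of `list_bases /path/to/folder` can then be turned into `$base.txt` and `$base.pdf`; since a base may have only one of the two, check each with `[ -f ... ]` before using it.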
#!/bin/bash
dir="<some directory>"    # placeholder: set this to the folder to scan
cd "$dir" || exit 1
for ea_file in *.txt
do
  [ -f "$ea_file" ] || continue   # no .txt files at all
  fname=${ea_file%.txt}           # name1.txt -> name1
  if [ -f "${fname}.pdf" ]; then
    echo "<file>" > "${fname}.xml"
    echo "<text>$(cat "${fname}.txt")</text>" >> "${fname}.xml"
    echo "<pdfcontent>$(openssl base64 < "${fname}.pdf")</pdfcontent>" >> "${fname}.xml"
    echo "</file>" >> "${fname}.xml"
  else
    echo "WARNING: The file ${fname}.txt has no corresponding ${fname}.pdf"
  fi
done
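The script above only reports a .txt file whose .pdf is missing. If stray PDFs matter too, a sketch along the same lines can check both directions; `report_unpaired` is a hypothetical name, and it swaps the extension with `${f%.txt}` / `${f%.pdf}`.

```bash
#!/bin/bash
# Hypothetical helper: warn on stderr about every .txt without a matching
# .pdf and every .pdf without a matching .txt in directory DIR.
# Returns 1 if any file is unpaired, 0 otherwise.
report_unpaired() {
    local dir=${1:-.} f status=0
    for f in "$dir"/*.txt; do
        [ -f "$f" ] || continue
        if [ ! -f "${f%.txt}.pdf" ]; then
            echo "WARNING: $f has no corresponding ${f%.txt}.pdf" >&2
            status=1
        fi
    done
    for f in "$dir"/*.pdf; do
        [ -f "$f" ] || continue
        if [ ! -f "${f%.pdf}.txt" ]; then
            echo "WARNING: $f has no corresponding ${f%.pdf}.txt" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running `report_unpaired . || echo "some files are unpaired"` before generating the XML makes missing partners visible in either direction.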
ls -l | awk -F'[. ]' '/\.txt/||/\.pdf/{++a[$(NF-1)]}END{for(i in a) if(a[i]==2) print "<file>\n<text>"i".txt</text>\n<pdfcontent>"i".pdf</pdfcontent>\n</file>" > (i".xml")}'
Good points, thanks Chubler_XL.
I don't know much about XML parsing, so I followed your code.
How about this? Please let me know, as usual, if there are any problems :)
for filename in `ls -l | awk -F'[. ]' '/\.txt/||/\.pdf/{++a[$(NF-1)]}END{for(i in a) if(a[i]==2) print i}'`
do
{ printf '<file>\n<text><![CDATA['
  sed 's/]]>/] ]>/g' "$filename.txt"    # keep newlines intact (unquoted backticks collapsed them)
  printf ']]></text>\n<pdfcontent>'
  openssl base64 -in "$filename.pdf"
  printf '</pdfcontent>\n</file>\n'; } > "$filename.xml"
done
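Replacing ]]> with ] ]> keeps the CDATA section valid but alters the text. The usual lossless workaround is to split each ]]> across two CDATA sections, so the parser still sees the original characters. A sketch, assuming the text is in a file; `cdata_wrap` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: print FILE wrapped in a CDATA section. A literal
# "]]>" would end the section early, so it becomes "]]]]><![CDATA[>":
# the "]]" stays in the first section and the ">" starts the next one.
cdata_wrap() {
    printf '<![CDATA['
    sed 's/]]>/]]]]><![CDATA[>/g' "$1"
    printf ']]>'
}
```

Inside the loop this would be used as `printf '<text>'; cdata_wrap "$filename.txt"; printf '</text>\n'`. Characters such as `<` and `&` need no escaping inside CDATA, which is why the approach is attractive for arbitrary .txt content.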
#!/usr/bin/env ruby
# xml template
xml=<<EOF
<file>
<text>%s</text>
<pdfcontent>%s</pdfcontent>
</file>
EOF
Dir["*.txt"].each do |file|
filename=file.sub(/\.txt$/,"")
pdf = filename+".pdf"
xmlfile = filename+".xml"
if File.exist?( pdf )   # File.exists? was removed in Ruby 3.2
w = sprintf( xml , file, pdf )
File.write(xmlfile, w)  # writes and closes the file
end
end
require 'base64'
# xml template
xml=<<EOF
<file>
<text>%s</text>
<pdfcontent>%s</pdfcontent>
</file>
EOF
Dir["*.txt"].each do |file|
filename=file.sub(/\.txt$/,"")
pdf = filename+".pdf"
xmlfile = filename+".xml"
if File.exist?( pdf )
b4=Base64.encode64( File.binread(pdf) )   # read the PDF as raw bytes
w = sprintf( xml , File.read(file), b4 )  # embed the text itself, not its file name
File.write(xmlfile, w)
end
end
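One problem I see is listing the files with ls -l. A simple shell glob expansion (e.g. *.txt) will do; there is no need to use ls -l, whose output also breaks the awk field split when a file name contains a space.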
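Following that suggestion, here is a sketch that keeps the counting idea of the awk one-liners (++a[...], then keep the names whose count is 2) but builds the list from a glob, so names containing spaces survive. It assumes bash 4 or later for associative arrays (the stock bash 3.2 on macOS lacks them); `paired_bases` is a hypothetical name.

```bash
#!/bin/bash
# Requires bash 4+ (associative arrays).
# Hypothetical helper: print, one per line and sorted, every base name in
# DIR that has both a .txt and a .pdf file.
paired_bases() {
    local dir=${1:-.} f name base
    local -A count=()
    # Each base name is seen at most once via *.txt and once via *.pdf,
    # so a count of 2 means both files exist.
    for f in "$dir"/*.txt "$dir"/*.pdf; do
        [ -f "$f" ] || continue          # skip a pattern that matched nothing
        name=${f##*/}                    # strip the directory part
        base=${name%.*}                  # strip only the last extension
        count["$base"]=$(( ${count["$base"]:-0} + 1 ))
    done
    for base in "${!count[@]}"; do
        if [ "${count[$base]}" -eq 2 ]; then
            printf '%s\n' "$base"
        fi
    done | sort
}
```

A loop such as `paired_bases . | while IFS= read -r base; do ...; done` can then replace the backtick `ls -l | awk` list, with `"$base.txt"` and `"$base.pdf"` quoted everywhere they are used.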