I tried the command below:
tr -cd "[:print:]" < InputFile.xml > output.txt
This deletes every non-printable character from the file, which includes tabs and newlines (spaces are printable, so tr keeps them).
It successfully removed the tabs and newline characters, but the whole file then became a single record. I want to keep one newline character per record, since that newline is the record delimiter.
The record ending is consistent: every record finishes with </Success></Employee>
Input:
26,<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE EmployeeDetails SYSTEM "EmpDetails.dtd">
<Employee>
<Address>
<Street>1234 64th Ave</Street>
</Address>
</Employee>,<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Success SYSTEM "Succ.dtd"><Employee><Success>inserted</Success></Employee>
47,<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE EmployeeDetails SYSTEM "EmpDetails.dtd">
<Employee>
<Address>
<Street>PO BOX 56</Street>
</Address>
</Employee>,<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Success SYSTEM "Succ.dtd"><Employee><Success>updated</Success></Employee>
Expected output:
26|<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE EmployeeDetails SYSTEM "EmpDetails.dtd"><Employee><Address><Street>1234 64th Ave</Street></Address></Employee>|<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Success SYSTEM "Succ.dtd"><Employee><Success>inserted</Success></Employee>
47|<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE EmployeeDetails SYSTEM "EmpDetails.dtd"><Employee><Address><Street>PO BOX 56</Street></Address></Employee>|<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Success SYSTEM "Succ.dtd"><Employee><Success>updated</Success></Employee>
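For illustration, here is a minimal sketch of one way to get from the input to the expected output above. It swaps tr for awk, because the newline has to survive only after the closing </Success></Employee> marker. The function name flatten_records is hypothetical. The sketch assumes that every record's last physical line ends with that marker, that "extra spaces" means indentation and trailing blanks on each line, and that the commas directly in front of each <?xml declaration are the field separators to turn into |.

```shell
#!/bin/sh
# flatten_records is a hypothetical helper name, not part of any tool.
# It reads multi-line records on stdin and writes one record per line.
# Assumed usage: flatten_records < InputFile.xml > output.txt
flatten_records() {
    awk '
        {
            # Trim indentation, trailing blanks and a trailing carriage return.
            gsub(/^[ \t]+|[ \t\r]+$/, "")
            # Join the physical lines that belong to the current record.
            buf = buf $0
        }
        /<\/Success><\/Employee>$/ {
            # This line closes the record. Assumption: a comma directly
            # before "<?xml" is a field separator and becomes a pipe.
            gsub(/,<[?]xml/, "|<?xml", buf)
            print buf
            buf = ""
        }
    '
}
```

Because the record is printed only when the marker is seen, each record ends up on exactly one output line, and commas inside the XML text itself are left alone. A trailing fragment that never reaches the marker is silently dropped, which may or may not be what is wanted.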