Problems with delimiters

The_Observer · July 10, 2008, 11:53am

Hello,

I have data in a file something like this -

UNB+UNOA:1+006415160:1+AR0000012360:ZZ+080701:0552+2++DELFOR++++T'UNH+2+DELFOR:D:97A:UN

Here, the delimiters used are + , : and ' . I have a set of such files in which these delimiters vary from one file to another.

I am developing a shell script that needs to take certain values from fields in this file depending on their positions, which I will do with the help of the cut command.

My problem is how to find these delimiters, which vary from file to file. That is, in every file, how can I find out what the delimiter is after UNB (here +), after UNOA (here :), or before UNH (here ')? I will store them in variables and then search for my required fields.

Please help me. It is very urgent.

joeyg · July 10, 2008, 12:02pm

Is there a reason for so many different delimiters? Why not change the + to |, then the : to |, then the ' to |? Then your file would be consistent.

> echo $inp
UNB+UNOA:1+006415160:1+AR0000012360:ZZ+080701:0552+2++DELFOR++++T'UNH+2+DELFOR:97A:UN
> echo $inp | tr "+" "|" | tr ":" "|" | tr "'" "|"
UNB|UNOA|1|006415160|1|AR0000012360|ZZ|080701|0552|2||DELFOR||||T|UNH|2|DELFOR|97A|UN

quine · July 10, 2008, 2:21pm

Exactly. Create a "filter". Make all delimiters consistent (1 delimiter) then use that output to do your cuts, etc.

InputWithManyDelimiters | filterScript | cut -fn -d "~", where "~" is whatever delimiter you choose as the canonical one. "~" (tilde) is a good one, at least for English, because the character is not commonly used in data.
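A minimal sketch of such a filter, assuming the three delimiters from the first post and that "~" never occurs in the data. The function name normalize is made up for illustration, and the sample record is copied from the first post.

```shell
# Sample record from the first post.
inp="UNB+UNOA:1+006415160:1+AR0000012360:ZZ+080701:0552+2++DELFOR++++T'UNH+2+DELFOR:D:97A:UN"

# Hypothetical filter: map +, : and ' to ~ in a single tr call.
normalize() {
    tr "+:'" '~~~'
}

# Pick the 4th field of the normalized record by position.
printf '%s\n' "$inp" | normalize | cut -d '~' -f 4
```

With the sample record this prints 006415160. Empty fields (such as the ones in "++") are preserved, so positions stay stable.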

vgersh99 · July 10, 2008, 2:46pm

.... or specify multiple delimiters/field separators with 'awk'
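A short sketch of the awk approach, assuming the same three delimiters as in the first post. The function name pick_fields and the choice of fields 4 and 6 are only for illustration.

```shell
# Sample record from the first post.
inp="UNB+UNOA:1+006415160:1+AR0000012360:ZZ+080701:0552+2++DELFOR++++T'UNH+2+DELFOR:D:97A:UN"

# A multi-character -F value is treated as a regular expression,
# so the bracket expression splits on any one of +, : or '.
pick_fields() {
    awk -F"[+:']" '{ print $4, $6 }'
}

printf '%s\n' "$inp" | pick_fields
```

With the sample record this prints "006415160 AR0000012360". No intermediate filter is needed, because awk splits on all three characters at once.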

danmero · July 10, 2008, 4:19pm

Useless Use Of tr :rolleyes:

> echo $inp | tr ":+'" "|||"
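The replies above assume the delimiters are already known, while the original question was how to discover them when they vary from file to file. A rough sketch of one way to do that, under these assumptions: the record starts with UNB, the identifier after it is exactly four characters long (UNOA), and the first occurrence of UNH is the segment tag rather than data. The function name detect_delims is invented for this example.

```shell
# Sample record from the first post.
inp="UNB+UNOA:1+006415160:1+AR0000012360:ZZ+080701:0552+2++DELFOR++++T'UNH+2+DELFOR:D:97A:UN"

# Hypothetical helper: print the three delimiters as one 3-character string.
detect_delims() {
    printf '%s\n' "$1" | awk '{
        elem = substr($0, 4, 1)     # character right after "UNB"
        comp = substr($0, 9, 1)     # character right after "UNOA"
        i = index($0, "UNH")        # position of the first "UNH"
        seg = (i > 1) ? substr($0, i - 1, 1) : ""
        print elem comp seg
    }'
}

delims=$(detect_delims "$inp")
elem=$(printf '%s' "$delims" | cut -c1)   # delimiter after UNB
comp=$(printf '%s' "$delims" | cut -c2)   # delimiter after UNOA
seg=$(printf '%s' "$delims" | cut -c3)    # delimiter before UNH

# Use a detected delimiter with cut, as planned in the first post.
printf '%s\n' "$inp" | cut -d "$elem" -f 2
```

With the sample record the last command prints UNOA:1. For a real file, the first line could be passed in with something like detect_delims "$(head -n 1 file)", where file is a placeholder name.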