Cutting a file with multiple delimiters into columns

luckycharm · March 31, 2010, 6:26pm

Hi All

I have recently had to start using Unix for work and I have hit brick wall with this prob....

I have a file that goes a little something like this....

EUR;EUR;EUR:USD:USD;USD;;;EUR/USD;XAU/AUD;XAU/EUR;XAU/AUD,GBP/BOB,UAD/XAU;;;1.11;2.22;3.33;4.44;5.55;6.66;;;

is it possible to create a script to manipulate this so that it reads each line the once it hits the ;;; begin a new line?

desired output being...

EUR          EUR          EUR           USD           USD          USD
EUR/USD      XAU/AUD      XAU/EUR       XAU/EUR       GBP/BOB      UAD/XAU
1.11         2.22         3.33          4.44          5.55         6.66

and furthermore, use that script to accept a parameter eg "EUR"
and have it output the columns that the world matches....like a grep by column if you will

Thanks a mil in advance...

LC

cfajohnson · March 31, 2010, 11:09pm

awk -F ';;;' '
{
  gsub(";","\t",$1)
  gsub("[;,]","\t",$2)
  gsub(";","\t",$3)
  print $1
  print $2
  print $3
}
' "$file"

steadyonabix · April 1, 2010, 4:44pm

awk -F";;;" -v w=15 ' {
        for ( i = 1; i <=NF; i++ ){
                split ( $i, s, ";" )
                for ( si in s ){
                        if ( s[si] ~ ":|," ){
                                char = (s[si] ~ ",") ?  "," : ":"
                                split( s[si], c, char )
                                for ( ci in c )
                                        printf("%-"w"s", c[ci])
                        }
                        else
                                printf("%-"w"s", s[si])
                }
                printf("\n")
        }
}' $file

Output: -

EUR            EUR            EUR            USD            USD            USD            
EUR/USD        XAU/AUD        XAU/EUR        XAU/AUD        GBP/BOB        UAD/XAU        
1.11           2.22           3.33           4.44           5.55           6.66

Franklin52 · April 1, 2010, 5:29pm

Or maybe with:

awk -F"[:;,]" '$1=$1' RS=";;;" OFS="\t"

alister · April 1, 2010, 5:31pm

$ sed 's/;;;/\
/g; s/[,;:]/____/g; s/\n$//' data
EUR     EUR     EUR     USD     USD     USD
EUR/USD XAU/AUD XAU/EUR XAU/AUD GBP/BOB UAD/XAU
1.11    2.22    3.33    4.44    5.55    6.66

*The red underscore should be a literal tab

steadyonabix · April 2, 2010, 4:08am

To get the columns: -

awk -F";;;" -v w=15 -v m="EUR" ' {
        out = ""
        for ( i = 1; i <=NF; i++ ){
                col = 0
                split ( $i, s, ";" )
                for ( si in s ){
                        if ( s[si] ~ ":|," ){
                                char = (s[si] ~ ",") ?  "," : ":"
                                split( s[si], c, char )
                                for ( ci in c ) {
                                        out = out sprintf("%-"w"s", c[ci])
                                        col++
                                        if ( c[ci] ~ m )
                                                _[col]++
                                }       
                        }
                        else {
                                out = out sprintf("%-"w"s", s[si])
                                col++
                                if ( s[si] ~ m )
                                        _[col]++
                        }       
                }
                printf("%s\n", out)
                out = ""
        }
        }END{
                out = m" is found in columns "
                for ( i in _ )
                        out = out sprintf("%d%c", i, ",")
                printf("%s\n", substr(out, 1, length(out) -1))
} ' y

Output: -

EUR            EUR            EUR            USD            USD            USD            
EUR/USD        XAU/AUD        XAU/EUR        XAU/AUD        GBP/BOB        UAD/XAU        
1.11           2.22           3.33           4.44           5.55           6.66           

EUR is found in columns 1,2,3

---------- Post updated 02-04-10 at 09:08 AM ---------- Previous update was 01-04-10 at 11:07 PM ----------

That's nice Franklin. What was your thinking with $1=$1? I don't understand it.

Did you notice $2=$2 removes the extra blank lines from the output?

ygemici · April 2, 2010, 4:55am

additional alister solution's as alternative

sed -e 's/;;;/\n/g' -e 's/[;,:]/\t/g' file

Regards
Yucel

Franklin52 · April 2, 2010, 6:29am

steadyonabix,

I've used $1=$1 to rearrange the line but didn't realize that it gives 2 extra blanc lines caused by the newline character.

Thanks for noticing that!

Since the line ends up with a newline character, a better way is to add an newline as extra fieldseparator:

awk -F"[:;,\n]" '$1=$1' RS=";;;" OFS="\t"

Regards