How to decode text files?

Akshay_Hegde · January 29, 2013, 2:42am

Hi experts, I am trying to decode some text files and need a little help from you. The file is of mixed type, generated by a Windows-based system.

My text files look like this:

  2.AUBZ               158   1 11 116204310 6 N 7542 E       18
    02846    52833   102821   152815   202824   252812   302804   352825   402809
   452774   502759   552727   602712   652699   702685   752655   802603   852539
  2.AUCE                50  39 12 12314 453 0 S 5730 E      130
    0 374    5 374   10 375   15 375   20 375   25 374   30 373   35 373   40 371
   45 376   50 376   55 366   60 370   65 366   70 362   75 359   80 358   85 358
   90 354   95 354  100 319  105 262  110 243  115 233  120 226  125 220  130 217
  135 209  140 200  145 198  150 199  155 196  160 197  165 198  170 201  175 205
  180 207  185 211  190 215  195 217  200 218  205 217  210 218  215 217  220 218
  225 230  230 225  235 232  240 227  245 224  250 226  255 229  260 228  265 230
  270 234  275 237  280 239  285 237  290 241  295 242  300 240  305 239  310 239
  315 239  320 238  325 239  330 239  335 241  340 241  345 241  350 239  355 239

I want to print it like this:

2    AUBZ    158    1    11    1162043        10 6N    7542E    18    0    28.46
2    AUBZ    158    1    11    1162043        10 6N    7542E    18    5    28.33    
2    AUCE    50    39    12    12314 4     53 0S    5730E    130    0      37.4
2    AUCE    50    39    12    12314 4        53 0S    5730E    130    5    37.4                                            
2    AUCE    50    39    12    12314 4        53 0S    5730E    130    355    23.9

itkamaraj · January 29, 2013, 2:46am

Don't make people guess how the output is formed.

Explain how the output is derived from the given input. Also, let us know what you have tried so far.

Akshay_Hegde · January 29, 2013, 5:59am

It's silly, but I'm posting it anyway, as itkamaraj asked. I am just trying to port this old FORTRAN code to shell:

c     lengths match the a9,a9,a1,a1,a5,a4 descriptors in format 10
      character clsgn*9,shnm*9,lp*1,lgp*1,aqsy*5,comnm*4
      dimension dstand(1000),tstand(1000),sstand(1000)
      integer dstand,cno,pno,yy,mm,hh,mn,dd,ld,lm,lgd,lgm,nop,x
      real tstand,sstand,time,lat,long
      open(2,file='out.dat')
      open(1,file='decoding.txt')
   15 read(1,10,end=100)clsgn,shnm,cno,pno,yy,mm,dd,hh,mn,ld,lm,lp,lgd
     1,lgm,lgp,nop,aqsy,comnm
      read(1,11,end=100)(dstand(k),tstand(k),k=1,nop)
   10 format(2x,a9,1x,a9,1x,i5,1x,i2,1x,i2,i2,i2,i2,i2,i2,i2,1x,a1,1x
     1,i2,i2,1x,a1,1x,i9,1x,a5,1x,a4)
   11 format(9(2x,i3,f4.2))
      time=hh*1.00+mn*0.01
      lat=(1.00*ld)+(0.1*lm/6)
      long=(1.00*lgd)+(0.1*lgm/6)
      do k=1,nop
      write(2,30)clsgn,shnm,cno,pno,lat,long,dd,mm,yy,time
     1,dstand(k),tstand(k)
   30 format(a9,1x,a9,1x,i5,1x,i2,1x,f5.2,1x,f5.2,1x,i2,1x,i2,1x
     1,i2,1x,f5.2,1x,i4,1x,f5.2,1x,f6.3)
      end do
      goto 15
  100 continue
      close(1)
      stop
      end
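In the code above, the three assignments after format 11 turn the packed two-digit header fields into numbers. `0.1*lm/6` is simply lm/60, i.e. minutes converted to decimal degrees, while `time` only packs hours and minutes as hh.mm (it is not decimal hours). Worked through with the fields of the two sample headers as split by format 10:

$$
\text{lat} = ld + \frac{0.1\,lm}{6} = ld + \frac{lm}{60},\qquad
\text{long} = lgd + \frac{lgm}{60},\qquad
\text{time} = hh + \frac{mn}{100}
$$

$$
\text{AUBZ: } 10 + \tfrac{6}{60} = 10.10,\quad 75 + \tfrac{42}{60} = 75.70,\quad 20 + \tfrac{43}{100} = 20.43
$$

$$
\text{AUCE: } 53 + \tfrac{0}{60} = 53.00,\quad 57 + \tfrac{30}{60} = 57.50,\quad 14 + \tfrac{4}{100} = 14.04
$$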

---------- Post updated at 05:59 AM ---------- Previous update was at 05:09 AM ----------

itkamaraj, can we make this simpler with shell scripting? I am getting run-time errors with FORTRAN for huge files.

RudiC · January 29, 2013, 3:35pm

The FORTRAN source does not really help (although I can still read it). I can't see how you mangle the input file to get what you posted as the desired output. Especially: Why do you print just two lines for AUBZ, using only the first two data entries of the first following line, and not one or three? Why three for AUCE? Why do you interpret 02846 as 0 28.46, but 0 374 as 0 37.4 (not 3.74)? Why do you skip all data points from the third to the second-to-last for AUCE?

Akshay_Hegde · January 29, 2013, 11:02pm

Actually, this type of data file is created by some mad people in our organization. I also struggled like you to interpret the data file. Finally I got a funny answer from one senior person: the first section (AUBZ) is created by old machines, and the 2nd (AUCE) by new machines. What you need to do is:

whenever data like 02846 is read, take the 4 digits from the LSB end and split them as 0 28.46;

whenever there is data like 0 374 (the 4th one is a space), separate the 1st LSB with a decimal point (0 37.4).

This is actually what that person told me... I don't know what to do. I am searching for help in a FORTRAN forum now. Some people really don't understand: if you explain the data file format, they just want the result. My case here is the same; they need the result.
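Taken literally, the rule above could be sketched like this in awk. It is one interpretation, not a confirmed spec: in each 9-column group of a data line (format 11, 2x,i3,f4.2) columns 3-5 hold the depth and columns 6-9 the value; if the first of those four value columns is blank (new machines, e.g. ` 374`) the value gets one decimal (37.4), otherwise two (old machines, `2846` becomes 28.46). The function name decode_pairs is hypothetical, it expects data lines only, and it assumes old-machine values always fill all four columns (an old value written as ` 950` would come out as 95.0).

```shell
# decode_pairs (hypothetical name): read data lines on stdin and print one
# "depth value" line per 9-column group, using the blank-column rule:
#   "2846" (4 digits)        -> two decimals -> 28.46  (old machines)
#   " 374" (blank, 3 digits) -> one decimal  -> 37.4   (new machines)
decode_pairs() {
    awk '{
        for (i = 0; i < 9; i++) {
            depth = substr($0, i*9 + 3, 3)      # i3 field
            raw   = substr($0, i*9 + 6, 4)      # 4-column value field
            if (raw ~ /^ *$/) break             # short last line: no more pairs
            if (substr(raw, 1, 1) == " ")
                printf "%4d %5.1f\n", depth, raw / 10
            else
                printf "%4d %5.2f\n", depth, raw / 100
        }
    }'
}

printf '    02846    52833\n    0 374    5 374\n' | decode_pairs
```

For the two sample lines this prints 28.46 and 28.33 for the old layout and 37.4 twice for the new one.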

RudiC · January 30, 2013, 3:49am

OK, let's dust off oooold FORTRAN capabilities. What your code does is easy: read a header holding a loop count according to a given fixed format, read count variables in a fixed format, write count lines according to a fixed format. Continue with: read the next header, and so on. BUT neither the input nor the output file format you provided fits the I/O formats in the code:

first header read, format 10:
  2.AUBZ               158   1 11 116204310 6 N 7542 E       18
xxaaaaaaaaaxaaaaaaaaaxiiiiixiixiiIIiiIIiiIIiixaxiiIIxaxiiiiiiiiixaaaaaxaaaa                     
  clsgn     shnm      cno   pno  mm  hh  ld   lp  lgm  nop             comnm                          
                               yy  dd  mn  lm   lgd  lgp         aqsy                           

data read, nop times, format 11, dstand & tstand:
    02846    52833   102821   152815   202824   252812   302804   352825   402809
xxiiiffffxxiiiffffxxiiiffffxxiiiffffxxiiiffffxxiiiffffxxiiiffffxxiiiffffxxiiiffff
   452774   502759   552727   602712   652699   702685   752655   802603   852539

second header read, format 10:
  2.AUCE                50  39 12 12314 453 0 S 5730 E      130
xxaaaaaaaaaxaaaaaaaaaxiiiiixiixiiIIiiIIiiIIiixaxiiIIxaxiiiiiiiiixaaaaaxaaaa
  clsgn     shnm      cno   pno  mm  hh  ld   lp  lgm  nop             comnm
                               yy  dd  mn  lm   lgd  lgp         aqsy       

data read, format 11:
    0 374    5 374   10 375   15 375   20 375   25 374   30 373   35 373   40 371
xxiiiffffxxiiiffffxxiiiffffxxiiiffffxxiiiffffxxiiiffffxxiiiffffxxiiiffffxxiiiffff
  315 239  320 238  325 239  330 239  335 241  340 241  345 241  350 239  355 239



output write, format 30, nop times:
2    AUBZ    158    1    11    1162043        10 6N    7542E    18    0    28.46
aaaaaaaaaxaaaaaaaaaxiiiiixiixff.ffxff.ffxiixiixiixff.ffxiiiixff.ffxff.fff
clsgn     shnm      cno   pno      long  dd mm yy time       dstand(k)   
                             lat                                   tstand(k)

What I can already see is that the output you provided fits neither the input file nor the output format of the FORTRAN program. We should see 18 lines for AUBZ and 130 for AUCE. I'm afraid I can't help further unless you give more hints/details.
It will certainly be possible to create, e.g., an awk script that duplicates the old code's functionality, provided we know the requirements in detail.
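A quick way to see how many output lines each record should produce is to print every header's station code together with its nop field. A sketch, assuming headers start with `  2.` as in the samples, the station code sits in columns 5-8, and nop in columns 56-64 (it does in both versions of format 10 posted in this thread); the function name count_records is hypothetical:

```shell
# count_records (hypothetical name): one "station nop" line per header,
# i.e. how many lines the FORTRAN DO loop should write for that record.
count_records() {
    awk '/^  2\./ { print substr($0, 5, 4), substr($0, 56, 9) + 0 }' "$@"
}
```

For the two sample headers it should list AUBZ 18 and AUCE 130.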

Akshay_Hegde · January 30, 2013, 7:44am

Please look at the following output:

AUCE                 50 39 53.00 57.50 23  1 12 14.04    0  3.74
AUCE                 50 39 53.00 57.50 23  1 12 14.04    5  3.74
AUCE                 50 39 53.00 57.50 23  1 12 14.04   10  3.75
AUCE                 50 39 53.00 57.50 23  1 12 14.04   15  3.75
AUCE                 50 39 53.00 57.50 23  1 12 14.04   20  3.75
AUCE                 50 39 53.00 57.50 23  1 12 14.04   25  3.74
AUCE                 50 39 53.00 57.50 23  1 12 14.04   30  3.73

I could not manage to make it 37.x in my code. 3.73 is wrong, as the location is 57.50 (long) and 53.00 (lat).

Please see the code below and the attachment. It reads both format types, but the problem is in assigning the decimal point.

      character clsgn*4,shnm*12,lp*1,lgp*1,aqsy*5,comnm*4
      dimension dstand(1000),tstand(1000),sstand(1000)
      integer dstand,cno,pno,yy,mm,hh,mn,dd,ld,lm,lgd,lgm,nop,x
      real tstand,sstand,time,lat,long
      open(2,file='out.dat')
      open(1,file='decoding.txt')
   15 read(1,10,end=100)clsgn,shnm,cno,pno,yy,mm,dd,hh,mn,ld,lm,lp,lgd
     1,lgm,lgp,nop,aqsy,comnm
      read(1,11,end=100)(dstand(k),tstand(k),k=1,nop)
   10 format(4x,a4,1x,a12,1x,i5,1x,i2,1x,i2,i2,i2,i2,i2,i2,i2,1x,a1,1x
     1,i2,i2,1x,a1,1x,i9,1x,a5,1x,a4)
   11 format(9(2x,i3,f4.2))
      time=hh*1.00+mn*0.01
      lat=(1.00*ld)+(0.1*lm/6)
      long=(1.00*lgd)+(0.1*lgm/6)
!     if(nop.gt.140)then
      do k=1,nop
      write(2,30)clsgn,shnm,cno,pno,lat,long,dd,mm,yy,time,dstand(k)
     1,tstand(k)
   30 format(a4,1x,a12,1x,i5,1x,i2,1x,f5.2,1x,f5.2,1x,i2,1x,i2,1x,i2
     1,1x,f5.2,1x,i4,1x,f5.2)
      end do
!     end if
      goto 15
  100 continue
      close(1)
      stop
      end
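That is why this code prints 3.74 rather than 37.4: on input, an Fw.d edit descriptor applied to a field that has no decimal point takes the rightmost d digits as the fraction, and under the default BLANK='NULL' mode blanks in the field are ignored. F4.2 therefore reads `2846` as 28.46 and ` 374` as 3.74, so no single fixed d can serve both layouts. The awk sketch below mimics that input rule; the helper name fortran_fread is made up for illustration, and exponents are not handled.

```shell
# fortran_fread FIELD D (hypothetical helper): emulate how a FORTRAN READ
# with an Fw.d edit descriptor interprets FIELD. A sketch; no exponents.
fortran_fread() {
    awk -v f="$1" -v d="$2" 'BEGIN {
        gsub(/ /, "", f)               # default BLANK=NULL: blanks are ignored
        if (f == "") f = "0"           # an all-blank field reads as zero
        if (index(f, ".") > 0) {
            print f + 0                # an explicit point overrides d
        } else {
            fmt = "%." d "f\n"
            printf fmt, f / 10 ^ d     # rightmost d digits become the fraction
        }
    }'
}

fortran_fread "2846" 2    # 28.46
fortran_fread " 374" 2    # 3.74 (not 37.4)
```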

If I can get help with awk, I will be very thankful. I used gfortran to compile. If anyone is able to make at least a modification, it will be helpful.

Thank you, RudiC, for replying.

RudiC · January 30, 2013, 9:14am

Your second code/sample set is far clearer and more consistent:
Your code is reading

  2.AUCE                50  39 12 12314 453 0 S 5730 E      130

into the variables (named in lines 2 and 3 of the template) using this format:

xxxxaaaaxaaaaaaaaaaaaxiiiiixiixiiIIiiIIiiIIiixaxiiIIxaxiiiiiiiiixaaaaaxaaaa
    clsgn               cno pno  mm  hh  ld   lp  lgm        nop aqsy  comnm       
         shnm                  yy  dd  mn  lm   lgd  lgp

(watch out: there's one space too many in cno and nop!), calculates decimal lat and long from ld, lm, lgd, lgm, each given in degrees/minutes, and writes out the variables

clsgn               cno pno  lat  long dd mm yy  time dstand(k)
     shnm                                                  tstand(k)

using this format

aaaaxaaaaaaaaaaaaxiiiiixiixff.ffxff.ffxiixiixiixff.ffxiiiixff.ff

producing this output line

AUCE                 50 39 53.00 57.50 23  1 12 14.04    0  3.74

which is identical to your sample output:

AUCE                 50 39 53.00 57.50 23  1 12 14.04    0  3.74
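In a shell port, format 30 maps almost one-to-one onto an awk printf format: aN becomes %-N.Ns (a blank-padded CHARACTER*N variable, truncated to N characters like A output), iN becomes %Nd, fW.D becomes %W.Df, and 1x a literal space. A sketch with a made-up helper name, fmt30_line, taking its arguments in the order of the WRITE statement; fed the values behind the sample line above, it reproduces that line.

```shell
# fmt30_line (hypothetical helper): print one line laid out like format 30
#   (a4,1x,a12,1x,i5,1x,i2,1x,f5.2,1x,f5.2,1x,i2,1x,i2,1x,i2,1x,f5.2,1x,i4,1x,f5.2)
# Mapping used: aN -> %-N.Ns, iN -> %Nd, fW.D -> %W.Df, 1x -> one space.
fmt30_line() {
    awk -v clsgn="$1" -v shnm="$2" -v cno="$3" -v pno="$4" \
        -v lat="$5" -v lon="$6" -v dd="$7" -v mm="$8" -v yy="$9" \
        -v hhmm="${10}" -v dstand="${11}" -v tstand="${12}" 'BEGIN {
        printf "%-4.4s %-12.12s %5d %2d %5.2f %5.2f %2d %2d %2d %5.2f %4d %5.2f\n",
            clsgn, shnm, cno, pno, lat, lon, dd, mm, yy, hhmm, dstand, tstand
    }'
}

# clsgn shnm cno pno lat long dd mm yy time dstand tstand
fmt30_line AUCE "" 50 39 53.00 57.50 23 1 12 14.04 0 3.74
```

One difference remains: FORTRAN fills a field with asterisks when a value does not fit its width, whereas printf simply widens the field.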

HOoooKay, here we go. There may be more elegant solutions, and I leave the bells 'n' whistles up to you to implement, but at least you've got something to start with:

awk     'function makeheader()
                {clsgn = substr ($0,  5,  4)
                 shnm  = substr ($0, 10, 12)    # a12 starts in column 10
                 cno   = substr ($0, 23,  5)
                 pno   = substr ($0, 29,  2)
                 yy    = substr ($0, 32,  2)
                 mm    = substr ($0, 34,  2)
                 dd    = substr ($0, 36,  2)
                 hh    = substr ($0, 38,  2)
                 mn    = substr ($0, 40,  2)
                 time  = sprintf ("%5.2f", hh + mn / 100)   # f5.2, so 14.10 does not print as 14.1
                 ld    = substr ($0, 42,  2)
                 lm    = substr ($0, 44,  2)
                 lat   = sprintf ("%4.2f", ld + lm / 60)
                 lp    = substr ($0, 47,  1)
                 lgd   = substr ($0, 49,  2)
                 lgm   = substr ($0, 51,  2)
                 long  = sprintf ("%4.2f", lgd + lgm / 60)
                 lgp   = substr ($0, 54,  1)
                 nop   = substr ($0, 56,  9)
                 left  = nop + 0                # pairs still to be printed for this header
#                 aqsy
#                 comnm
                 header = clsgn " " shnm " " cno " " pno " " lat " " long " " yy " " mm " " dd " " time
                }

         /^  2\./{makeheader(); next}

                 # stop after nop pairs, so a short last data line prints no empty pairs
                 {for (i=0; i<9 && left>0; i++) {print header, substr ($0, i*9+3, 3), sprintf ("%5.2f", substr ($0, i*9+6, 4)/100); left--}}
        ' decode.txt
AUBZ               158   1 10.10 75.70 11  1 16 20.43   0 28.46
AUBZ               158   1 10.10 75.70 11  1 16 20.43   5 28.33
AUBZ               158   1 10.10 75.70 11  1 16 20.43  10 28.21
AUBZ               158   1 10.10 75.70 11  1 16 20.43  15 28.15
.
.
.
AUCE                50  39 53.00 57.50 12  1 23 14.04   0  3.74
AUCE                50  39 53.00 57.50 12  1 23 14.04   5  3.74
AUCE                50  39 53.00 57.50 12  1 23 14.04  10  3.75
AUCE                50  39 53.00 57.50 12  1 23 14.04  15  3.75
.
.
.

Akshay_Hegde · January 30, 2013, 10:52pm

Thank you so much! Will you please explain it briefly?

RudiC · January 31, 2013, 3:07am

More tedious than genius: if the input line starts with "  2.", count characters (using the FORTRAN format as a template) and assign them to the header's partial variables (which you don't strictly need; you can use the substr()s directly, except for time, lat and long) to compose the header. Then split each of the following lines repeatedly into dstand/tstand pairs, outputting each pair together with the header variable.
Exercise for you: drop the function and compose header in the action part of /^  2\./, without using all the little variables.
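One possible answer to the exercise, as a sketch rather than RudiC's own solution: the header is built by a single sprintf in the /^  2\./ action, using the column positions from format 10, and a counter taken from nop stops the pair loop, so a short last data line (130 pairs are 14 full lines plus 4) prints no empty pairs. The wrapper name decode is hypothetical; it reads the files given as arguments, or stdin.

```shell
# decode (hypothetical wrapper): decode [FILE...]
decode() {
    awk '
    /^  2\./ {
        # clsgn shnm cno pno lat long yy mm dd time, straight from the format 10 columns
        header = sprintf("%s %s %s %s %4.2f %4.2f %s %s %s %5.2f",
            substr($0, 5, 4), substr($0, 10, 12), substr($0, 23, 5), substr($0, 29, 2),
            substr($0, 42, 2) + substr($0, 44, 2) / 60,
            substr($0, 49, 2) + substr($0, 51, 2) / 60,
            substr($0, 32, 2), substr($0, 34, 2), substr($0, 36, 2),
            substr($0, 38, 2) + substr($0, 40, 2) / 100)
        left = substr($0, 56, 9) + 0     # nop: pairs that belong to this header
        next
    }
    {
        for (i = 0; i < 9 && left > 0; i++) {
            print header, substr($0, i*9 + 3, 3), sprintf("%5.2f", substr($0, i*9 + 6, 4) / 100)
            left--
        }
    }' "$@"
}
```

Usage: `decode decode.txt > out.dat` writes the same kind of lines as the function-based version.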