Splitting file based on pattern and first character

pema.yozer · May 25, 2012, 8:47am

I have a file like the one below:
pema.txt

s2dhshfu dshfkdjh dshfd 
rjhfjhflhflhvflxhvlxhvx vlvhx
sfjhldhfdjhldjhjhjdhjhjxhjhxjxh
sjfdhdhfldhlghldhflhflhfhldfhlsh
rjsdjh#error occured#
skjfhhfdkhfkdhbvfkdhvkjhfvkhf
sjkdfhdjfh#error occured#

My requirement is to create 3 files from the above file:

1) pema.junk
will contain all the records where the text #error occured# is present
2) pema.s
should contain all records starting with s, except the ones in pema.junk
3) pema.r
should contain all records starting with r, except the ones in pema.junk

The first line of the file doesn't contain any data, so it should be ignored.

And, if possible, all data after position 135 should be truncated when copying
into the s and r files.
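The requirements above can be sketched in a single awk pass. This is a minimal sketch, not one of the replies in this thread, and it rests on a few assumptions: the first line is a header that is skipped with `NR == 1`, lines starting with neither `s` nor `r` are simply dropped, and error records go to pema.junk untruncated (the request only asks for truncation in the s and r files). The header text, the scratch directory, and the 200-character test record are made up for the demo.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # work in a scratch directory for the demo

# Sample input: a hypothetical header line, the records from the question,
# and one made-up 200-character "s" record to exercise the truncation.
{
  echo 'HEADER LINE'
  printf '%s\n' \
    's2dhshfu dshfkdjh dshfd' \
    'rjhfjhflhflhvflxhvlxhvx vlvhx' \
    'sfjhldhfdjhldjhjhjdhjhjxhjhxjxh' \
    'sjfdhdhfldhlghldhflhflhfhldfhlsh' \
    'rjsdjh#error occured#' \
    'skjfhhfdkhfkdhbvfkdhvkjhfvkhf' \
    'sjkdfhdjfh#error occured#'
  awk 'BEGIN { s = "s"; while (length(s) < 200) s = s "x"; print s }'
} > pema.txt

awk '
  NR == 1 { next }                                 # first line carries no data
  /#error occured#/ { print > "pema.junk"; next }  # error records, kept whole
  /^s/ { print substr($0, 1, 135) > "pema.s" }     # keep at most 135 characters
  /^r/ { print substr($0, 1, 135) > "pema.r" }
' pema.txt
```

Because of the `next` in the error rule, a line such as `rjsdjh#error occured#` never reaches the `/^r/` rule, so the "except the ones in pema.junk" condition needs no extra test.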

clx · May 25, 2012, 9:07am

Like this?

awk '/#error occured#/ {printf ("%.135s\n",$0) > "pema.junk"} /^r/ && ! /#error occured#/ { printf ("%.135s\n",$0) > "pema.r"} /^s/ && ! /#error occured#/ {printf ("%.135s\n",$0) > "pema.s"}' pema.txt

I know it looks dirty

Scott · May 25, 2012, 9:12am

It doesn't have to. Here's the same code formatted:

awk '
  /#error occured#/ {
    printf ("%.135s\n",$0) > "pema.junk"
  }

  /^r/ && ! /#error occured#/ { 
    printf ("%.135s\n",$0) > "pema.r"
  } 

  /^s/ && ! /#error occured#/ {
    printf ("%.135s\n",$0) > "pema.s"
  }
' pema.txt

Scrutinizer · May 25, 2012, 9:26am

Or:

awk '{f="pema." (/#error occured#/?"junk":substr($0,1,1)); print substr($0,1,135)>f}' infile
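This one-liner builds the output file name from the first character of each record, so a line that begins with some other character (or an empty line) would create an extra file such as `pema.x` or a file literally named `pema.`. A hedged variant, assuming such lines should simply be skipped, adds one guard rule before the routing; the sample lines and the scratch directory are made up for illustration.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # scratch directory for the demo

# Made-up input: s and r records, an "x" record, an empty line, an error line.
printf '%s\n' 'sabc' 'rdef' 'xghi' '' 'rjsdjh#error occured#' > infile

awk '
  # assumed: drop lines that are neither errors nor r/s records
  !/#error occured#/ && !/^[rs]/ { next }
  {
    f = "pema." (/#error occured#/ ? "junk" : substr($0, 1, 1))
    print substr($0, 1, 135) > f
  }
' infile
```

Without the guard rule, the same input would also produce `pema.x` and `pema.`.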

pema.yozer · May 28, 2012, 2:24am

Thank you, guys. I will try it and let you know.

---------- Post updated at 12:52 AM ---------- Previous update was at 12:42 AM ----------

I tried it, but I'm getting an error:
syntax error The source line is 13.
The error context is
>>> pema. <<< txt

---------- Post updated at 01:14 AM ---------- Previous update was at 12:52 AM ----------

Apologies, the code works perfectly; the error was my own mistake. Thank you very much.

---------- Post updated at 01:24 AM ---------- Previous update was at 01:14 AM ----------

A slight modification to the code below:

awk '
  /#error occured#/ {
    printf ("%.135s\n",$0) > "pema.junk"
  }

  /^r/ && ! /#error occured#/ { 
    printf ("%.135s\n",$0) > "pema.r"
  } 

  /^s/ && ! /#error occured#/ {
    printf ("%.135s\n",$0) > "pema.s"
  }
' pema.txt

Can we make it so that the S and R are ignored? Also, if a record is shorter than 135 characters, can we pad it with spaces so that it is 135 characters long?

Scrutinizer · May 28, 2012, 2:55am

Do you mean ignoring the case of r or s, so that lines that start with R or S are directed to the same files? Something like this?

awk '
  function pr(f){
    printf ("%-135s\n",substr($0,1,135))>f
  }     
                  
  /#error occured#/ {
    pr("pema.junk")
    next
  }

  /^(r|R)/ { 
    pr("pema.r")
  } 

  /^(s|S)/ {
    pr("pema.s")
  }
' infile
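Width and precision can also be combined in a single `%s` conversion: the precision (`.135`) truncates and the left-justified width (`-135`) pads with spaces, so `printf "%-135.135s\n", $0` should behave like the `substr` plus `%-135s` pair inside `pr()`. A small check, using 10 instead of 135 only to keep the output readable:

```shell
#!/bin/sh
# Pad short records and truncate long ones to exactly 10 characters;
# the brackets only make the trailing spaces visible.
out=$(printf '%s\n' 'short' 'a much longer record' |
  awk '{ printf "[%-10.10s]\n", $0 }')
printf '%s\n' "$out"
# [short     ]
# [a much lon]
```

With the real width, the body of `pr()` could become `printf "%-135.135s\n", $0 > f`.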

--
Some awks can do this:

awk '{NF=135; f="pema." (/#error occured#/?"junk":tolower($1)); print>f}' FS= infile

pema.yozer · May 28, 2012, 3:06am

Sorry for the confusion. What I meant to say is: can I drop the leading 'S' and 'R'
so that the record looks like

2dhshfu dshfkdjh dshfd

instead of

s2dhshfu dshfkdjh dshfd

Scrutinizer · May 28, 2012, 3:32am

Like so?

awk '
  /#error occured#/ {
    print > "pema.junk"
    next
  }

  /^r/{ 
    printf "%-135s\n",substr($0,2,135) > "pema.r"
  } 

  /^s/{
    printf "%-135s\n",substr($0,2,135) > "pema.s"
  }
' pema.txt
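The pieces of the thread can be combined in one pass. This is a hedged sketch rather than the accepted answer: it assumes the first line is a header to skip (the original requirement), folds `R`/`S` records into the same files as `r`/`s` (the earlier interpretation), drops the leading letter, and fixes each r/s record at exactly 135 characters. The sample data and scratch directory are made up.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # scratch directory for the demo

# Hypothetical sample: a header line, then mixed-case r/s records.
printf '%s\n' \
  'HEADER LINE' \
  's2dhshfu dshfkdjh dshfd' \
  'Rjhfjhflhflhvflxhvlxhvx vlvhx' \
  'rjsdjh#error occured#' > pema.txt

awk '
  NR == 1 { next }                                  # assumed header line
  /#error occured#/ { print > "pema.junk"; next }   # error records, unchanged
  {
    first = tolower(substr($0, 1, 1))               # treat R/S like r/s
    # drop the leading letter, then truncate/pad to exactly 135 characters
    if (first == "r" || first == "s") {
      printf "%-135.135s\n", substr($0, 2) > ("pema." first)
    }
  }
' pema.txt
```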

pema.yozer · May 29, 2012, 2:16am

Thanks, it works perfectly.