Reject the record if the record in the next line does not begin with 2.

supchand · October 16, 2011, 9:27am

Hi,
I have an input file with the following entries:

1one
2two
3three
1four
2five
3six
1seven
1eight
1nine
2ten
2eleven
2twelve
1thirteen
2fourteen

The output should be:

1one
2two
3three
1four
2five
3six
1nine
2ten
1thirteen
2fourteen

A record that begins with 1 must be immediately followed by a record that begins with 2 or 3; otherwise that record is rejected.
A record that begins with 1 must be followed by a record that begins with 2, and if several consecutive records begin with 2, only the first one is kept.
The rejected records should be written to one file and the valid records to another.
This needs to be done with Unix shell scripting. Please help me out with this.
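Since the question asks for plain shell scripting, here is a minimal sketch of the rules using only POSIX shell features (a `while read` loop and `case`). It assumes the rules as the sample output illustrates them: a 1-record is kept only if the very next record begins with 2, that first 2-record is kept with it, a 3-record directly after the kept 2-record is also kept, and everything else is rejected. The file names `sample.txt`, `ok` and `rejected` are hypothetical choices for illustration.

```shell
# Recreate the sample data from the question (hypothetical file name).
printf '%s\n' 1one 2two 3three 1four 2five 3six 1seven 1eight 1nine \
  2ten 2eleven 2twelve 1thirteen 2fourteen > sample.txt

: > ok          # valid records
: > rejected    # rejected records

state=idle      # idle | one (a 1-record is pending) | two (a 1/2 pair was just kept)
pending=

while IFS= read -r line || [ -n "$line" ]; do
  handled=no
  if [ "$state" = one ]; then
    case $line in
      2*) printf '%s\n%s\n' "$pending" "$line" >> ok   # keep the 1/2 pair
          state=two; handled=yes ;;
      *)  printf '%s\n' "$pending" >> rejected          # 1-record without a 2 after it
          state=idle ;;
    esac
  elif [ "$state" = two ]; then
    state=idle
    case $line in
      3*) printf '%s\n' "$line" >> ok; handled=yes ;;   # 3-record right after the pair
    esac
  fi
  if [ "$handled" = no ]; then   # look at the current record on its own
    case $line in
      1*) pending=$line; state=one ;;                   # wait for the next record
      *)  printf '%s\n' "$line" >> rejected ;;          # orphan 2-, 3- or other record
    esac
  fi
done < sample.txt

# A 1-record on the last line has nothing after it, so reject it.
if [ "$state" = one ]; then
  printf '%s\n' "$pending" >> rejected
fi
```

A `while read` loop runs once per line inside the shell and is usually much slower than awk on large files, so this form is mainly useful where awk is not available.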

radoulov · October 16, 2011, 9:35am

And what have you tried so far?

supchand · October 16, 2011, 9:40am

I have this requirement in an ETL job in DataStage, but I was not able to implement it there. So I am trying to split the records and save them to a file with a Unix script; that file will then be used in DataStage. Please help me with the Unix scripting, as I don't have much knowledge of scripting.

ctsgnb · October 16, 2011, 10:25am

$ cat f1
1one
2two
3three
1four
2five
3six
1seven
1eight
1nine
2ten
2eleven
2twelve
1thirteen
2fourteen

$ nawk '{x=y;y=$0}y~/^2/&&x~/^1/{print x RS y;x=y;getline y}y~/^3/{print y;x=y;getline y;}' f1
1one
2two
3three
1four
2five
3six
1nine
2ten
1thirteen
2fourteen
$
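The one-liner works with a two-record window: `x` holds the previous record and `y` the current one. Here is the same program spread over several lines with comments; the logic is intended to be unchanged. It assumes plain `awk` behaves like `nawk` here (nawk is the usual choice on Solaris), and the file names `sample.txt` and `kept.txt` are hypothetical. Note that `getline y` reads the next record into `y` without changing `$0`. As a caveat, the `y ~ /^3/` rule fires for any record starting with 3 that reaches it, not only one that follows a kept pair; with the sample data that makes no difference.

```shell
# Recreate the sample data from the question (hypothetical file name).
printf '%s\n' 1one 2two 3three 1four 2five 3six 1seven 1eight 1nine \
  2ten 2eleven 2twelve 1thirteen 2fourteen > sample.txt

awk '
{ x = y; y = $0 }        # x = previous record, y = current record

y ~ /^2/ && x ~ /^1/ {   # a 1-record directly followed by a 2-record
  print x RS y           # keep the pair (RS, the record separator, is a newline by default)
  x = y
  getline y              # read the next record into y; $0 is not changed
}

y ~ /^3/ {               # a record starting with 3 (normally the one after a kept pair)
  print y
  x = y
  getline y              # consume one more record the same way
}' sample.txt > kept.txt

cat kept.txt
```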

supchand · October 16, 2011, 10:53am

Thanks a million, this code works. Is it possible to capture the rejected records in another file?

radoulov · October 16, 2011, 4:52pm

This code will generate two files: ok and rejected.

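awk > ok 'END {
  # All records are stored in r[1..NR]; walk them in order.
  for (i = 0; ++i <= NR;) {
    # A 1-record immediately followed by a 2-record: keep the pair.
    if (i in r && r[i] ~ /^1/ && i + 1 in r && r[i + 1] ~ /^2/) {
      print r[i] RS r[i + 1]
      delete r[i]; delete r[i + 1]
      # Also keep a 3-record that directly follows the kept 2-record.
      if (i + 2 in r && r[i + 2] ~ /^3/) {
        print r[i + 2]; delete r[i + 2]
        }
      }
    # Anything not kept (and therefore not deleted) is rejected.
    if (i in r) print r[i] > "rejected"
    }
  }
{ r[NR] = $0 }' infile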
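The array-based version keeps the whole file in memory until the END block runs. As an alternative, here is a minimal single-pass sketch of the same rules that holds at most one pending record at a time, so memory use does not grow with the input. It assumes the rules the sample output implies (keep a 1-record only when the next record begins with 2, keep that 2-record and an immediately following 3-record, reject everything else). The input name `sample.txt` and the helper function `idle` are hypothetical; `ok` and `rejected` follow the file names used above.

```shell
# Recreate the sample data from the question (hypothetical file name).
printf '%s\n' 1one 2two 3three 1four 2five 3six 1seven 1eight 1nine \
  2ten 2eleven 2twelve 1thirteen 2fourteen > sample.txt

# Start with an empty rejected file, so a run with no rejects
# does not leave stale output from an earlier run.
: > rejected

awk '
# state: "" = idle, "one" = a 1-record is pending,
#        "two" = a 1/2 pair was just kept (a 3-record may follow)
function idle(rec) {
  if (rec ~ /^1/) { pending = rec; state = "one" }   # wait for the next record
  else print rec > "rejected"                         # orphan 2-, 3- or other record
}
{
  if (state == "one") {
    if ($0 ~ /^2/) { print pending RS $0; state = "two"; next }
    print pending > "rejected"   # the 1-record was not followed by a 2-record
    state = ""
  } else if (state == "two") {
    state = ""
    if ($0 ~ /^3/) { print; next }   # 3-record right after a kept pair
  }
  idle($0)                           # handle the current record from scratch
}
END { if (state == "one") print pending > "rejected" }  # 1-record on the last line
' sample.txt > ok
```

The design choice here is a small state machine instead of lookahead by index: each record is decided as soon as the record after it has been seen, which is all the rules require.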