awk with sed to combine lines and remove specific odd # pattern from line

cmccabe · April 11, 2019, 12:30pm

In the awk piped to sed below I am trying to reformat a file: remove the odd xxxx_digits ID and the whitespace after it, then move the even xxxx_digits ID to the start of the line above it, with a space between them. There may be many such lines in the file, but they all follow the same format. The FileName_ID line is the last line of each block and is unique to that block. A blank line always separates the blocks, and the FileName_ID line is not processed, only printed. It is possible that there is nothing above a FileName_ID line; if so, it is also printed as is. The code executes, but the output is unchanged, and it is probably not the best approach. Thank you :).

file

00-0000-Lname-Fname-REPEAT
xxxx_0001 xxxx_0002
111111-yyyy
xxxx_0003 xxxx_0008
111111-yyyy-0
xxxx_0009 xxxx_0006
FileName_ID

FileName_ID


desired

xxxx_0002 00-0000-Lname-Fname-REPEAT
xxxx_0008 111111-yyyy
xxxx_0006 111111-yyyy-0
FileName_ID

FileName_ID

awk

awk 'NR%2{printf "%s ",$0;next;}1' file | sed 's/xxxx_[0-9][0-9][0-9][13579]//g'

RavinderSingh13 · April 11, 2019, 1:35pm

Hello cmccabe,

Could you please try the following.

awk '
FNR==NR{
  if($0 ~ /^xxxx_[0-9]+/){
      for(i=1;i<=NF;i++){
         val=$i
         sub(/.*_/,"",val)
         if(val%2==0){
             array[++count]=$i
         }
      }
  }
  next
}
!NF || (!/^xxxx_[0-9]+/ && !/[0-9]+/){
  print
  next
}
!/^xxxx_[0-9]+/{
  print array[++count2],$0
}'   Input_file  Input_file

Output will be as follows.

xxxx_0002 00-0000-Lname-Fname-REPEAT
xxxx_0008 111111-yyyy
xxxx_0006 111111-yyyy-0
FileName_ID

FileName_ID
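
To see what the first pass (the FNR==NR block) collects before anything is printed, the same loop can be run on its own. This is only a sketch: the function name list_even_ids is made up for illustration, and the input is the sample file from the question.

```shell
# list_even_ids is a hypothetical helper name, not part of the solution above.
list_even_ids() {
  awk '
    /^xxxx_[0-9]+/ {
      for (i = 1; i <= NF; i++) {
        val = $i
        sub(/.*_/, "", val)      # keep only the digits after the underscore
        if (val % 2 == 0)        # even number: this is an ID the first pass stores
          print ++count, $i      # count is the index used for array[] above
      }
    }
  ' "$@"
}

printf '%s\n' \
  '00-0000-Lname-Fname-REPEAT' 'xxxx_0001 xxxx_0002' \
  '111111-yyyy' 'xxxx_0003 xxxx_0008' \
  '111111-yyyy-0' 'xxxx_0009 xxxx_0006' \
  'FileName_ID' '' 'FileName_ID' | list_even_ids
# 1 xxxx_0002
# 2 xxxx_0008
# 3 xxxx_0006
```

The second pass then hands these out in order, one per data line, which is why the data lines and the ID lines have to stay in step.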

EDIT: The above solution handles only one even id per line. If a line may have multiple even ids that all need to be added and printed, try the following.

awk '
FNR==NR{
  if($0 ~ /^xxxx_[0-9]+/){
      for(i=1;i<=NF;i++){
         val=$i
         sub(/.*_/,"",val)
         if(val%2==0){
             if(FNR!=prev){
                 count++
                 prev=FNR
             }
             array[count]=(array[count]?array[count] OFS:"")$i
         }
      }
  }
  prev=FNR
  next
}
!NF || (!/^xxxx_[0-9]+/ && !/[0-9]+/){
  print
  next
}
!/^xxxx_[0-9]+/{
  print array[++count2],$0
}'   Input_file  Input_file

Thanks,
R. Singh

RavinderSingh13 · April 11, 2019, 1:53pm

Hello cmccabe,

My previous solution reads Input_file twice; try the following, which reads Input_file only once.

awk '!NF || (!/^xxxx_[0-9]+/ && !/^[0-9]+/){
  print
  next
}
!/^xxxx_[0-9]+/{
  line=$0
  next
}
{
  for(i=1;i<=NF;i++){
     val=$i
     sub(/.*_/,"",val)
     if(val%2==0){
         value=(value?value OFS:"")$i
     }
  }
  print value,line
  line=val=value=""
}
END{
  if(line){
     print line
  }
}'   Input_file

Output will be as follows.

xxxx_0002 00-0000-Lname-Fname-REPEAT
xxxx_0008 111111-yyyy
xxxx_0006 111111-yyyy-0
FileName_ID

FileName_ID
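
The single-pass version also joins several even ids found on one line. A quick check, wrapping the same script in a shell function; the name prepend_even_ids and the sample line with two even ids are made up for this illustration and are not from the original file.

```shell
# prepend_even_ids is a hypothetical wrapper name around the single-pass script.
prepend_even_ids() {
  awk '
    # Blank line, or a line starting with neither xxxx_ nor a digit: print as is.
    !NF || (!/^xxxx_[0-9]+/ && !/^[0-9]+/) { print; next }
    # Data line (starts with a digit): hold it until its ID line arrives.
    !/^xxxx_[0-9]+/ { line = $0; next }
    {
      for (i = 1; i <= NF; i++) {
        val = $i
        sub(/.*_/, "", val)
        if (val % 2 == 0)                       # append every even ID, space separated
          value = (value ? value OFS : "") $i
      }
      print value, line
      line = val = value = ""
    }
    END { if (line) print line }
  ' "$@"
}

# Made-up input: the second ID line carries two even IDs.
printf '%s\n' \
  '00-0000-Lname-Fname-REPEAT' 'xxxx_0001 xxxx_0002' \
  '111111-yyyy' 'xxxx_0004 xxxx_0008' \
  'FileName_ID' | prepend_even_ids
# xxxx_0002 00-0000-Lname-Fname-REPEAT
# xxxx_0004 xxxx_0008 111111-yyyy
# FileName_ID
```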

Thanks,
R. Singh

Scrutinizer · April 11, 2019, 6:56pm

Another option to try:

awk '/xxxx/{print $2, p} {p=A[NR]=$0} END{for(i=2; i>=0; i--) print A[NR-i]}' file
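
Spelled out with comments, the one-liner does roughly the following. Two assumptions are baked in, both true for the sample file: the even id is always the second field of the ID line, and the input ends with exactly three pass-through lines (FileName_ID, a blank line, FileName_ID). The function name even_then_tail is made up for this sketch.

```shell
# even_then_tail is a hypothetical name for the one-liner, expanded.
even_then_tail() {
  awk '
    /xxxx/ { print $2, p }        # ID line: second field, then the previous line
    { p = A[NR] = $0 }            # remember every line by number and as "previous"
    END {
      for (i = 2; i >= 0; i--)    # replay the last three lines of the input
        print A[NR - i]
    }
  ' "$@"
}

printf '%s\n' \
  '00-0000-Lname-Fname-REPEAT' 'xxxx_0001 xxxx_0002' \
  '111111-yyyy' 'xxxx_0003 xxxx_0008' \
  '111111-yyyy-0' 'xxxx_0009 xxxx_0006' \
  'FileName_ID' '' 'FileName_ID' | even_then_tail
```

With more than one block in the file, or a different number of trailing lines, the END loop would need adjusting.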

anbu23 · April 12, 2019, 5:41am

$ awk ' /FileName_ID/ || /^$/ { print; next} { a=$0; getline; print $NF, a } ' file
xxxx_0002 00-0000-Lname-Fname-REPEAT
xxxx_0008 111111-yyyy
xxxx_0006 111111-yyyy-0
FileName_ID

FileName_ID
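
One thing to keep in mind with the getline version: if a data line happens to be the very last line of the file, getline has nothing left to read and $0 still holds the data line, so that line's own last field is printed in front of it. A more defensive sketch of the same idea checks getline's return value. The function name pair_with_last_id and the variable names are made up here, and like the original it assumes the even id is the last field of the ID line.

```shell
# pair_with_last_id is a hypothetical name; the logic follows the getline one-liner.
pair_with_last_id() {
  awk '
    /FileName_ID/ || /^$/ { print; next }   # pass these through untouched
    {
      held = $0
      # getline returns 1 on success, 0 at end of input, -1 on error
      if ((getline idline) > 0) {
        n = split(idline, ids)              # split the ID line on whitespace
        print ids[n], held                  # last ID first, then the held line
      } else {
        print held                          # no ID line followed: print as is
      }
    }
  ' "$@"
}

printf '%s\n' '00-0000-Lname-Fname-REPEAT' 'xxxx_0001 xxxx_0002' 'FileName_ID' |
  pair_with_last_id
# xxxx_0002 00-0000-Lname-Fname-REPEAT
# FileName_ID
```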