How do I find the first number in each line and insert a dummy string into the missing columns?

Szaffy · November 1, 2011, 10:04am

Hi,

I have one input file with the following content:

MY_inpfile.txt

Aname1 Cname1 Cname2 1808 5 
Aname2 Cname1 1802 47 
Bname1 ? 1819 22 
Bname2 Cname1 1784 11 
Bname3 1817 9 
Zname1 Cname1 1805 59 
Zname2 Cname1 Cname2 Cname3 1797 27

Every line in my input file has a 4-digit column (1808, 1802, 1819, ..., 1797), but in a different position!

I would like to generate the following output file (MY_outputfile.txt):

MY_outputfile.txt

Aname1 Cname1 Cname2 DUMMYSTR 1808 5
Aname2 Cname1 DUMMYSTR DUMMYSTR 1802 47
Bname1 ? DUMMYSTR DUMMYSTR 1819 22
Bname2 Cname1 DUMMYSTR DUMMYSTR 1784 11
Bname3 DUMMYSTR DUMMYSTR DUMMYSTR 1817 9
Zname1 Cname1 DUMMYSTR DUMMYSTR 1805 59
Zname2 Cname1 Cname2 Cname3 1797 27

In the output file, the first 4-digit column (1808, 1802, 1819, ..., 1797) should be in the same position on every line!

How do I find the first number column in each line, insert a DUMMYSTR for each missing column before it, and write the result to the output file?

I would like to use awk to achieve this.

Thanks
Szaffy

CarloM · November 1, 2011, 10:41am

Is the 4-digit numeric column in the output always going to be column 5, or is it variable (whatever the highest position of that column was in the input)?

ahamed101 · November 1, 2011, 10:58am

Try this...

awk '
  NR==FNR{a[$0]=NF;x<NF?x=NF:NULL;next}
  END{
    for(i in a){
      if(a[i]==x){ print i; continue }
      split(i,arr," "); s=length(arr);
      num2=arr[s];num1=arr[s-1]
      for(j=s-1;j<=x;j++){ arr[j]="DUMMYSTR" }
      arr[x-1]=num1; arr[x]=num2
      for(j=1;j<=length(arr);j++){ printf arr[j]" " }
      printf "\n"
    }
  }' input_file | sort

--ahamed
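
The script above pads in front of the last two fields and relies on sort to restore the order. As an alternative, here is a minimal sketch that looks for the first 4-digit field directly, as the question asks, and keeps the input order by reading the file twice. The function name pad_to_num_col is hypothetical, and the sketch assumes an awk with user-defined functions (nawk, gawk, mawk or any POSIX awk) and that "number column" means a field of exactly four digits.

```shell
# pad_to_num_col is a hypothetical helper name, not from the thread.
# Pass 1 finds the right-most position of the first 4-digit field;
# pass 2 inserts DUMMYSTR before that field until every line matches.
# usage: pad_to_num_col MY_inpfile.txt > MY_outputfile.txt
pad_to_num_col() {
  awk '
    # Index of the first field made of exactly four digits (0 if none).
    function numcol(   i) {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9][0-9][0-9][0-9]$/) return i
      return 0
    }
    NR == FNR { c = numcol(); if (c > max) max = c; next }   # pass 1
    {                                                       # pass 2
      c = numcol()
      if (c == 0) { print; next }      # no 4-digit field: leave the line alone
      out = ""
      for (i = 1; i < c; i++)   out = out $i " "          # fields before the number
      for (i = c; i < max; i++) out = out "DUMMYSTR "     # padding
      for (i = c; i < NF; i++)  out = out $i " "          # number and what follows
      print out $NF
    }
  ' "$1" "$1"
}
```

The file name is passed twice on purpose: NR == FNR is only true while awk reads the first copy, so the maximum position is known before any line is printed.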

Szaffy · November 1, 2011, 12:02pm

Hi Carlo,

The numeric field is in a variable position (column) in the input file!

First line: 1808 is in column 4
Second line: 1802 is in column 3
..
..
Last line: 1797 is in column 5

Regards
Szaffy
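
The positions listed above can be checked with a short awk loop. This is only a sketch: first_num_col is a hypothetical name, and it assumes the "numeric field" is the first field made of exactly four digits.

```shell
# first_num_col is a hypothetical helper name, not from the thread.
# Prints "<line number> <field index> <value>" for the first field
# made up of exactly four digits on each line of the given file.
# usage: first_num_col MY_inpfile.txt
first_num_col() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9][0-9][0-9][0-9]$/) {   # spelled out: no {4} interval needed
        print NR, i, $i
        break
      }
  }' "$1"
}
```

For the sample input this should report field 4 for the first line, field 3 for the second and field 5 for the last, which matches the description above. The largest index it prints is the column the padded output has to line up on.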

---------- Post updated at 04:02 PM ---------- Previous update was at 03:59 PM ----------

Hi Ahamed,

I am getting the following error:

awk: Cannot read the value of arr. It is an array name.

Regards
Szaffy

ahamed101 · November 1, 2011, 12:06pm

It works for me, though... If you are on Solaris, use nawk.

[root@bt]cat input_file
Aname1 Cname1 Cname2 1808 5
Aname2 Cname1 1802 47
Bname1 ? 1819 22
Bname2 Cname1 1784 11
Bname3 1817 9
Zname1 Cname1 1805 59
Zname2 Cname1 Cname2 Cname3 1797 27
[root@bt]awk '
>   NR==FNR{a[$0]=NF;x<NF?x=NF:NULL;next}
>   END{
>     for(i in a){
>       if(a[i]==x){ print i; continue }
>       split(i,arr," "); s=length(arr);
>       num2=arr[s];num1=arr[s-1]
>       for(j=s-1;j<=x;j++){ arr[j]="DUMMYSTR" }
>       arr[x-1]=num1; arr[x]=num2
>       for(j=1;j<=length(arr);j++){ printf arr[j]" " }
>       printf "\n"
>     }
>   }'  input_file | sort
 
Aname1 Cname1 Cname2 DUMMYSTR 1808 5
Aname2 Cname1 DUMMYSTR DUMMYSTR 1802 47
Bname1 ? DUMMYSTR DUMMYSTR 1819 22
Bname2 Cname1 DUMMYSTR DUMMYSTR 1784 11
Bname3 DUMMYSTR DUMMYSTR DUMMYSTR 1817 9
Zname1 Cname1 DUMMYSTR DUMMYSTR 1805 59
Zname2 Cname1 Cname2 Cname3 1797 27

--ahamed
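
One plausible cause of the "It is an array name" error, which the thread does not confirm, is an awk that does not accept an array as the argument to length(). A sketch that avoids that call by using the count returned by split() is shown below. The name pad_last_two is hypothetical. Like the script above, it assumes the last two fields of every line are the numeric ones and that every line has at least two fields.

```shell
# pad_last_two is a hypothetical helper name, not from the thread.
# Pads with DUMMYSTR in front of the last two fields so that every
# line ends up with as many fields as the widest input line.
# usage: pad_last_two input_file
pad_last_two() {
  awk '
    { line[NR] = $0; if (NF > max) max = NF }   # remember lines in input order
    END {
      for (r = 1; r <= NR; r++) {
        n = split(line[r], f, " ")     # split() returns the number of fields
        out = ""
        for (j = 1; j <= n - 2; j++) out = out f[j] " "      # leading names
        for (j = n; j < max; j++)    out = out "DUMMYSTR "   # padding
        print out f[n - 1] " " f[n]                          # the two numbers
      }
    }
  ' "$1"
}
```

Because the lines are stored under their line number rather than as array keys, the output comes back in input order and the sort step is not needed.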

CarloM · November 1, 2011, 12:27pm

nm...