Szaffy
November 1, 2011, 10:04am
1
Hi,
I have one input file with the following content:
MY_inpfile.txt
Aname1 Cname1 Cname2 1808 5
Aname2 Cname1 1802 47
Bname1 ? 1819 22
Bname2 Cname1 1784 11
Bname3 1817 9
Zname1 Cname1 1805 59
Zname2 Cname1 Cname2 Cname3 1797 27
Every line in my input file has a 4-digit column (1808, 1802, 1819, ..., 1797), but in a different position!
I would like to generate the following output file (MY_outputfile.txt):
MY_outputfile.txt
Aname1 Cname1 Cname2 DUMMYSTR 1808 5
Aname2 Cname1 DUMMYSTR DUMMYSTR 1802 47
Bname1 ? DUMMYSTR DUMMYSTR 1819 22
Bname2 Cname1 DUMMYSTR DUMMYSTR 1784 11
Bname3 DUMMYSTR DUMMYSTR DUMMYSTR 1817 9
Zname1 Cname1 DUMMYSTR DUMMYSTR 1805 59
Zname2 Cname1 Cname2 Cname3 1797 27
In the output file, the first 4-digit column (1808, 1802, 1819, ..., 1797) should be in the same position on every line!
How do I find the first numeric column in each line, insert a DUMMYSTR for each missing column before it, and write the result to the output file?
I would like to use awk to achieve this.
Thanks
Szaffy
CarloM
November 1, 2011, 10:41am
2
Is the 4-digit numeric column in the output always going to be column 5, or is it variable (whatever the highest position of that column was in the input)?
Try this...
awk '
NR==FNR{a[$0]=NF;x<NF?x=NF:NULL;next}
END{
for(i in a){
if(a[i]==x){ print i; continue }
split(i,arr," "); s=length(arr);
num2=arr[s];num1=arr[s-1]
for(j=s-1;j<=x;j++){ arr[j]="DUMMYSTR" }
arr[x-1]=num1; arr[x]=num2
for(j=1;j<=length(arr);j++){ printf arr[j]" " }
printf "\n"
}
}' input_file | sort
--ahamed
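The script above treats the last two fields of each line as the numeric pair and pads up to the widest line. If you would rather locate the first 4-digit field explicitly, as the original question describes, here is a minimal two-pass sketch. The function names pad_columns and numpos and the temporary directory are only illustrative, not from the thread, and it assumes an awk with user-defined functions (nawk, gawk, mawk). The file is read twice: the first pass finds the rightmost position of the first 4-digit field, the second pass pads every line up to that position.

```shell
#!/bin/sh
# pad_columns FILE
# Hypothetical helper: aligns the first 4-digit field of every line
# by inserting DUMMYSTR fields in front of it.
pad_columns() {
    awk '
    # Return the index of the first field made of exactly four digits, or 0.
    function numpos(    i) {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9][0-9][0-9][0-9]$/) return i
        return 0
    }
    # Pass 1 (first read of the file): remember the largest position seen.
    NR == FNR { p = numpos(); if (p > max) max = p; next }
    # Pass 2 (second read): rebuild each line with padding.
    {
        p = numpos()
        if (p == 0) { print; next }          # no 4-digit field: leave as is
        out = ""
        for (i = 1; i < p; i++)   out = out $i " "
        for (i = p; i < max; i++) out = out "DUMMYSTR "
        for (i = p; i <= NF; i++) out = out $i (i < NF ? " " : "")
        print out
    }' "$1" "$1"
}

# Sample data from the first post, written to a scratch directory.
dir=$(mktemp -d)
cat > "$dir/MY_inpfile.txt" <<'EOF'
Aname1 Cname1 Cname2 1808 5
Aname2 Cname1 1802 47
Bname1 ? 1819 22
Bname2 Cname1 1784 11
Bname3 1817 9
Zname1 Cname1 1805 59
Zname2 Cname1 Cname2 Cname3 1797 27
EOF

pad_columns "$dir/MY_inpfile.txt" > "$dir/MY_outputfile.txt"
cat "$dir/MY_outputfile.txt"
```

Because the lines are printed as they are read, the input order is kept and no sort is needed. If a line has more than one 4-digit field, only the first one is used for alignment.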
Szaffy
November 1, 2011, 12:02pm
4
Hi Carlo,
The numeric field is in a variable position (column) in the input file!
First line: 1808 is in column 4
Second line: 1802 is in column 3
..
..
Last line: 1797 is in column 5
Regards
Szaffy
---------- Post updated at 04:02 PM ---------- Previous update was at 03:59 PM ----------
ahamed101:
Try this...
awk '
NR==FNR{a[$0]=NF;x<NF?x=NF:NULL;next}
END{
for(i in a){
if(a[i]==x){ print i; continue }
split(i,arr," "); s=length(arr);
num2=arr[s];num1=arr[s-1]
for(j=s-1;j<=x;j++){ arr[j]="DUMMYSTR" }
arr[x-1]=num1; arr[x]=num2
for(j=1;j<=length(arr);j++){ printf arr[j]" " }
printf "\n"
}
}' input_file | sort
--ahamed
Hi Ahamed,
I am getting the following error:
awk: Cannot read the value of arr. It is an array name.
Regards
Szaffy
Working for me though... If Solaris, use nawk.
[root@bt]cat input_file
Aname1 Cname1 Cname2 1808 5
Aname2 Cname1 1802 47
Bname1 ? 1819 22
Bname2 Cname1 1784 11
Bname3 1817 9
Zname1 Cname1 1805 59
Zname2 Cname1 Cname2 Cname3 1797 27
[root@bt]awk '
> NR==FNR{a[$0]=NF;x<NF?x=NF:NULL;next}
> END{
> for(i in a){
> if(a[i]==x){ print i; continue }
> split(i,arr," "); s=length(arr);
> num2=arr[s];num1=arr[s-1]
> for(j=s-1;j<=x;j++){ arr[j]="DUMMYSTR" }
> arr[x-1]=num1; arr[x]=num2
> for(j=1;j<=length(arr);j++){ printf arr[j]" " }
> printf "\n"
> }
> }' input_file | sort
Aname1 Cname1 Cname2 DUMMYSTR 1808 5
Aname2 Cname1 DUMMYSTR DUMMYSTR 1802 47
Bname1 ? DUMMYSTR DUMMYSTR 1819 22
Bname2 Cname1 DUMMYSTR DUMMYSTR 1784 11
Bname3 DUMMYSTR DUMMYSTR DUMMYSTR 1817 9
Zname1 Cname1 DUMMYSTR DUMMYSTR 1805 59
Zname2 Cname1 Cname2 Cname3 1797 27
--ahamed
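A possible cause of the "It is an array name" error reported earlier is length(arr): taking the length of an array is not supported by every awk. Since split() returns the number of pieces, the same idea can be written without it. The sketch below is only an assumption about what might help on such systems. It keeps the script's premise that the last two fields of every line are the numeric pair, and the name pad_last_two is made up for illustration.

```shell
#!/bin/sh
# pad_last_two [FILE...]
# Hypothetical helper: pads each line with DUMMYSTR so that the last two
# fields line up with those of the widest line. Reads stdin if no file is given.
pad_last_two() {
    awk '
    # Store every line in input order and track the widest field count.
    { line[NR] = $0; if (NF > max) max = NF }
    END {
        for (r = 1; r <= NR; r++) {
            n = split(line[r], f, " ")      # split returns the field count
            out = ""
            for (j = 1; j <= n - 2; j++) out = out f[j] " "
            for (j = n; j < max; j++)    out = out "DUMMYSTR "
            print out f[n-1] " " f[n]
        }
    }' "$@"
}

# Sample data from the first post, fed on standard input.
pad_last_two <<'EOF'
Aname1 Cname1 Cname2 1808 5
Aname2 Cname1 1802 47
Bname1 ? 1819 22
Bname2 Cname1 1784 11
Bname3 1817 9
Zname1 Cname1 1805 59
Zname2 Cname1 Cname2 Cname3 1797 27
EOF
```

Lines are stored by record number rather than used as array keys, so the output keeps the input order and duplicate lines are not collapsed. It assumes every line has at least two fields.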