Hi,
I am very new to Unix script. Suppose i have a file with column header:
NAME1 NAME2 Address Tel
And I always need to make a file with column header:
ID NAME1 NAME2 EMail Address Tel
For the columns that do not exist in the file, I would still like to make a column with blank.
How do I do this in UNix script?
Assuming you are using TAB as field separator:
awk -F"\t" -vOFS="\t" '{$6=$4;$5=$3;$4=(NR==1)?"EMail":"";$3=$2;$2=$1;$1=(NR==1)?"ID":""}1' file
Since you haven't mentioned, I will assume that it's a fixed-format file -
$
$
$ cat f0
NAME1 NAME2 ADDRESS TEL
name001 name002 addr001 tel001
name011 name022 addr011 tel011
name111 name222 addr111 tel111
$
$
$ awk 'NR==1{printf("%-10s%-10s%-10s%-10s%-10s%-s\n","ID",$1,$2,"EMAIL",$3,$4)}
NR>1 {printf("%-10s%-10s%-10s%-10s%-10s%-s\n"," ",$1,$2," ",$3,$4)}
' f0
ID NAME1 NAME2 EMAIL ADDRESS TEL
name001 name002 addr001 tel001
name011 name022 addr011 tel011
name111 name222 addr111 tel111
$
$
$
tyler_durden
If the fields are separated by spaces:
awk 'NR==1{$1="ID" FS $1; $3=$3 FS "EMail"}{$1=FS $1; $3=$3 FS}1' file
This is what I have got so far.
abc.txt:
NAME1 NAME2 ADDRESS TEL
A B ADD 1234
awk -F"\t" '
NR==1{
for(i=1;i<=NF;i++)
if($i~/ID/)f[n]=i
n++
for(i=1;i<=NF;i++)
if($i~/NAME1/)f[n]=i
n++
for(i=1;i<=NF;i++)
if($i~/NAME2/)f[n]=i
n++
for(i=1;i<=NF;i++)
if($i~/EMAIL/)f[n]=i
n++
for(i=1;i<=NF;i++)
if($i~/ADDRESS/)f[n]=i
n++
for(i=1;i<=NF;i++)
if($i~/TEL/)f[n]=i
n++
}
{
if (NR==1)
print "ID NAME1 NAME2 EMAIL ADDRESS TEL";
}
{
if (NR > 1) {
for(i=0;i<n;i++)
printf"%s%s",i?" ":"",$f;
print""
}
}
' abc.txt
But I keep getting the error "Field is not correct"
I think the error is due to the ID AND EMAIL column not found (by searching for the column name) in abc.txt, hence f[n] evaluates to blank. When the code tries to do printf with $<blank>, it gives this error.
I am almost there, just need a tip how i can handle printf $<blank> so that it prints a blank column rather than errored out.
As i said, i am very new to unix scripting
---------- Post updated 10-05-10 at 08:03 AM ---------- Previous update was 10-04-10 at 06:24 PM ----------
Hi experts,
Anyone can help me with this please?
Potentially this implementation can help save any manual work that i am doing. That means less Overtime for me and more time with my family.
Thanks
Your script doesn't have to be so convoluted.
$
$
$ cat abc.txt
NAME1 NAME2 ADDRESS TEL
A B ADD 1234
$
$
$ awk 'BEGIN{IFS="\t"; OFS="\t"} NR==1{$1="ID" IFS $1; $3="EMAIL" IFS $3} NR>1{$1=IFS $1; $3=IFS $3}1' abc.txt
ID NAME1 NAME2 EMAIL ADDRESS TEL
A B ADD 1234
$
$
$
To check that those are TAB characters and not blank spaces -
$
$ # -e option displays "$" at end of line; -t option displays TABs as ^I
$ cat -et abc.txt
NAME1^INAME2^IADDRESS^ITEL$
A^IB^IADD^I1234$
$
$
$ awk 'BEGIN{IFS="\t"; OFS="\t"} NR==1{$1="ID" IFS $1; $3="EMAIL" IFS $3} NR>1{$1=IFS $1; $3=IFS $3}1' abc.txt | cat -et
ID^INAME1^INAME2^IEMAIL^IADDRESS^ITEL$
^IA^IB^I^IADD^I1234$
$
$
$
tyler_durden
Hi Tyler,
Appreciate your prompt reply.
What if I don't know beforehand the columns that were missing? Will your code still work?
I was using for loop in my earlier posting to perform a column header name matching to try to find out 1) the missing column ( as this is not fixed)
2) the position of the column if it matches
I thought this is the way to go? U think u can help me?
No, it won't. The posted code -
(a) adds an "ID" column in front of first column
(b) adds an "Email" column right before the 3rd column
Since it performs simple substitution and does not check for column headers specifically, you'd have two "ID" columns if your first column actually happens to be an "ID" column !
The fact that you do not know what all columns could be missing is most significant and it makes your problem much more difficult.
Here's what I think -
(a) Firstly, you will have to know your "desired" column headers. You won't know what's missing unless you have a complete list for comparison.
(b) Secondly, there's a question about "gaps" as well as "order". Let's say your desired column header is "c1,c2,c3,c4,c5" and your file has "c1,c3,c5". The missing columns over here are "c2,c4", but your file still has columns in order.
Now, how do you want to handle the case when your file has columns "c4,c3,c1" ? There are missing columns, but the existing columns are out of order as well. Is this possible at all ? Do you want to change that to your "desired" order - i.e. fill the gaps as well as reorder ?
tyler_durden
Yes Tyler. There will be a desired coLumn header. And the script will have to fix gaps as well as order.
I don't know how to do this in a single unix script, is this even possible?
If not then I am thinking to use a macro tool to process the header, compare with the desired column header and order, then generate an awk command that will print the columns, like this:
1) macro read the column header and position
2) macro compare this with the desired column
3) macro generate awk print statement, ordering the position returned in (1) and where there is a gap, macro will generate print "\t"
4) pass the generated awk statement to unix to execute
Possible to do this entirely in unix?
Yes it's possible.
$
$ cat rearrange.awk
BEGIN {
IFS = "\t"; OFS = "\t" # set the field separators
num = split("ID,NAME1,NAME2,EMAIL,ADDRESS,TEL",main,",") # set the "main" array
}
{
if (NR==1) { # if we are at the first line
for (i=1; i<=NF; i++) { # then loop through the headers
for (j=1; j<=num; j++) { # and search for them in main array
if ($i == main[j]) {map = j; break} # if match found then
} # set "map" array and break
}
for (j=1; j<=num; j++) printf j==1 ? main[j] : OFS main[j] # print headers in correct order
printf "\n"
} else { # for line number > 1
for (i=1; i<=NF; i++) template[map] = $i # loop through fields; set "template"
for (j=1; j<=num; j++) printf j==1 ? template[j] : OFS template[j] # and simply print template
printf "\n"
}
}
$
$ cat abc.txt
NAME1 NAME2 ADDRESS TEL
A B ADD1 1234
X Y ADD2 5678
$
$
$ awk -f rearrange.awk abc.txt
ID NAME1 NAME2 EMAIL ADDRESS TEL
A B ADD1 1234
X Y ADD2 5678
$
$
$ cat abc1.txt
NAME1 ADDRESS
A ADD1
X ADD2
$
$ awk -f rearrange.awk abc1.txt
ID NAME1 NAME2 EMAIL ADDRESS TEL
A ADD1
X ADD2
$
$
$ cat abc2.txt
ADDRESS NAME2 TEL NAME1
ADD1 B 1234 A
ADD2 Y 5678 X
$
$
$ awk -f rearrange.awk abc2.txt
ID NAME1 NAME2 EMAIL ADDRESS TEL
A B ADD1 1234
X Y ADD2 5678
$
$
Sorry, my output data isn't aligned because the TABs are lost when I copy from my terminal emulator and paste it over here. And I am too lazy to add the TABs manually.
tyler_durden
Yoz Tyler, no problem with the TAB. Cant thank you enuf for the code already, how can i fuss over the formatting.
Anyway, I am executing your code, it seems like my system not able to parse these 2 lines correctly. Any idea?
printf j==1 ? main[j] : OFS main[j]
printf j==1 ? template[j] : OFS template[j]
Cheers
---------- Post updated at 09:55 AM ---------- Previous update was at 12:24 AM ----------
Hi Tyler,
I managed to work around this error by using if-else statement:
if (j==1)
printf main[j]
else
printf OFS main[j]
So the statement executed and printed OK now!
However, if abc.txt contains record with spaces, then it won't print correctly.
For example:
Field "USA SpringField" will print as "USA". I have no background in Unix but I think it is to be related to the printf statement.
Is it something i need to "tune" the printf to get it correctly? Let me google it in the meantime.
Thanks
Replace:
IFS="\t"
with
FS="\t"