sort data in different columns

mogabr · August 2, 2008, 7:13pm

Hello all:

I have a list in the following format:
Id Name Iid Value
0x4440001 customerCode 44077
0x11d2a PrimaryAddress 57.217.41.201
0x129fa Model_Handle 0x11322800
0x4440000 usid fi00bxtaa4
0x4440008 customerName Zurich

but I need it in the following format:
customerCode PrimaryAddress Model_Handle usid
44077 57.217.41.201 0x11322800 fi00bxtaa4

How can I do this?

Regards

cfajohnson · August 2, 2008, 8:16pm

awk ' NR == 1 { next }
{
 name[NR-1] = $2
 value[NR-1] = $3
}
END {
 while ( ++n < NR - 1 ) printf "%s ", name[n]
 print ""
 while ( ++v < NR - 1 ) printf "%s ", value[v]
 print ""
}' "$FILE"

mogabr · August 3, 2008, 5:01am

Hello,

The input data is repeated in the file, like this:
Id Name Iid Value
0x4440001 customerCode 44077
0x11d2a PrimaryAddress 57.217.41.201
0x129fa Model_Handle 0x11322800
0x4440000 usid fi00bxtaa4
0x4440008 customerName Zurich
Id Name Iid Value
0x4440001 customerCode 44044
0x11d2a PrimaryAddress 57.210.40.20
0x129fa Model_Handle 0x11326500
0x4440000 usid fi00bxtbb0
0x4440008 customerName Zurich

and so on ...

aigles · August 3, 2008, 5:46am

Try adapting the following script.
The NamesList variable holds the names required in the output, in column order.

awk -v NamesList="customerCode,PrimaryAddress,Model_Handle,usid" \
'

# Print the collected values of one block, in NamesList order
function output_datas(    i, out) {
   if (Valid) {
      for (i=1; i<=NamesCount; i++) {
         out = (out ? out OFS : "") Values[i];
      }
      print out;
   }
}

# Clear the values before reading the next block
function reset_datas(    i) {
   Valid = "";
   for (i=1; i<=NamesCount; i++)
      Values[i] = "";
}

BEGIN {
   NamesCount = split(NamesList, Names, ",");
   Header = "";
   for (i=1; i<=NamesCount; i++) {
      # Map each lower-cased name to its output column
      Indexes[tolower(Names[i])] = i;
      Header = (Header ? Header OFS : "") Names[i];
   }
   print Header;
}

/^Id/ {
   output_datas();
   reset_datas();
   next;
}

{
   name = tolower($2);
   value = $3;
   if (name in Indexes)  {
      Values[Indexes[name]] = value;
      Valid++;
   }
}

END {
   output_datas();
}
' mogarb.dat

Input data (mogarb.dat)

Id Name Iid Value
0x4440001 customerCode 44077
0x11d2a PrimaryAddress 57.217.41.201
0x129fa Model_Handle 0x11322800
0x4440000 usid fi00bxtaa4
0x4440008 customerName Zurich
Id Name Iid Value
0x4440001 customerCode 44044
0x11d2a PrimaryAddress 57.210.40.20
0x129fa Model_Handle 0x11326500
0x4440000 usid fi00bxtbb0
0x4440008 customerName Zurich

Output

customerCode PrimaryAddress Model_Handle usid
44077 57.217.41.201 0x11322800 fi00bxtaa4
44044 57.210.40.20 0x11326500 fi00bxtbb0

Jean-Pierre.

radoulov · August 3, 2008, 6:34am

I'm not sure if you want to sort or to change the format only:
(use nawk or /usr/xpg4/bin/awk on Solaris)

awk 'BEGIN { 
  print "customerCode PrimaryAddress Model_Handle usid" 
  }
c && c < 5 { 
  v = v ? v FS $NF : $NF 
  } 
++c == 6 { 
  print v
  v = c = "" 
  }' filename

mogabr · August 3, 2008, 6:52am

Hello Jean,

When I run the script I get this error:
awk: syntax error near line 1
awk: bailing out near line 1
Please advise.

mogabr · August 3, 2008, 6:56am

I get the same kind of error with this script:
awk: syntax error near line 4
awk: bailing out near line 4

aigles · August 3, 2008, 7:14am

Try with nawk or gawk instead of awk

Jean-Pierre.
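
A note for anyone hitting the same "syntax error ... bailing out" message: the scripts in this thread rely on features such as user-defined functions and -v assignments, which the oldest awk on some systems (for example the default awk on Solaris) does not accept. The sketch below is one way a wrapper script might pick a newer implementation. The candidate list and the AWK variable name are assumptions for illustration, not something from the thread.

```shell
# Pick the first awk that understands user-defined functions.
# The candidate list is an assumption; adjust it for your system.
AWK=
for candidate in /usr/xpg4/bin/awk nawk gawk awk; do
    if command -v "$candidate" >/dev/null 2>&1 &&
       "$candidate" 'function f(x) { return x } BEGIN { exit f(0) }' \
           </dev/null >/dev/null 2>&1
    then
        AWK=$candidate
        break
    fi
done
echo "using: ${AWK:-none found}"
```

The rest of a script can then call "$AWK" in place of awk.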

radoulov · August 3, 2008, 7:35am

Did you try nawk and /usr/xpg4/bin/awk as suggested?

mogabr · August 3, 2008, 9:26am

I tried this script:
nawk -v NamesList="customerCode,PrimaryAddress,Model_Handle,usid" \
'

function output_datas( i, out) {
   if (Valid) {
      for (i=1; i<=NamesCount; i++) {
         out = (out ? out OFS : "") Values[i];
      }
      print out;
   }
}

function reset_datas( i) {
   Valid = "";
   for (i=1; i<=NamesCount; i++)
      Values[i] = "";
}

BEGIN {
   NamesCount = split(NamesList, Names, ",");
   Header = "";
   for (i=1; i<=NamesCount; i++) {
      Indexes[tolower(Names[i])] = i;
      Header = (Header ? Header OFS : "") Names[i];
   }
   print Header;
}

/^Id/ {
   output_datas();
   reset_datas();
   next;
}

{
   name = tolower($2);
   value = $3;
   if (name in Indexes) {
      Values[Indexes[name]] = value;
      Valid++;
   }
}

END {
   output_datas();
}
' list_of_customer_number

The output was in the wrong format:

customerCode PrimaryAddress Model_Handle usid
57.217.41.201 0x11322800
57.215.49.14 0x113fa800
57.219.48.83 0x11395800
57.219.48.47 0x11389000
57.219.48.140 0x11384000
57.0.144.159 0x1131f000
57.217.46.212 0x11303489

I also need to put the output in a file. How can I fix the above?

Regards

aigles · August 3, 2008, 9:35am

The output can be redirected to a file:

awk awk_script inputfile > outputfile

Please show us the content of your input file.
Remark: the names in the NamesList variable must match the values in field two of the input records.

Id Name Iid Value
0x4440001 customerCode 44077
0x11d2a PrimaryAddress 57.217.41.201
0x129fa Model_Handle 0x11322800
0x4440000 usid fi00bxtaa4
0x4440008 customerName Zurich

Jean-Pierre.
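
The remark about field two can be checked directly by listing the distinct names that actually occur in the input. The following is a minimal sketch, not part of the original reply; the temporary directory and the file name sample.dat are made up for the example, and the data is the sample shown in this thread.

```shell
tmp=$(mktemp -d)

# Two blocks in the thread's layout; each block starts with an "Id" header.
cat > "$tmp/sample.dat" <<'EOF'
Id Name Iid Value
0x4440001 customerCode 44077
0x11d2a PrimaryAddress 57.217.41.201
0x129fa Model_Handle 0x11322800
0x4440000 usid fi00bxtaa4
0x4440008 customerName Zurich
Id Name Iid Value
0x4440001 customerCode 44044
0x11d2a PrimaryAddress 57.210.40.20
0x129fa Model_Handle 0x11326500
0x4440000 usid fi00bxtbb0
0x4440008 customerName Zurich
EOF

# Skip header lines and print each name from field two once,
# in order of first appearance.
names=$(awk '!/^Id/ && !seen[$2]++ { print $2 }' "$tmp/sample.dat")
echo "$names"

rm -rf "$tmp"
```

Any name in NamesList that does not show up in this listing will produce an empty column in the output.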

mogabr · August 3, 2008, 9:46am

Hello Jean

The file is too big to post, but the information is repeated as follows:

Id Name Iid Value
0x4440001 eq_customerCode 44077
0x11d2a PrimaryAddress 57.217.41.201
0x129fa Model_Handle 0x11322800
0x4440000 eq_usid fi00bxtaa4
0x4440008 eq_customerName Zurich Financial Services
Id Name Iid Value
0x4440001 eq_customerCode 55487
0x11d2a PrimaryAddress 57.215.49.14
0x129fa Model_Handle 0x113fa800
0x4440000 eq_usid fi00dlzuc8
0x4440008 eq_customerName T-Systems Business Services Gmbh
Id Name Iid Value
0x4440001 eq_customerCode 1908
0x11d2a PrimaryAddress 57.219.48.83
0x129fa Model_Handle 0x11395800
0x4440000 eq_usid fi00kgy094
0x4440008 eq_customerName Akzo Nobel Central Purchasing Bv

The columns I need are the following:
eq_customerCode PrimaryAddress Model_Handle eq_usid eq_customerName

Regards

aigles · August 3, 2008, 10:04am

You must modify the NamesList variable.
Another problem is that a value can span more than one field.

A new version of the script (the changes are the NamesList value and the loop that joins fields 3 to NF into the value):

awk -v NamesList="eq_customerCode,PrimaryAddress,Model_Handle,eq_usid,eq_customerName" \
'

# Print the collected values of one block, in NamesList order
function output_datas(    i, out) {
   if (Valid) {
      for (i=1; i<=NamesCount; i++) {
         out = (out ? out OFS : "") Values[i];
      }
      print out;
   }
}

# Clear the values before reading the next block
function reset_datas(    i) {
   Valid = "";
   for (i=1; i<=NamesCount; i++)
      Values[i] = "";
}

BEGIN {
   NamesCount = split(NamesList, Names, ",");
   Header = "";
   for (i=1; i<=NamesCount; i++) {
      # Map each lower-cased name to its output column
      Indexes[tolower(Names[i])] = i;
      Header = (Header ? Header OFS : "") Names[i];
   }
   print Header;
}

/^Id/ {
   output_datas();
   reset_datas();
   next;
}

{
   name = tolower($2);
   if (name in Indexes)  {
      value = "";
      for (i=3; i<=NF; i++)
         value = value " " $i
      Values[Indexes[name]] = substr(value, 2);
      Valid++;
   }
}

END {
   output_datas();
}
' mogarb2.dat

Input file (mogarb2.dat)

Id Name Iid Value
0x4440001 eq_customerCode 44077
0x11d2a PrimaryAddress 57.217.41.201
0x129fa Model_Handle 0x11322800
0x4440000 eq_usid fi00bxtaa4
0x4440008 eq_customerName Zurich Financial Services
Id Name Iid Value
0x4440001 eq_customerCode 55487
0x11d2a PrimaryAddress 57.215.49.14
0x129fa Model_Handle 0x113fa800
0x4440000 eq_usid fi00dlzuc8
0x4440008 eq_customerName T-Systems Business Services Gmbh
Id Name Iid Value
0x4440001 eq_customerCode 1908
0x11d2a PrimaryAddress 57.219.48.83
0x129fa Model_Handle 0x11395800
0x4440000 eq_usid fi00kgy094
0x4440008 eq_customerName Akzo Nobel Central Purchasing Bv

Output:

eq_customerCode PrimaryAddress Model_Handle eq_usid eq_customerName
44077 57.217.41.201 0x11322800 fi00bxtaa4 Zurich Financial Services
55487 57.215.49.14 0x113fa800 fi00dlzuc8 T-Systems Business Services Gmbh
1908 57.219.48.83 0x11395800 fi00kgy094 Akzo Nobel Central Purchasing Bv
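
The loop over fields 3 to NF rebuilds the value with single spaces between the words. As an alternative sketch (not from the thread), the first two fields can be stripped from a copy of the record instead, which keeps whatever spacing the value had in the input. The variable names line and value are made up for the example.

```shell
line='0x4440008 eq_customerName Zurich Financial Services'

value=$(printf '%s\n' "$line" | awk '{
    v = $0
    # Remove leading blanks, field 1, field 2 and the blanks after each.
    sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", v)
    print v
}')
echo "$value"   # prints: Zurich Financial Services
```

If a record has only two fields the pattern does not match and v keeps the whole line, so a real script would still want a check on NF before using the result.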

mogabr · August 3, 2008, 4:42pm

Hello,

Thanks a lot for your help. awk was not working, so I am using nawk instead.
It works great now. Thanks very much for your help.

Regards

summer_cherry · August 3, 2008, 9:57pm

1> Make a file (a.txt) with only the last two columns:

cut -d" " -f2,3 inputfile > a.txt

2> Treat a.txt as a matrix and transpose it in Perl:

sub ChangeMetrix
{
        if ($#_<1){
                print "Usage: ChangeMetrix filename delimiter\n";
                exit;
        }
        my ($file, $del) = @_;
        my (%hash, @arr, $index, $temp, $i, $j);
        my ($col, $row) = (0, 0);
        open(FH, "<", $file) or die "Cannot open $file: $!\n";
        while(<FH>){
                chomp;
                @arr=split($del,$_);
                # remember the widest row seen so far
                if($#arr>$col){
                        $col=$#arr;
                }
                # store each cell under a "row,column" key
                for($i=0;$i<=$#arr;$i++){
                        $index=sprintf("%s,%s",$.,$i);
                        $hash{$index}=$arr[$i];
                }
                $row=$.;

        }
        close(FH);
        # print column by column, so that rows become columns
        for($i=0;$i<=$col;$i++){
                for($j=1;$j<=$row;$j++){
                        $temp=sprintf("%s,%s",$j,$i);
                        if(exists $hash{$temp}){
                                print $hash{$temp},$del;
                        }
                        else{
                                print "_____",$del;
                        }
                }
                print "\n";
        }

}
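
For comparison, the same transpose idea can be sketched in awk, the tool used in the rest of the thread. This is an illustration rather than a translation of the Perl above: the three-line a.txt and the temporary directory are made up for the example, and a missing cell is filled with "_____" as in the Perl version.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
    'customerCode 44077' \
    'PrimaryAddress 57.217.41.201' \
    'usid fi00bxtaa4' > "$tmp/a.txt"

transposed=$(awk '{
    # Store every cell under a (row, column) key.
    for (i = 1; i <= NF; i++) cell[NR, i] = $i
    if (NF > cols) cols = NF
}
END {
    # Walk column by column so that input rows become output columns.
    for (i = 1; i <= cols; i++) {
        row = ""
        for (j = 1; j <= NR; j++)
            row = row (j > 1 ? " " : "") ((j, i) in cell ? cell[j, i] : "_____")
        print row
    }
}' "$tmp/a.txt")
echo "$transposed"

rm -rf "$tmp"
```

Like the Perl version, this holds the whole file in memory, and on a file with repeated blocks it yields two long rows rather than one row per block.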

davenorm · August 4, 2008, 6:07am

Or just knife and fork it with Unix commands...
(Note: 'list.file' is a file containing the aforementioned list as input. Each grep below reads the whole list, so it needs to be a file; a list streamed on stdin would be used up by the first grep.)

grep customerCode 'list.file' > tmp1
grep PrimaryAddress 'list.file' > tmp2
grep Model_Handle 'list.file' > tmp3
grep usid 'list.file' > tmp4

sed 's/^.*customerCode //g' tmp1 > tmp1a
sed 's/^.*PrimaryAddress //g' tmp2 > tmp2a
sed 's/^.*Handle //g' tmp3 > tmp3a
sed 's/^.*usid //g' tmp4 > tmp4a

paste -d" " tmp1a tmp2a tmp3a tmp4a