spaces to tabs - group with IP

Hi buddies,

I have a file, file.txt:
Note: all the separators are SPACEs.

192.168.1.1
ParameterObject=1     Speech     1
ParameterObject=2     Speech     1

192.168.1.1
ParamFunction=1     UserID     1 (DEACTIVATED)
Sector=1,Device=2,Unit=3     DeviceId     1

192.168.1.1
FeederCable=2B     DelayTime i[15] = 1248 1248 1248
FeederCable=2C     DelayTime i[15] = 1248 1248 1248
FeederCable=2D     DelayTime i[15] = 1248 1248 1248

192.168.1.1

192.168.1.1

192.168.1.1
Function=1     MeanTime     i[24] = 0 0 0 0 0

192.168.1.1
FileLocation=1     Address     /c/user/

...
...

192.168.2.1
ParameterObject=1     Speech     2
ParameterObject=2     Speech     2

192.168.2.1
ParamFunction=1     UserID     1 (DEACTIVATED)
Sector=1,Device=2,Unit=3     DeviceId     1

192.168.2.1
FeederCable=2A     DelayTime i[25] = 3248 1248 1248
FeederCable=2A     DelayTime i[25] = 3248 1248 1248
FeederCable=2A     DelayTime i[25] = 3248 1248 1248

192.168.2.1

192.168.2.1
Function=1     MeanTime     i[24] = 0 0 0 0 0

192.168.2.1
FileLocation=1     Address     /c/user/

192.168.2.1

...

I need help to:

  • collect the lines that belong to the same IP (values are separated by SPACEs, and some IP lines have no attributes)
  • write them out with the values separated by TABs:
IP1\tab$1\tab$2\tab$RemainingEverything...
\tab$1\tab$2\tab$RemainingEverything...

IP2\tab$1\tab$2\tab$RemainingEverything...
\tab$1\tab$2\tab$RemainingEverything...

So I want to organise this file as:


192.168.1.1	ParameterObject=1	Speech	1
	ParameterObject=2	Speech	1
	ParamFunction=1	UserID	1 (DEACTIVATED)
	Sector=1,Device=2,Unit=3	DeviceId	1
	FeederCable=2B	DelayTime	i[15] = 1248 1248 1248
	FeederCable=2C	DelayTime	i[15] = 1248 1248 1248
	FeederCable=2D	DelayTime	i[15] = 1248 1248 1248
	Function=1	MeanTime	i[24] = 0 0 0 0 0
	FileLocation=1	Address	/c/user/

...
...

192.168.2.1	ParameterObject=1	Speech	2
	ParameterObject=2	Speech	2
	ParamFunction=1	UserID	1 (DEACTIVATED)
	Sector=1,Device=2,Unit=3	DeviceId	1
	FeederCable=2A	DelayTime	i[25] = 3248 1248 1248
	FeederCable=2A	DelayTime	i[25] = 3248 1248 1248
	FeederCable=2A	DelayTime	i[25] = 3248 1248 1248
	Function=1	MeanTime	i[24] = 0 0 0 0 0
	FileLocation=1	Address	/c/user/

...
...

I hope I have explained it clearly. It's quite complex :frowning:

Since I like Perl:

#! /usr/bin/perl

use strict;
use warnings;

my $ip = undef;
my %X  = ();

$\ = "\n";
$, = '';

while (<>) {
    chomp;
    next if m{^\s*$};
    if (m{^\d+\.\d+\.\d+\.\d+$}) { $ip = $_; next; }
    push @{$X{$ip}}, $_;
}

$, = "\t";
$" = ' ';

while (my ($ip, $X) = each %X) {
    foreach my $line (@{$X}) {
        my @L = split /\s+/, $line;
        my $a = shift @L;
        my $b = shift @L;

        print $ip, $a, $b, "@L";
        $ip = '';
    }
}

which results in

192.168.2.1    ParameterObject=1    Speech    2
    ParameterObject=2    Speech    2
    ParamFunction=1    UserID    1 (DEACTIVATED)
    Sector=1,Device=2,Unit=3    DeviceId    1
    FeederCable=2A    DelayTime    i[25] = 3248 1248 1248
    FeederCable=2A    DelayTime    i[25] = 3248 1248 1248
    FeederCable=2A    DelayTime    i[25] = 3248 1248 1248
    Function=1    MeanTime    i[24] = 0 0 0 0 0
    FileLocation=1    Address    /c/user/
192.168.1.1    ParameterObject=1    Speech    1
    ParameterObject=2    Speech    1
    ParamFunction=1    UserID    1 (DEACTIVATED)
    Sector=1,Device=2,Unit=3    DeviceId    1
    FeederCable=2B    DelayTime    i[15] = 1248 1248 1248
    FeederCable=2C    DelayTime    i[15] = 1248 1248 1248
    FeederCable=2D    DelayTime    i[15] = 1248 1248 1248
    Function=1    MeanTime    i[24] = 0 0 0 0 0
    FileLocation=1    Address    /c/user/

Something like this,

 awk '!/^$/{if(/^[0-9]../){if(a[$1]) {next;} else {a[$1]++;printf "\n";printf $0 FS}} else {printf "\t" $0 "\n"}}' inputfile

@pravin and @ludwig;

Both your solutions are OK, thanks, but the values in the results are still separated by SPACEs. I just need TABs to separate the first two values. Could you please show how to replace those SPACEs with TABs? I think another piece of code could do it.

The problem now is replacing your output:

192.168.2.1\tParameterObject=1\spaceSpeech\space2
\tParameterObject=2\spaceSpeech\space2
\tParamFunction=1\spaceUserID\space1 (DEACTIVATED)
\tSector=1,Device=2,Unit=3\spaceDeviceId\space1
\tFeederCable=2A\spaceDelayTime\spacei[25] = 3248 1248 1248
\tFeederCable=2A\spaceDelayTime\spacei[25] = 3248 1248 1248
\tFeederCable=2A\spaceDelayTime\spacei[25] = 3248 1248 1248
\tFunction=1\spaceMeanTime\spacei[24] = 0 0 0 0 0
\tFileLocation=1\spaceAddress\space/c/user/
192.168.1.1\tParameterObject=1\spaceSpeech\space1
\tParameterObject=2\spaceSpeech\space1
\tParamFunction=1\spaceUserID\space1 (DEACTIVATED)
\tSector=1,Device=2,Unit=3\spaceDeviceId\space1
\tFeederCable=2B\spaceDelayTime\spacei[15] = 1248 1248 1248
\tFeederCable=2C\spaceDelayTime\spacei[15] = 1248 1248 1248
\tFeederCable=2D\spaceDelayTime\spacei[15] = 1248 1248 1248
\tFunction=1\spaceMeanTime\spacei[24] = 0 0 0 0 0
\tFileLocation=1\spaceAddress\space/c/user/

with this one:

192.168.2.1\tParameterObject=1\tSpeech\t2
\tParameterObject=2\tSpeech\t2
\tParamFunction=1\tUserID\t1 (DEACTIVATED)
\tSector=1,Device=2,Unit=3\tDeviceId\t1
\tFeederCable=2A\tDelayTime\ti[25] = 3248 1248 1248
\tFeederCable=2A\tDelayTime\ti[25] = 3248 1248 1248
\tFeederCable=2A\tDelayTime\ti[25] = 3248 1248 1248
\tFunction=1\tMeanTime\ti[24] = 0 0 0 0 0
\tFileLocation=1\tAddress\t/c/user/
192.168.1.1\tParameterObject=1\tSpeech\t1
\tParameterObject=2\tSpeech\t1
\tParamFunction=1\tUserID\t1 (DEACTIVATED)
\tSector=1,Device=2,Unit=3\tDeviceId\t1
\tFeederCable=2B\tDelayTime\ti[15] = 1248 1248 1248
\tFeederCable=2C\tDelayTime\ti[15] = 1248 1248 1248
\tFeederCable=2D\tDelayTime\ti[15] = 1248 1248 1248
\tFunction=1\tMeanTime\ti[24] = 0 0 0 0 0
\tFileLocation=1\tAddress\t/c/user/

Try this,

awk '!/^$/{if(/^[0-9]../){if(a[$1]) {next;} else {a[$1]++;printf $0}} else {printf "\t" $0 "\n"}}' inputfile | tr -s " " "\t" 

pravin, this converts ALL spaces to tabs. I do not need tabs after the 3rd value.

Look again:

192.168.2.1\tParameterObject=1\tSpeech\t2
\tParameterObject=2\tSpeech\t2
\tParamFunction=1\tUserID\t1 (DEACTIVATED)
\tSector=1,Device=2,Unit=3\tDeviceId\t1
\tFeederCable=2A\tDelayTime\ti[25] = 3248 1248 1248
\tFeederCable=2A\tDelayTime\ti[25] = 3248 1248 1248
\tFeederCable=2A\tDelayTime\ti[25] = 3248 1248 1248
\tFunction=1\tMeanTime\ti[24] = 0 0 0 0 0
\tFileLocation=1\tAddress\t/c/user/
192.168.1.1\tParameterObject=1\tSpeech\t1
\tParameterObject=2\tSpeech\t1
\tParamFunction=1\tUserID\t1 (DEACTIVATED)
\tSector=1,Device=2,Unit=3\tDeviceId\t1
\tFeederCable=2B\tDelayTime\ti[15] = 1248 1248 1248
\tFeederCable=2C\tDelayTime\ti[15] = 1248 1248 1248
\tFeederCable=2D\tDelayTime\ti[15] = 1248 1248 1248
\tFunction=1\tMeanTime\ti[24] = 0 0 0 0 0
\tFileLocation=1\tAddress\t/c/user/
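
A note on why the tr pipeline goes too far: tr -s " " "\t" translates every space, including the ones inside the third value. Below is a minimal sketch of the narrower conversion, assuming whitespace-separated attribute lines where only the first two separators should become tabs. The function name first_two_to_tabs is made up for illustration and is not from any post in this thread.

```shell
# Hypothetical helper: rejoin each line so that fields 1-3 are tab-separated
# and any remaining fields keep single spaces between them.
first_two_to_tabs() {
    awk '{
        out = $1
        for (i = 2; i <= NF; i++)
            out = out (i <= 3 ? "\t" : " ") $i
        print out
    }' "$@"
}

# Usage: first_two_to_tabs attributes.txt
```

Because awk's default splitting drops leading whitespace, this suits the raw attribute lines rather than output that is already indented with a tab.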

awk 'NF==1&&p!=$1{printf "\n%s",p=$1} NF>1{$1=$1;sub($1" "$2" ","\t"$1"\t"$2"\t");print}' file

Try this,

awk '!/^$/{if(/^[0-9]../){if(a[$1]) {next;} else {a[$1]++;printf $0}} else {for(i=1;i<=NF;i++){if(i<=3){printf "\t" $i}else {printf $i} if(i>3 || i==NF){printf i==NF?RS:FS}}}}' infile

@Scrutinizer;

Thanks, but there should not be a TAB on the IP line. Look at my post above. Your code is putting a TAB on every line. The line starting with the IP should not have any TAB. Could you please make an exception for this case?

@pravin27, it is giving a syntax error :frowning: :

nawk: syntax error at source line 1
context is ...
nawk: illegal statement at source line 1

Here you seem to be suggesting using tabs on the IP line, no?

Hi,

It's working fine for me ...

awk '!/^$/{if(/^[0-9]../){if(a[$1]) {next;} else {a[$1]++;printf $0}} else {for(i=1;i<=NF;i++){if(i<=3){printf "\t" $i}else {printf $i} if(i>3 || i==NF){printf i==NF?RS:FS}}}}' input_file

O/P

192.168.1.1     ParameterObject=1       Speech  1
        ParameterObject=2       Speech  1
        ParamFunction=1 UserID  1(DEACTIVATED)
        Sector=1,Device=2,Unit=3        DeviceId        1
        FeederCable=2B  DelayTime       i[15]= 1248 1248 1248
        FeederCable=2C  DelayTime       i[15]= 1248 1248 1248
        FeederCable=2D  DelayTime       i[15]= 1248 1248 1248
        Function=1      MeanTime        i[24]= 0 0 0 0 0
        FileLocation=1  Address /c/user/
192.168.2.1     ParameterObject=1       Speech  2
        ParameterObject=2       Speech  2
        ParamFunction=1 UserID  1(DEACTIVATED)
        Sector=1,Device=2,Unit=3        DeviceId        1
        FeederCable=2A  DelayTime       i[25]= 3248 1248 1248
        FeederCable=2A  DelayTime       i[25]= 3248 1248 1248
        FeederCable=2A  DelayTime       i[25]= 3248 1248 1248
        Function=1      MeanTime        i[24]= 0 0 0 0 0
        FileLocation=1  Address /c/user/

@Scrutinizer;
Your code gives the result below, which puts a tab at the beginning of the IP line. The IP line should not start with a tab :(.

\t192.168.2.1\tParameterObject=1\tSpeech\t2
\tParameterObject=2\tSpeech\t2
...

I admit that my proposed solution may not be the shortest one offered (the one-liner is quite good, btw), but I did not assume that the IP addresses would be in order. But my proposed solution does put tabs between the first two values, as shown here:

192.168.2.1\tabParameterObject=1\tabSpeech\tab2$
\tabParameterObject=2\tabSpeech\tab2$
\tabParamFunction=1\tabUserID\tab1 (DEACTIVATED)$
\tabSector=1,Device=2,Unit=3\tabDeviceId\tab1$
\tabFeederCable=2A\tabDelayTime\tabi[25] = 3248 1248 1248$
\tabFeederCable=2A\tabDelayTime\tabi[25] = 3248 1248 1248$
\tabFeederCable=2A\tabDelayTime\tabi[25] = 3248 1248 1248$
\tabFunction=1\tabMeanTime\tabi[24] = 0 0 0 0 0$
\tabFileLocation=1\tabAddress\tab/c/user/$
192.168.1.1\tabParameterObject=1\tabSpeech\tab1$
\tabParameterObject=2\tabSpeech\tab1$
\tabParamFunction=1\tabUserID\tab1 (DEACTIVATED)$
\tabSector=1,Device=2,Unit=3\tabDeviceId\tab1$
\tabFeederCable=2B\tabDelayTime\tabi[15] = 1248 1248 1248$
\tabFeederCable=2C\tabDelayTime\tabi[15] = 1248 1248 1248$
\tabFeederCable=2D\tabDelayTime\tabi[15] = 1248 1248 1248$
\tabFunction=1\tabMeanTime\tabi[24] = 0 0 0 0 0$
\tabFileLocation=1\tabAddress\tab/c/user/$

If you look at my proposed solution:

In the first statement after the read loop, the output field separator ($,) is set to tab, and in the next one the list separator ($"), used when a list is interpolated in a double-quoted string, is set to space. Inside the foreach loop, each input line is split into words. The first two are removed from the list with shift (yes, I could have used slices, but I was trying to be obvious), and then the whole line is printed with tabs between the IP address, the first word, the second word, and the rest of the list. The rest of the list itself is space separated.
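
The Perl solution collects lines in a hash, and Perl does not guarantee any particular order when iterating a hash, which is why 192.168.2.1 can come out before 192.168.1.1. For comparison, here is a rough awk sketch of the same collect-then-print idea that also remembers the order in which the IPs were first seen. It is an illustration under assumptions, not code from this thread: the function name group_by_ip and the array names are made up, and an IP line is assumed to consist of exactly four dot-separated numbers.

```shell
# Hypothetical helper: group attribute lines under their IP, keeping the
# order in which each IP first appeared in the input.
group_by_ip() {
    awk '
    NF == 0 { next }                                # skip blank lines
    /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {            # IP line
        ip = $0
        if (!(ip in seen)) { seen[ip] = 1; order[++n] = ip }
        next
    }
    ip != "" {                                      # attribute line
        out = $1
        for (i = 2; i <= NF; i++)
            out = out (i <= 3 ? "\t" : " ") $i      # tabs only after fields 1 and 2
        lines[ip, ++count[ip]] = out
    }
    END {
        for (k = 1; k <= n; k++) {
            ip = order[k]
            for (j = 1; j <= count[ip]; j++) {
                prefix = (j == 1) ? ip : ""         # IP only on the first line
                print prefix "\t" lines[ip, j]
            }
        }
    }' "$@"
}

# Usage: group_by_ip file.txt
```

Unlike the streaming one-liners, this holds the whole file in memory, which is the price of handling IP blocks that are not adjacent. IPs without any attribute lines produce no output, as in the Perl version.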

I don't think it does, actually. When I run it there is no \t in front of the IP.

@Scrutinizer;

I have tried your solution so many times but it is still the same :wall: The output of your solution is:

	192.168.1.1     ParameterObject=1       Speech  1
        ParameterObject=2       Speech  1
        ParamFunction=1 UserID  1(DEACTIVATED)
        Sector=1,Device=2,Unit=3        DeviceId        1
        FeederCable=2B  DelayTime       i[15]= 1248 1248 1248
        FeederCable=2C  DelayTime       i[15]= 1248 1248 1248
        FeederCable=2D  DelayTime       i[15]= 1248 1248 1248
        Function=1      MeanTime        i[24]= 0 0 0 0 0
        FileLocation=1  Address /c/user/
	192.168.2.1     ParameterObject=1       Speech  2
        ParameterObject=2       Speech  2
        ParamFunction=1 UserID  1(DEACTIVATED)
        Sector=1,Device=2,Unit=3        DeviceId        1
        FeederCable=2A  DelayTime       i[25]= 3248 1248 1248
        FeederCable=2A  DelayTime       i[25]= 3248 1248 1248
        FeederCable=2A  DelayTime       i[25]= 3248 1248 1248
        Function=1      MeanTime        i[24]= 0 0 0 0 0
        FileLocation=1  Address /c/user/

But I need:

192.168.1.1     ParameterObject=1       Speech  1
        ParameterObject=2       Speech  1
        ParamFunction=1 UserID  1(DEACTIVATED)
        Sector=1,Device=2,Unit=3        DeviceId        1
        FeederCable=2B  DelayTime       i[15]= 1248 1248 1248
        FeederCable=2C  DelayTime       i[15]= 1248 1248 1248
        FeederCable=2D  DelayTime       i[15]= 1248 1248 1248
        Function=1      MeanTime        i[24]= 0 0 0 0 0
        FileLocation=1  Address /c/user/
192.168.2.1     ParameterObject=1       Speech  2
        ParameterObject=2       Speech  2
        ParamFunction=1 UserID  1(DEACTIVATED)
        Sector=1,Device=2,Unit=3        DeviceId        1
        FeederCable=2A  DelayTime       i[25]= 3248 1248 1248
        FeederCable=2A  DelayTime       i[25]= 3248 1248 1248
        FeederCable=2A  DelayTime       i[25]= 3248 1248 1248
        Function=1      MeanTime        i[24]= 0 0 0 0 0
        FileLocation=1  Address /c/user/

Will you please write code that:

  • deletes just the first character (it is \t) of each line starting with an IP.

Your solution gives the result closest to what I need. Just a small piece of code is enough to do it.

thx :slight_smile:

I am really puzzled where that TAB would come from because my solution does not write it, so there is nothing to remove. Are you on Solaris, and do you need to use awk? Could you post a sample of your input file as an example?

$ awk 'NF==1&&p!=$1{printf "\n%s",p=$1} NF>1{$1=$1;sub($1" "$2" ","\t"$1"\t"$2"\t");print}' infile

192.168.1.1     ParameterObject=1       Speech  1
        ParameterObject=2       Speech  1
        ParamFunction=1 UserID  1 (DEACTIVATED)
        Sector=1,Device=2,Unit=3        DeviceId        1
        FeederCable=2B  DelayTime       i[15] = 1248 1248 1248
        FeederCable=2C  DelayTime       i[15] = 1248 1248 1248
        FeederCable=2D  DelayTime       i[15] = 1248 1248 1248
        Function=1      MeanTime        i[24] = 0 0 0 0 0
        FileLocation=1  Address /c/user/

192.168.2.1     ParameterObject=1       Speech  2
        ParameterObject=2       Speech  2
        ParamFunction=1 UserID  1 (DEACTIVATED)
        Sector=1,Device=2,Unit=3        DeviceId        1
        FeederCable=2A  DelayTime       i[25] = 3248 1248 1248
        FeederCable=2A  DelayTime       i[25] = 3248 1248 1248
        FeederCable=2A  DelayTime       i[25] = 3248 1248 1248
        Function=1      MeanTime        i[24] = 0 0 0 0 0
        FileLocation=1  Address /c/user/

Does this give a different result?

awk '/^[0-9]/&&p!=$1{printf "\n%s",p=$1} NF>1{$1=$1;sub($1" "$2" ","\t"$1"\t"$2"\t");print}' infile

Yes, I am on Solaris and I convert your solutions to nawk ... . Since your solutions are always written for awk, I am using awk.

I have already posted my exact file to you.

Your last solution gave:

192.168.1.1	192.168.1.1     ParameterObject=1       Speech  1
        ParameterObject=2       Speech  1
        ParamFunction=1 UserID  1 (DEACTIVATED)
        Sector=1,Device=2,Unit=3        DeviceId        1
        FeederCable=2B  DelayTime       i[15] = 1248 1248 1248
        FeederCable=2C  DelayTime       i[15] = 1248 1248 1248
        FeederCable=2D  DelayTime       i[15] = 1248 1248 1248
        Function=1      MeanTime        i[24] = 0 0 0 0 0
        FileLocation=1  Address /c/user/

192.168.2.1	192.168.2.1     ParameterObject=1       Speech  2
        ParameterObject=2       Speech  2
        ParamFunction=1 UserID  1 (DEACTIVATED)
        Sector=1,Device=2,Unit=3        DeviceId        1
        FeederCable=2A  DelayTime       i[25] = 3248 1248 1248
        FeederCable=2A  DelayTime       i[25] = 3248 1248 1248
        FeederCable=2A  DelayTime       i[25] = 3248 1248 1248
        Function=1      MeanTime        i[24] = 0 0 0 0 0
        FileLocation=1  Address /c/user/

It has repeated the IP.

On Solaris you should always use nawk or /usr/xpg4/bin/awk . Do they give the same results?
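
One way to follow this advice in a script is to pick the awk binary once and reuse it. A minimal sketch, assuming a shell that supports command -v; the function name pick_awk is invented for the example, and the candidate list simply mirrors the advice above.

```shell
# Hypothetical helper: print the first usable awk, preferring the POSIX
# one on Solaris, then nawk, then whatever "awk" resolves to.
pick_awk() {
    for candidate in /usr/xpg4/bin/awk nawk awk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage: AWK=$(pick_awk) && "$AWK" '{ print NF }' file.txt
```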

Try this alternative:

awk '/^[0-9]/&&p!=$1{printf "\n%s",p=$1;next} NF>1{$1=$1;sub($1" "$2" ","\t"$1"\t"$2"\t");print}' infile

Could it be that you have a CRLF-terminated file? What happens when you convert the file from DOS to Unix format? CRLF line endings would mess up the NF variable...
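
To illustrate the CRLF point: with awk's default field splitting a carriage return is not treated as whitespace, so a "blank" line from a DOS file still holds one field (the \r) and NF is 1 instead of 0. That is enough to make an NF==1 test fire on empty lines. A small sketch for checking and cleaning a file, assuming tr is available; the function names count_cr_lines and strip_cr are made up here.

```shell
# Hypothetical helpers for spotting and removing DOS line endings.

# Print how many lines end in a carriage return.
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$@"
}

# Copy stdin to stdout with every carriage return removed.
strip_cr() {
    tr -d '\r'
}

# Usage: strip_cr < file.txt > file.unix.txt
```

If count_cr_lines reports anything other than 0, running the file through strip_cr first should make the NF-based tests behave as expected.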

OK Scruti, it is fine now. I have realized my mistake: I had pointed to the wrong file :slight_smile: You have a perfect solution as always. This is the code:

nawk '/^[0-9]/&&p!=$1{printf "\n%s",p=$1} NF>1{$1=$1;sub($1" "$2" ","\t"$1"\t"$2"\t");print}' file.txt

Thanks very much :slight_smile: