Help with txt formatting using AWK

Hi,
I've used unix.com to help learn the basics of AWK to format text files, but I've run out of talent and could do with some help. I'm not sure if this is possible using awk, but I have an input as follows:

L73-10 342 0 1480
L73-10 342 100 1480
L73-10 342 250 1656
L73-10 342 500 1746
L73-10 342 750 1910
L73-10 350 0 1480
L73-10 350 100 1480
L73-10 350 250 1656
L73-11 300 0 1480
L73-11 300 100 1480
L73-11 300 250 1656
L73-11 300 500 1746

$1 is line, $2 is cdp, $3 is time, $4 is velocity

I need the script to print something like

L73-10/342:0-1480,100-1480,250-1656,500-1746,750-1910/350:0-1480,100-1480,250-1656/
 
L73-11/300:0-1480,100-1480,250-1656,500-1746/

The number of time-velocity values is never constant between lines and cdps, so I'm unsure whether something like the script below is the right way to go.

{NF==4
if (NR==1)
printf $1"/"$2":"$3"-"$4"," 
$2=cdp
{
if (NR>1||$2==cdp)
printf ORS=$3"-"$4"," 
else
$2!=cdp(NR=1)
next
{
if (NR==1)
$2=cdp
printf $1"/"$2":"$3"-"$4"," 
{
if (NR>1||$2==cdp)
printf ORS=$3"-"$4"," 
}
}
}
}

This gives an error saying cdp is not defined, but I don't understand why. Do I need to create a loop to repeat the process to the end of the file, or is an array more appropriate?

The above script also does not account for "line" variations. I have written something to print all line values to separate files, so if need be I can run a script on each file. It also does not mark the boundary between cdps with a /.

Hopefully I've got the point across OK. This level of problem is well beyond my ability, and I have little idea whether trying to reset NR when cdp changes is the correct approach. Any help would be greatly appreciated.

Thanks
Ryan


Try this:

awk '
s==$1{printf(",%s-%s", $3, $4);next}
NR>1{print "\\"}
{s=$1;printf("%s/%s:%s-%s", $1, $2, $3, $4)}
END{print "\\"}' file

Regards

Apologies Franklin52.

Your suggestion does not account for the change in cdp number. I think the output is the same as if I used

{NF==4
if (NR==1)
printf $1"/"$2":"$3"-"$4"," 
if (NR>1)
printf ORS=$3"-"$4"," 
}

Thanks
Ryan

Sorry, I've misread the question, try this one:

awk '
t != $2 && s==$1{t=$2;printf("/%s:%s-%s", $2, $3, $4);next}
s==$1{printf(",%s-%s", $3, $4);next}
NR>1{print "/"}
{s=$1;t=$2;printf("%s/%s:%s-%s", $1, $2, $3, $4)}
END{print "/"}' file

This is my output:

$ cat file
L73-10 342 0 1480
L73-10 342 100 1480
L73-10 342 250 1656
L73-10 342 500 1746
L73-10 342 750 1910
L73-10 350 0 1480
L73-10 350 100 1480
L73-10 350 250 1656
L73-11 300 0 1480
L73-11 300 100 1480
L73-11 300 250 1656
L73-11 300 500 1746
$ awk '
t != $2 && s==$1{t=$2;printf("/%s:%s-%s", $2, $3, $4);next}
s==$1{printf(",%s-%s", $3, $4);next}
NR>1{print "/"}
{s=$1;t=$2;printf("%s/%s:%s-%s", $1, $2, $3, $4)}
END{print "/"}' file
L73-10/342:0-1480,100-1480,250-1656,500-1746,750-1910/350:0-1480,100-1480,250-1656/
L73-11/300:0-1480,100-1480,250-1656,500-1746/
$

Regards
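
For anyone following along, here is a commented sketch of the same one-pass logic. The wrapper name format_picks and the variable names line and cdp are my own labels (the script above uses s and t), and it assumes the input is already grouped by line and then by cdp, as in the sample.

```shell
# format_picks is a hypothetical wrapper name, not part of the original post.
format_picks() {
  awk '
  # Same line as the previous record but a different cdp: open a new "/cdp:" group.
  $1 == line && $2 != cdp { cdp = $2; printf "/%s:%s-%s", $2, $3, $4; next }
  # Same line and same cdp: append one more time-velocity pair.
  $1 == line { printf ",%s-%s", $3, $4; next }
  # Only reached when the line name changed: close the previous output row.
  NR > 1 { print "/" }
  # Start a new output row and remember the current line and cdp.
  { line = $1; cdp = $2; printf "%s/%s:%s-%s", $1, $2, $3, $4 }
  # Close the last row.
  END { print "/" }
  ' "$@"
}

# Small demo on four records taken from the sample input.
printf '%s\n' 'L73-10 342 0 1480' 'L73-10 342 100 1480' \
              'L73-10 350 0 1480' 'L73-11 300 0 1480' | format_picks
# L73-10/342:0-1480,100-1480/350:0-1480/
# L73-11/300:0-1480/
```

Because each record is only compared with the one before it, nothing has to be stored and NR never needs resetting.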

Franklin52,
Thanks very much. I've spent so long trying to sort this.
Ryan

Using arrays in awk :

awk 'BEGIN { s=0; t=0 }
NR==FNR {
  if ($2==s && $1==t) { a[$1]=a[$1]","$3"-"$4; t=$1; s=$2 }
  else { a[$1]=a[$1]"/"$2":"$3"-"$4; t=$1; s=$2 }
}
END { for (i in a) { print i a[i]"/" } }' file_name.txt

Thanks Panyam,
I've tried to understand both suggestions, but could anybody briefly explain what the code is doing?
Thanks

Please go through this; I am sure you will then understand the usage.

http://www.grymoire.com/Unix/Awk.html

Basically, I prefer to use Perl for this kind of problem, as Perl has hashes.

while(<DATA>){
	my @tmp=split;
	push @{$hash{$tmp[0]}->{$tmp[1]}}, $tmp[2]."-".$tmp[3];
}
foreach my $key(keys %hash){
	print $key,"/";
	foreach my $k(keys %{$hash{$key}}){
		print $k,":", join ",", @{$hash{$key}->{$k}};
		print "/";
	}
	print "\n\n";
}
__DATA__
L73-10 342 0 1480
L73-10 342 100 1480
L73-10 342 250 1656
L73-10 342 500 1746
L73-10 342 750 1910
L73-10 350 0 1480
L73-10 350 100 1480
L73-10 350 250 1656
L73-11 300 0 1480
L73-11 300 100 1480
L73-11 300 250 1656
L73-11 300 500 1746

Cheers Panyam,
I have used the linked text previously. Although it seems pretty comprehensive, I have found it quite hard to learn from. Perhaps it's just me, with no previous knowledge of scripting and little computer aptitude, but I find it much easier to learn from an example. Although the link gives examples, I find them hard to follow. I think I can "read" your solution, but I am confused. Am I right in thinking that if the record number is zero for column 1 or 2 then

{ a[$1]=a[$1]","$3"-"$4;t=$1;s=$2 }

applies or else

{ a[$1]=a[$1]"/"$2":"$3"-"$4;t=$1
;s=$2 }

This would not give the output it does though. Have I misunderstood the record number condition?

Thanks again!

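s and t are not record numbers; they hold the second field (cdp) and the header (line) of the previous record. NR==FNR is simply true for every record when there is only one input file. For each record I first check whether the current header and cdp are both equal to the previous ones; the else part handles the case where they are not (a new cdp, or a new header, I mean L73-10 / L73-11).

a[$1]=a[$1]","$3"-"$4;t=$1;s=$2 is nothing but appending one more time-velocity pair to the array element for that header while the second field is unchanged.

In the else part I append "/", the new second field and its first pair, because the second field did not match (normally for the same header).

I am a bit poor at explaining things :)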
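
To make the two branches concrete, here is a commented sketch of the same array idea. The names group_by_line, prev_line, prev_cdp and order are mine. One addition is my own assumption rather than part of the script above: for (i in a) visits keys in an unspecified order, so the sketch also records each header the first time it appears and prints in that order.

```shell
# group_by_line is a hypothetical wrapper name; usage: group_by_line file_name.txt
group_by_line() {
  awk '
  {
    # First time this header is seen: remember its position for ordered output.
    if (!($1 in a)) { order[++n] = $1 }
    if ($1 == prev_line && $2 == prev_cdp) {
      a[$1] = a[$1] "," $3 "-" $4            # same header and cdp: append a pair
    } else {
      a[$1] = a[$1] "/" $2 ":" $3 "-" $4     # new cdp or new header: open "/cdp:"
    }
    prev_line = $1; prev_cdp = $2
  }
  # Print each header followed by its accumulated string and a closing slash.
  END { for (k = 1; k <= n; k++) { print order[k] a[order[k]] "/" } }
  ' "$@"
}
```

Each array element holds the whole output row for one header, which is why nothing is printed until END.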

Thanks Panyam. I think I get it, but I've tried to modify the script to add a further condition and seem to be getting nowhere. I am trying to produce the same output as before, except that within each series of $1 and $2 values, if

NR=i ; $4=v and NR=i+1 ; $4<v

then I do not want to print anything from that series where NR>i.

So if input was

L73-10 342 0 1480
L73-10 342 100 1480
L73-10 342 250 1656
L73-10 342 500 1500
L73-10 342 750 1910
L73-10 350 0 1480
L73-10 350 100 1480
L73-10 350 250 1656
L73-11 300 0 1480
L73-11 300 100 1480
L73-11 300 250 1656
L73-11 300 500 1546

Only a formatted version of the following would be output

L73-10 342 0 1480
L73-10 342 100 1480
L73-10 342 250 1656
L73-10 350 0 1480
L73-10 350 100 1480
L73-10 350 250 1656
L73-11 300 0 1480
L73-11 300 100 1480
L73-11 300 250 1656

Can anybody help me out?

Thanks again
Ryan


I know this won't exclude all lines beyond a reversed $4 value, but can anyone tell me what is wrong with this?

BEGIN { s=0;t=0;NR=i;vel=$4 } NR==FNR { if($2==s&&$1==t) { a[$1]=a[$1]","$3"-"$4;t=$1;s=$2 } else { a[$1]=a[$1]"/"$2":"$3"-"$4;t=$1;s=$2 }}
END {if(NR==i+1&&vel>$4) {for ( i in a ) { print i a[i]"/"} }}

Also, is there any way of adding standard $3 $4 values, say $3=6000 and $4=4500, to the end of every unique $1 $2 set?

I hope I've explained this reasonably. I'm a geophysicist with no experience of programming. I seem to spend hours trying variation after variation of code and have been trawling unix.com and Google, but I get lost all too often. Any help really would be appreciated!
Thanks
Ryan

Modifying my code a bit further:

awk 'BEGIN { s=0; t=0; v=0; flag1=0 }
NR==FNR {
  if ($2==s && $1==t) {
    if (v > $4) { flag1=1 }
    if (flag1 != 1) { v=$4; a[$1]=a[$1]","$3"-"$4; t=$1; s=$2 }
  } else { a[$1]=a[$1]"/"$2":"$3"-"$4; t=$1; s=$2; flag1=0; v=$4 }
}
END { for (i in a) { print i a[i]"/" } }' file_name.txt

This gives the required output.

Thanks very much Panyam!! I've further modified your code to add dummy values for $3 and $4. I can do something myself!!
Cheers
Ryan
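
Ryan's modified script is not shown in the thread, so the following is only a sketch of one way the two follow-up requests could be combined in a single pass: drop the rest of a cdp group once the velocity falls, and append a dummy pair to every group. The wrapper name format_with_tail and the variables tail, stop, line and cdp are assumptions of mine; the 6000-4500 pair is the example value from the question. It assumes the input is grouped by line and then by cdp.

```shell
# format_with_tail is a hypothetical wrapper name; usage: format_with_tail file_name.txt
format_with_tail() {
  awk -v tail="6000-4500" '
  # New line or new cdp: finish the previous group, then open the next one.
  $1 != line || $2 != cdp {
    if (NR > 1) { printf ",%s/", tail }        # dummy pair closes the old cdp group
    if ($1 != line) {                          # new line name: start a new output row
      if (NR > 1) { print "" }
      printf "%s/", $1
    }
    printf "%s:%s-%s", $2, $3, $4
    line = $1; cdp = $2; v = $4; stop = 0
    next
  }
  # Same group: once the velocity drops below the last kept value, skip the rest.
  $4 < v { stop = 1 }
  !stop  { printf ",%s-%s", $3, $4; v = $4 }
  # Close the final group, if any input was read.
  END    { if (NR) { printf ",%s/\n", tail } }
  ' "$@"
}
```

On the twelve-record example with the reversed velocities, this should print each cdp group cut off at the 250-1656 pair and ending in ,6000-4500/.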