Gents,
I have a file that contains duplicate entries in column 1, but the values in column 2 are different.
3099753489 3
3099753489 5
3101954341 12
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
I would like to get something like this:
Desired output:
3099753489 3 5
3101954341 12 14
3102153285 3 5
3102153297 3 5
I am trying this code, but it does not work:
awk '{
D[$1]}{key[$1,$2]++}
END{
for (i in key) {
split (i, T, SUBSEP)
print T[1],key[i], T[2]}}' file
Please can you help me.
RudiC
August 19, 2016, 5:30am
2
Is your input ALWAYS two records per key? In sequence?
Hello jiam912,
Assuming there are always 2 fields in your Input_file, the following may help.
I- If you don't need the output in the same order as the Input_file:
awk '{A[$1]=A[$1]?A[$1] OFS $2:$2} END{for(i in A){print i OFS A[i]}}' Input_file
Output will be as follows.
3099753489 3 5
3102153285 3 5
3101954341 12 14
3102153297 3 5
II- If you need the output in the same order as the Input_file, the following may help:
awk 'FNR==NR{A[$1]=A[$1]?A[$1] OFS $2:$2;next} ($1 in A){print $1 OFS A[$1];delete A[$1]}' Input_file Input_file
Output will be as follows.
3099753489 3 5
3101954341 12 14
3102153285 3 5
3102153297 3 5
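Approach II reads Input_file twice: `FNR==NR` is only true while the first copy is being read, and the second pass prints each key once in file order. That needs a real file rather than a pipe. A one-pass sketch, assuming every key and its values fit in memory, records the order in which keys first appear and also keeps all fields after the first rather than only `$2`. `group_in_order` is a hypothetical name.

```shell
# group_in_order: hypothetical helper; one pass, keys printed in first-seen order.
group_in_order() {
  awk '
    {
      # Join every field after the key on this line
      vals = $2
      for (i = 3; i <= NF; i++) vals = vals OFS $i
      # Remember each key the first time it appears
      if ($1 in A) {
        A[$1] = A[$1] OFS vals
      } else {
        order[++n] = $1
        A[$1] = vals
      }
    }
    END { for (k = 1; k <= n; k++) print order[k] OFS A[order[k]] }
  ' "$@"
}
```

Because it reads its input only once, it also works in a pipeline such as `some_command | group_in_order`.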
Thanks,
R. Singh
Hi RudiC.
Sometimes there can be more than 2 records in sequence.
Hi RavinderSingh13.
Thanks a lot
Hello jiam912,
In case you have more than 2 fields in your Input_file, the following may help. Let's say the Input_file is as follows.
cat Input_file
3099753489 3
3099753489 5
3101954341 12 21 31 34 56 78
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
Then the following code handles it:
awk 'FNR==NR{Q=$2; for(i=3;i<=NF;i++){Q=Q OFS $i}; A[$1]=A[$1]?A[$1] OFS Q:Q;next} ($1 in A){print $1 OFS A[$1];delete A[$1]}' Input_file Input_file
Output will be as follows.
3099753489 3 5
3101954341 12 21 31 34 56 78 14
3102153285 3 5
3102153297 3 5
Thanks,
R. Singh
RudiC
August 19, 2016, 6:38am
6
Try also
awk 'NR == 1 || $1 != LAST {printf "%s%s", NR==1?"":RS, LAST = $1} {printf " %s", $2} END {print _} ' file
3099753489 3 5
3101954341 12 14 16 24
3102153285 3 5
3102153297 3 5
Dear R. Singh
Thanks a lot
And what about this case?
input
3099753489 3
3099753489 5
3099753489 7
3101954341 12
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
3102153297 8
output
3099753489 3 5 7
3101954341 12 14
3102153285 3 5
3102153297 3 5 8
Hello jiam912,
Yes, it works for 2 fields too, as follows. I only added the second solution in case you have more than 2 fields in your Input_file.
awk 'FNR==NR{Q=$2; for(i=3;i<=NF;i++){Q=Q OFS $i}; A[$1]=A[$1]?A[$1] OFS Q:Q;next} ($1 in A){print $1 OFS A[$1];delete A[$1]}' Input_file Input_file
Output will be as follows.
3099753489 3 5 7
3101954341 12 14
3102153285 3 5
3102153297 3 5 8
If you have any questions, please let us know in detail.
Thanks,
R. Singh
Hi jiam912,
Did you try the code RudiC suggested in post #6 in this thread? Or, if his suggestion gave you a syntax error:
awk '
NR == 1 || $1 != LAST {
printf "%s%s", (NR==1?"":RS), LAST = $1
}
{ printf " %s", $2
}
END { print _
}' file
which should work as long as there are only two fields per input line and all lines with the same first field value are adjacent in your input file. If your real input files (like your samples) meet the above requirements, this should be faster than RavinderSingh13's suggestion because it only needs to read your input file once.
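If lines with the same key are not guaranteed to be adjacent, one option is to sort on the first field before running the one-pass script above. This is a sketch assuming GNU or BSD `sort`: its `-s` (stable) flag is not in POSIX, and it keeps each key's values in their original order, while the keys themselves come out sorted rather than in input order. Like the script above, it keeps only `$2`. `group_adjacent` is a hypothetical name.

```shell
# group_adjacent: hypothetical helper. The stable sort (-s, a GNU/BSD extension,
# not POSIX) makes equal keys adjacent without reordering each key's values.
group_adjacent() {
  sort -s -k1,1 "$@" | awk '
    # Start a new output line whenever the key changes
    NR == 1 || $1 != LAST { printf "%s%s", (NR == 1 ? "" : RS), LAST = $1 }
    # Append the value from the current line
    { printf " %s", $2 }
    END { print "" }
  '
}
```

Without `-s`, most `sort` implementations fall back to comparing whole lines for equal keys, which would put `12` before `5` for the same key.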
Hello Don/jiam912,
I'm not sure how fast the following solution is, but it:
I- Reads the Input_file only once.
II- Keeps the output in the same order as the Input_file.
III- Also handles lines that have more than 2 fields for a given 1st-field value.
awk 'FNR==NR{Q=$2; for(i=3;i<=NF;i++){Q=Q OFS $i}; A[$1]=A[$1]?A[$1] OFS Q:Q; if(!C[$1]++){D[++j]=$1}} END{for(k=1;k<=j;k++){if(A[D[k]]){print D[k] OFS A[D[k]]}; delete A[D[k]]}}' Input_file
Output will be as follows.
3099753489 3 5
3101954341 12 21 31 34 56 78 14
3102153285 3 5
3102153297 3 5
Where Input_file is as follows.
cat Input_file
3099753489 3
3099753489 5
3101954341 12 21 31 34 56 78
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
EDIT: Adding a multi-line form of the solution too.
awk 'FNR==NR{
  Q=$2
  for(i=3;i<=NF;i++){
    Q=Q OFS $i
  }
  A[$1]=A[$1]?A[$1] OFS Q:Q
  if(!C[$1]++){
    D[++j]=$1
  }
}
END{
  for(k=1;k<=j;k++){
    if(A[D[k]]){
      print D[k] OFS A[D[k]]
    }
    delete A[D[k]]
  }
}
' Input_file
Thanks,
R. Singh
RudiC
August 19, 2016, 8:09am
12
In case you have more than 2 fields, try
awk 'NR == 1 || $1 != LAST {printf "%s%s", (NR==1?"":RS), LAST = $1} {sub ("^" LAST, _); printf "%s", $0} END {print _} ' file
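`sub("^" LAST, _)` uses the key as a regular expression. That is fine for all-digit keys like these, but a key containing characters such as `+`, `*` or `[` would be treated as a pattern. A sketch of a variant that avoids this, assuming it is acceptable for runs of blanks to be squeezed to single spaces, empties `$1` instead, which makes awk rebuild `$0` from the remaining fields joined by `OFS`. `group_adjacent_fields` is a hypothetical name.

```shell
# group_adjacent_fields: hypothetical helper; equal keys must already be adjacent.
group_adjacent_fields() {
  awk '
    NR == 1 || $1 != LAST { printf "%s%s", (NR == 1 ? "" : RS), LAST = $1 }
    {
      # Emptying $1 rebuilds $0 from all fields joined by OFS, so $0 now
      # starts with a single space followed by the remaining values
      $1 = ""
      printf "%s", $0
    }
    END { print "" }
  ' "$@"
}
```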
Try
Input
[akshay@localhost tmp]$ cat f
3099753489 3
3099753489 5
3099753489 7
3101954341 12
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
3102153297 8
Output
[akshay@localhost tmp]$ awk '$1 in A{A[$1]=A[$1] OFS $2; next}{ O[++o]=$1; A[$1]=$2}END{for(i=1; i in O; i++)print O[i],A[O[i]]}' f
3099753489 3 5 7
3101954341 12 14
3102153285 3 5
3102153297 3 5 8
Readable version
awk '
$1 in A{
A[$1]=A[$1] OFS $2
next
}
{
O[++o]=$1;
A[$1]=$2
}
END{
for(i=1; i in O; i++)
print O[i],A[O[i]]
}' f
With any shell using POSIX shell syntax, you could also do it without awk:
#!/bin/ksh
{ read -r last rest
printf '%s %s' "$last" "$rest"
while read -r key rest
do [ "$key" = "$last" ] && printf ' %s' "$rest" ||
printf '\n%s %s' "$key" "$rest"
last="$key"
done
echo
} < input
which also works with two or more fields/line as long as all lines in the input file with a given key are adjacent.
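This loop, like the adjacency-based awk scripts earlier in the thread, assumes equal keys are adjacent. A small sketch for checking that assumption first (the name `check_adjacent` is hypothetical) reports the first key that reappears after a different key and exits with a non-zero status.

```shell
# check_adjacent: hypothetical helper; exit status 0 if equal keys are adjacent.
check_adjacent() {
  awk '
    BEGIN { bad = 0 }
    # A key seen earlier that is not the key of the current run breaks adjacency
    ($1 in seen) && $1 != last { print "key not adjacent: " $1; bad = 1; exit }
    { seen[$1] = 1; last = $1 }
    END { exit bad }
  ' "$@"
}
```

For example, `check_adjacent input && ./script.ksh` would run the loop only when the assumption holds (`script.ksh` being whatever file holds the loop).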
ravindersingh13:
Hello Akshay,
The above code works well when there are 2 fields in Input_file; when there are more than 2 fields, a small change to your code does the trick, as follows.
Let's say Input_file is as follows.
cat Input_file
3099753489 3
3099753489 5
3101954341 12 21 31 34 56 78
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
Then the following (an edited version of the code from your last post) will do it:
awk '{for(i=2;i<=NF;i++){W=W?W OFS $i:$2}}$1 in A{A[$1]=A[$1] OFS W;W=""; next}{ O[++o]=$1; A[$1]=W;W=""}END{for(i=1; i in O; i++)print O[i],A[O[i]]}' Input_file
Output will be as follows then.
3099753489 3 5
3101954341 12 21 31 34 56 78 14
3102153285 3 5
3102153297 3 5
Thanks,
R. Singh
Hello RavinderSingh13!
The user has not mentioned more than 2 fields anywhere in the current thread, so why assume more than is required and confuse others too?
Please note: records != fields
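To illustrate the distinction: awk normally treats each input line as one record, counted by `NR`, and splits it on blanks into fields, counted by `NF`. A tiny sketch (`show_counts` is a hypothetical name):

```shell
# show_counts: hypothetical helper printing the record number and field count.
show_counts() {
  awk '{ print "record " NR " has " NF " fields" }' "$@"
}
```

On the original sample, every record has 2 fields, even though several records share the same key.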
My apologies to Akshay and all. My intention in these forums is to learn, learn and only learn, and to try to help, nothing else. I have deleted my post now.
Thanks,
R. Singh
Thanks to all.
Appreciate your help.