while IFS=. read a b;
do
if [ "$x" != "$a" ]; then
x=$a
printf "\n'%s','%s'" "$x" "$b"
else
printf ",'%s'" "$b"
fi
done < file
This gives following output for given input in post #1 ( one empty line in the beginning !) .
#!/bin/bash
while IFS=. read a b;
do
if [ "$x" == "$a" ]; then
printf ",'%s'" "$b"
else
x=$a
printf "'%s','%s'" "$x" "$b"
fi
done < file | sed -e 's/'"'"''"'"'/'"'"'\n'"'"'/g'
awk -F. -vFMT="'%s'" ' # split line at ".", use FMT variable for printing as single
# quote handling is difficult within the awk script
$1 != LAST {printf QRS FMT, $1 # on changing first field (LAST is the empty string in the first record
# so pattern is TRUE) print it after a temporary record separator (which
# is empty, too, on the 1. record)
QRS = ORS # now set temp sep to line feed for all remaining records
LAST = $1 # keep track of the first field for the upcoming pattern checksy
}
{printf "," FMT, $2 # now keep adding the quoted second fields until first field changes
}
END {printf ORS # final line feed
}
' file
Could you please go through the following and let me know if this helps, also it is not the code in running form, it is only expanded for explanation purposes.
awk #### Starting awk here.
-F"." #### Mentioning -F as .(DOT) here so e could make custom delimiters in awk as per our requirement so making DOT as a delimiter.
-vs1="'" #### defining a variable named s1 whose value is '. So in awk we could define variables by using -v variable_name="value".
'FNR==NR #### mentioning here FNR==NR condition. Where FNR and NR both are awk's in-built keywords and tell us the number of lines in any
Input_file but a major difference between FNR and NR is FNR's value will be RESET each time a new Input_file is being read
and NR's value will keep on increase till the last Input_file being read(as in awk we could mention mutiple files as an Input).
{A[$1]=A[$1]?A[$1] OFS s1 $NF s1:s1 $NF s1; #### So creating here an array named A whose index is $1(first field) and whose value is s1 $NF s1 if that element's index is NOT registered in array A.
#### Where s1 as mentioned before is a variable with value of ' and $NF denoted the last field of a line which is being read. So if a value is find in array A
which is already registered(mentioned in code by ?) then A's that specific index's value will be A[$1] OFS s1 $NF s1, which means appending the current value
into the previous A's value of current index(which is $1 from each line).
next} #### Now using next which is a in-built keyword of awk. So by mentioning this we are skipping all further statements now.
A[$1]{ #### A[$1], so this is a condition which will be executed when 2nd time Input_file is being read and it make sure if any first field is present in array A then execute following statements.
print s1 $1 s1 OFS A[$1]; #### printing here s1 $1 s1 OFS A[$1], where s1 kis a variable mentioned above, $1 is first field of line and OFS is Output field separator(awk's in-built keyword, whose default value is a space) then A[$1], which will print array A's value whose index is $1 of current line.
delete A[$1] #### deleting the array A's element whose index is $1 so that we will not execute already executed $1 as we are reading Input_file twice.
}' OFS=, Input_file Input_file #### Mentioning the value of OFS as a comma(,) which is awk's in-built keyword denotes as Ouptut field separator, mentioning Input_file 2 times too here.