I am very unhappy at the moment because after trying I only got:
awk -F"/" '$1!~/^http/{print};{print $6}' cases.txt | awk '{ORS=","};{print}'
But this outputs everything on a single line, separated by commas, which is a mess.
Without the last formatting part, I got this in clear:
awk -F'/' '
!/^http/ && NF {
T = $0
next
}
/^http/ {
A[T] = ( T in A ? A[T] OFS $NF : $NF )
}
END {
for ( k in A )
printf "%s\n%s\n\n", k, A[k]
}
' OFS=, file
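One caveat with the script above: for ( k in A ) visits the keys in an unspecified order, so the titles may not come out in input order. Here is a minimal sketch of a variant that remembers the order in which titles first receive a URL. The file name sample.txt, the function name group_ids and the second title titleB are made up for illustration, they are not from the original post.

```shell
# Hypothetical sample input modelled on the question.
printf '%s\n' \
  'titleA' \
  'http://myurl/bla/blabla/1234' \
  'http://myurl/bla/blabla/6789' \
  'titleB' \
  'http://myurl/bla/blabla/1111' > sample.txt

group_ids() {
  awk -F'/' '
    # A non-empty line that is not a URL is the current title.
    !/^http/ && NF { T = $0; next }
    # A URL line: append its last /-separated field to the current title,
    # and record the title in order[] the first time it gets a value.
    /^http/ {
      if (T in A) A[T] = A[T] OFS $NF
      else { A[T] = $NF; order[++n] = T }
    }
    END {
      # Walk the titles by index instead of using for (k in A).
      for (i = 1; i <= n; i++)
        printf "%s\n%s\n\n", order[i], A[order[i]]
    }
  ' OFS=, "$1"
}

group_ids sample.txt
```

Like the original, this skips titles that have no URL lines under them.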
You could even use simple parameter expansion in a shell script:
while IFS= read -r line
do
printf "%s\n" "${line##*/}"
done < file
On lines without a / the pattern matches nothing, so you get the line as is. Lines containing a / have everything up to and including the last / removed.
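To see the expansion on its own, here is a small sketch. The helper name last_segment is only for illustration.

```shell
# ${var##*/} removes the longest prefix matching the glob */,
# that is, everything up to and including the last slash.
last_segment() {
  printf '%s\n' "${1##*/}"
}

last_segment 'http://myurl/bla/blabla/1234'   # prints 1234
last_segment 'titleA'                         # prints titleA (no slash, nothing removed)
```

Note that this only strips the URL prefix line by line. It does not join the IDs with commas.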
For larger input files you may find that awk is quicker.
awk '{
  # gsub(regex, replacement [, target]) is the awk built-in for global substitution.
  # With no target given it works on the whole record ($0).
  # The regex matches a newline, then any run of non-newline characters ([^\n]*), then a "/".
  # Because * is greedy, the match runs up to the last "/" on the line, so
  # "titleA\nhttp://myurl/bla/blabla/1234" becomes "titleA,1234", and so on for every URL line.
  gsub("\n[^\n]*/", ",")
  # sub() works like gsub() but replaces only the first match, so "titleA,1234,6789"
  # becomes "titleA\n1234,6789" (\n stands for a real newline; the title ends up on its own line).
  sub(",", "\n")
}
# awk works on condition/action pairs. 1 is an always-true condition with no action,
# so the default action runs, which is printing the record.
1' RS= ORS='\n\n' file
An empty RS (record separator) puts awk in paragraph mode, where records are separated by blank lines, so titleA\nhttp://myurl/bla/blabla/1234\nhttp://myurl/bla/blabla/6789 is treated as a single record and the substitutions above can work across its lines. ORS (output record separator) set to '\n\n' prints a blank line after each record. file is the input file name.
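Here is the same one-liner run against a small sample, as a sketch. It assumes the records in the input are separated by blank lines, which is what RS= relies on. The file name paragraphs.txt, the function name join_ids and the titleB record are made up for illustration.

```shell
# Hypothetical sample: two records separated by a blank line.
printf '%s\n' \
  'titleA' \
  'http://myurl/bla/blabla/1234' \
  'http://myurl/bla/blabla/6789' \
  '' \
  'titleB' \
  'http://myurl/bla/blabla/1111' > paragraphs.txt

join_ids() {
  # Paragraph mode (RS=) reads each blank-line-separated block as one record.
  awk '{ gsub("\n[^\n]*/", ","); sub(",", "\n") } 1' RS= ORS='\n\n' "$1"
}

join_ids paragraphs.txt
```

For titleA this prints the title on one line and 1234,6789 on the next. Since sub() replaces the first comma in the record, a title that itself contains a comma would be split at that comma instead.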