Put in one line only the lines with digits

Kibou · September 6, 2016, 11:16am

Hello,
I haven't been here for a while and I might be forgetting some things.
I have input text like this:

titleA
http://myurl/bla/blabla/1234
http://myurl/bla/blabla/6789

titleB
http://myurl/bla/blabla/5678
http://myurl/bla/blabla/1234

titleC
http://myurl/bla/blabla/9123
http://myurl/bla/blabla/1234
http://myurl/bla/blabla/8912

For each block, I need to extract only the title and then, on the next line, only the digits at the end of each URL, separated by commas.

Desired output:

titleA
1234,6789

titleB
5678,1234

titleC
9123,1234,8912

I am very unhappy at the moment because, after several tries, all I got was:

awk -F"/" '$1!~/^http/{print};{print $6}' cases.txt | awk '{ORS=","};{print}'
But this puts everything on a single line, separated by commas, which is a mess.

Without the last formatting part (the second awk), the output looks like this:

titleA
1234
6789

titleB
5678
1234

titleC
9123
1234
8912

But it also has many blank lines in between.
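The blank lines come from the first command: `print $6` runs for every line, and a title line split on `/` has only one field, so `$6` is empty; an empty input line is printed twice (once by each action). A minimal sketch of one way to finish the same idea (split on `/`, keep the last field) is to collect the numbers in a variable and only print them when the next title or the end of input is reached. The function name `join_ids` is hypothetical, and `$NF` is used instead of `$6` so the depth of the URL does not matter.

```shell
# Hypothetical helper; reads the input format from the question on stdin.
join_ids() {
  awk -F'/' '
    /^http/ { ids = ((ids == "") ? $NF : (ids "," $NF)); next }  # URL line: collect its last "/" field
    NF      { if (ids != "") print ids "\n"; print; ids = "" }   # title line: flush the previous group first
    END     { if (ids != "") print ids }                         # flush the last group
  '
}
```

Usage: `join_ids < cases.txt` prints the titles and comma-joined numbers, with one blank line between groups.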

I really don't know how to move forward. Any hint will be appreciated. Thank you in advance.

I am working on Ubuntu Linux right now.

RavinderSingh13 · September 6, 2016, 11:44am

Hello Kibou,

Could you please try the following.

awk '($0 ~ /title/){if(Q){print Q ORS $0;Q=""} else {print};next} {sub(/.*\//,X,$0);Q=Q?Q ($0?OFS $0:RS):$0;} END{print Q}' OFS=,   Input_file

Output will be as follows.

titleA
1234,6789
 
titleB
5678,1234

titleC
9123,1234,8912

EDIT: Adding a slightly polished version of the above solution.

awk '($0 ~ /title/){W=Q?Q ORS $0:$0;print W;W=Q="";next} {sub(/.*\//,X,$0);Q=Q?Q ($0?OFS $0:RS):$0} END{print Q}' OFS=,   Input_file
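To try the commands in this thread, the sample input from the question can be recreated with a here-document. The file name `Input_file` matches the placeholder used above; other posts use `file`, `cases.txt` or `a.txt`, so adjust the name as needed.

```shell
# Recreate the sample input from the question as "Input_file".
cat > Input_file <<'EOF'
titleA
http://myurl/bla/blabla/1234
http://myurl/bla/blabla/6789

titleB
http://myurl/bla/blabla/5678
http://myurl/bla/blabla/1234

titleC
http://myurl/bla/blabla/9123
http://myurl/bla/blabla/1234
http://myurl/bla/blabla/8912
EOF
```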

Thanks,
R. Singh

Yoda · September 6, 2016, 12:39pm

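awk -F'/' '
        !/^http/ && NF {
                T = $0
                if ( !seen[T]++ )
                        order[++n] = T          # remember titles in input order
                next
        }
        /^http/ {
                A[T] = ( T in A ? A[T] OFS $NF : $NF )
        }
        END {
                for ( i = 1; i <= n; i++ )      # a plain (k in A) loop visits titles in unspecified order
                        printf "%s\n%s\n\n", order[i], A[order[i]]
        }
' OFS=, file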

Scrutinizer · September 6, 2016, 1:04pm

Another approach:

awk '{for(i=2; i<=NF; i++) sub(".*/",x,$i); sub(OFS,FS)}1' FS='\n' OFS=, RS= ORS='\n\n' file

--edit--
Yet another option:

awk '{gsub("\n[^\n]*/",","); sub(",","\n")}1' RS= ORS='\n\n' file
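Both one-liners rely on paragraph mode: with `RS=` (empty), awk reads each block of lines separated by blank lines as one record, and a newline always acts as a field separator. A small sketch to see how the sample input is split (the function name `show_records` is just for illustration):

```shell
# Print record number, field count and first field for each paragraph-mode record.
show_records() {
  awk '{ printf "record %d has %d fields: %s\n", NR, NF, $1 }' RS= FS='\n'
}
```

On the sample input this reports three records, with 3, 3 and 4 fields, whose first fields are the titles.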

Kibou · September 6, 2016, 3:23pm

I really appreciate it. Your amazing approaches always leave me speechless.

RudiC · September 6, 2016, 5:14pm

Try also

awk '{printf "%s%s", (NF>1 && ONF>1)?",":LF, $NF; LF = RS; ONF = NF} END {printf RS}' FS=/ file
titleA
1234,6789

titleB
5678,1234

titleC
9123,1234,8912

It would also work if the empty lines were missing.
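One way to check that claim is to delete the empty lines first with `sed '/^$/d'` and feed the result to the same command. The wrapper name `rudic_join` is hypothetical; it only reads stdin instead of `file`:

```shell
# Same awk program as above, reading from stdin instead of "file".
rudic_join() {
  awk '{printf "%s%s", (NF>1 && ONF>1)?",":LF, $NF; LF = RS; ONF = NF} END {printf RS}' FS=/
}
```

For example, `sed '/^$/d' file | rudic_join` still prints each title followed by its comma-joined numbers, just without a blank line between the groups.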

itkamaraj · September 7, 2016, 3:20am

$ awk -F\/ '/^title/{print a;print;a="";}/^http/{a=a","$NF}END{print a}' a.txt | sed '1d;s/^,//'
titleA
1234,6789
titleB
5678,1234
titleC
9123,1234,8912
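The `sed '1d;s/^,//'` step removes the empty line printed for the first title and the leading comma of each group. A sketch of the same logic with that cleanup done inside awk (the helper names `join_no_sed` and `flush` are hypothetical):

```shell
# Same idea as above, but awk skips the empty first group and the leading
# comma itself, so no sed is needed.
join_no_sed() {
  awk -F'/' '
    function flush() { if (a != "") print substr(a, 2); a = "" }  # print group without leading comma
    /^title/ { flush(); print }   # new title: finish the previous group, then print the title
    /^http/  { a = a "," $NF }    # URL: append "," and its last field
    END      { flush() }          # finish the last group
  '
}
```

Usage: `join_no_sed < a.txt` prints the same six lines as the pipeline above.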

rbatte1 · September 7, 2016, 5:47am

You could even use simple variable substitution (parameter expansion) in a shell script:-

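while IFS= read -r line            # IFS= and -r keep each line exactly as written
do
   printf "%s\n" "${line##*/}"     # drop everything up to and including the last /
done < file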

It won't remove anything from lines without a /, so you get those lines as they are. Lines containing a / have everything up to and including the last / removed.

For larger input files, you may find that awk is quicker.
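The loop above still prints each number on its own line, so the numbers need joining as well. A sketch that stays in the shell, assuming the input format from the question (titles contain no `/`, URLs always do), keeps a separator variable; the function name `join_in_shell` is hypothetical:

```shell
# Hypothetical helper: join the trailing numbers with commas using only
# POSIX shell features (read, case, parameter expansion, printf).
join_in_shell() {
  sep=""                                   # empty until the first number of a group
  while IFS= read -r line; do
    case $line in
      */*) printf '%s%s' "$sep" "${line##*/}"   # URL: print the part after the last /
           sep="," ;;
      "")  ;;                              # blank line: nothing to do
      *)   if [ -n "$sep" ]; then          # title: close the previous group first
             printf '\n\n'
           fi
           printf '%s\n' "$line"
           sep="" ;;
    esac
  done
  if [ -n "$sep" ]; then printf '\n'; fi   # terminate the last group
}
```

Usage: `join_in_shell < file`. As noted above, the awk versions will usually be faster on large files.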

Robin

andy391791 · September 7, 2016, 11:32am

Hi, regarding this code:

awk '{gsub("\n[^\n]*/",","); sub(",","\n")}1' RS= ORS='\n\n' file

Would you be able to explain how this works? I can't understand it.

Many thanks

RavinderSingh13 · September 7, 2016, 11:50am

Hello andy391791,

The following may help you with that.

awk '{gsub("\n[^\n]*/",",");       #### gsub is awk's built-in function for global substitution; its format is gsub(regex, replacement[, target]), and the target defaults to the whole record $0. The regex matches a newline, then [^\n]* (any run of characters that are not a newline; * is greedy, so it runs up to the LAST "/" on that line, e.g. the / just before 1234 in http://myurl/bla/blabla/1234), then that "/".
                                        So "\nhttp://myurl/bla/blabla/" is replaced by ",", and titleA\nhttp://myurl/bla/blabla/1234\nhttp://myurl/bla/blabla/6789 becomes titleA,1234,6789, and so on.
sub(",","\n")}                     #### sub works like gsub, except that it only replaces the very first match. The output of the gsub above is "titleA,1234,6789", so sub changes it to "titleA\n1234,6789" (\n is a real newline, shown on the console as a line break; I am writing it as \n only for readability).
1'                                 #### awk works on condition/action pairs. 1 is a condition that is always TRUE, and since no action is given, the default action runs, which prints the record.
RS= ORS='\n\n' file                #### An empty RS (record separator) turns on paragraph mode, where records are separated by one or more blank lines, so titleA\nhttp://myurl/bla/blabla/1234\nhttp://myurl/bla/blabla/6789 is considered a single record and the substitutions above work on the whole group. ORS (output record separator) is set to two newlines, which prints a blank line after each output record. Finally, the Input_file name is given.
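Running the two substitutions separately on the sample input can make the explanation easier to follow. The function names `step1` and `step2` are just for illustration:

```shell
# Step 1 only: each "newline ... last /" run becomes a comma, one record per output line.
step1() { awk '{gsub("\n[^\n]*/", ",")}1' RS= ORS='\n'; }

# Steps 1 and 2: then only the first comma is turned back into a newline.
step2() { awk '{gsub("\n[^\n]*/", ","); sub(",", "\n")}1' RS= ORS='\n\n'; }
```

On the sample input, `step1 < file` prints `titleA,1234,6789`, `titleB,5678,1234` and `titleC,9123,1234,8912`, while `step2 < file` gives the desired output.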

Thanks,
R. Singh

Scrutinizer · September 8, 2016, 9:58am

@ Ravinder, thanks for the explanation. In the last sentence: "considered as a single field" => "considered as a single record"