Extract a link and replace the column with it where one exists

zozoo · October 26, 2017, 10:29am

Hi, I have this sample data:

a,b,c,d,e,g h http://mysite.xyx
z,b,d,f,e,s t http://123124#
a,b,c,i,m,nothing
d,i,j,e,w,nothing

output expected is

a,b,c,d,e,http://mysite.xyx
z,b,d,f,e,http://123124#
a,b,c,i,m,nothing
d,i,j,e,w,nothing

I can get only the links using grep -o 'http.*'.

I tried something like the below, but it doesn't work:

for i in `cat file.csv`
do
first=$i|awk '{print $1}'
second=$i|awk '{print $2}'
third=$i|awk '{print $3}'
four=$i|awk '{print $4}'
five=$i|awk '{print $5}'
six=$i|awk '{print $6}'
 if [ $six = "nothing" ] 
 then
 six=$six
 else
    six=`grep -o 'http.*' $six`
 fi
echo "$first,$second,$third,$four,$five,$six"
 done >> output.csv

RavinderSingh13 · October 26, 2017, 10:34am

Hello zozoo,

There are a lot of open questions about the output you have shown.
I- By what logic did you remove the lines a,b,c,i,m,nothing and d,i,j,e,w,nothing?
II- By what logic did the line z,b,d,f,e,s t http://123124# change to z,b,d,f,e,http://mysite.xyx?

Please be clear in your posts.

Thanks,
R. Singh

zozoo · October 26, 2017, 10:41am

Hi Ravinder, I am sorry for the wrong output; it is corrected now.
Basically, I want to check whether the sixth column contains a URL. If it does, replace the field with the URL; otherwise leave the field as it is.
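
For reference, the rule as stated above (touch only the sixth column, and only when it contains a URL) can be sketched directly in awk. This is a minimal sketch, not a solution from the thread: it assumes the columns are split on commas only and that a URL always starts with "http". The temporary file and the variable names are illustrative.

```shell
# Sample input from the first post, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'a,b,c,d,e,g h http://mysite.xyx' \
  'z,b,d,f,e,s t http://123124#' \
  'a,b,c,i,m,nothing' \
  'd,i,j,e,w,nothing' > "$tmp"

# Split on commas only. If field 6 contains "http", keep just the part
# from "http" to the end of the field; other lines pass through unchanged.
result=$(awk -F, -v OFS=, '
  match($6, /http.*/) { $6 = substr($6, RSTART, RLENGTH) }
  1
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because only $6 is inspected, spaces in the other columns would be left alone.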

RavinderSingh13 · October 26, 2017, 10:57am

Hello zozoo,

Could you please try the following and let me know if it helps?

awk -F',| ' '{print $1,$2,$3,$4,$5,$NF}' OFS=,   Input_file

Thanks,
R. Singh

Scrutinizer · October 26, 2017, 11:08am

Another way:

awk 'NF>1{sub(/[^,]*$/,$NF)}1' file

or

sed 's/[^,]* //' file

--
(same thing with awk: )

awk '{sub(/[^,]* /,x)}1' file

zozoo · October 26, 2017, 11:20am

That solved it. I was just trying another version that matches the http string and then splits by space into an array to return the value, but your solution solved it.

So in the solution you split on the delimiter, either a comma or a space, and $NF gives the last field. Am I correct in understanding the solution?

---------- Post updated at 08:50 PM ---------- Previous update was at 08:41 PM ----------

Hi Scrutinizer, the solution you provided also works, but it is difficult to understand. Can you please explain it?

Scrutinizer · October 26, 2017, 11:45am

Hi zozoo,

The first approach replaces the part after the last comma with the last field ( $NF ).
The other ones remove the part after the last comma up to and including the last space.

[^,] means "a character that is not a comma".
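
To see what the two patterns match, it may help to run them on a single line. The following is only a sketch using one line from the sample data; the variable names are made up for illustration.

```shell
line='a,b,c,d,e,g h http://mysite.xyx'

# [^,]*$ matches the run of non-comma characters at the end of the line,
# i.e. everything after the last comma.
tail_part=$(printf '%s\n' "$line" |
  awk '{ match($0, /[^,]*$/); print substr($0, RSTART, RLENGTH) }')

# "[^,]* " needs a space, and on this input only the last field has one.
# The greedy * then runs up to the last space in that field.
trimmed=$(printf '%s\n' "$line" | sed 's/[^,]* //')

# A line without any space has nothing to match and is left as it is.
untouched=$(printf '%s\n' 'a,b,c,i,m,nothing' | sed 's/[^,]* //')

printf '%s\n' "$tail_part" "$trimmed" "$untouched"
```

Note that this relies on the first five columns containing no spaces, as in the sample data.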

RavinderSingh13 · October 26, 2017, 11:53am

Hello zozoo,

Could you please go through the following explanation and let me know if it helps?

awk -F',| ' '{           ##Making the field separator a comma(,) OR a space for each line.
print $1,$2,$3,$4,$5,$NF ##Printing the first, second, third, fourth, fifth, and $NF(last field) of the line.
}
' OFS=,   Input_file     ##Setting the output field separator to a comma and mentioning Input_file here too.

Thanks,
R. Singh
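
As a quick check of how -F',| ' splits a line, the fields can be listed one per line. This is just an illustrative sketch on one sample line; the variable names are not from the thread.

```shell
line='a,b,c,d,e,g h http://mysite.xyx'

# Print the number of fields, then each field with its index.
# Both commas and spaces act as separators here.
fields=$(printf '%s\n' "$line" |
  awk -F',| ' '{ print NF; for (i = 1; i <= NF; i++) print i ": " $i }')

printf '%s\n' "$fields"
```

The URL ends up as the last field, which is why $NF picks it out. On a line such as a,b,c,i,m,nothing there are only six fields, so $NF is simply the sixth column. The approach assumes the first five columns never contain spaces.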

Aia · October 26, 2017, 11:57am

perl -pe 's/\w+ //g' zoo.file # removes any word followed by a space