CSV manipulation

Hello all,

What I need to do is manipulate a Squid log file.
I have many millions of lines in this format:

1442814667.478     76 4.3.2.1 TCP_MISS/200 31845 GET http://pippo.com/inde.html - DIRECT/1.2.3.4 text/css

What I need to do is transform field 7 from Pippo.com - 404 File Not Found into http://pippo.com and write the new file like this:

1442814667.478,76,4.3.2.1,TCP_MISS/200,31845,GET, http://pippo.com,-, DIRECT/1.2.3.4,text/css

I can convert the file to CSV with awk but have no idea how to modify field 7.
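
The conversion part I do with something like this (just a rough sketch; out.csv is only an example name):

awk -v OFS=, '{ $1=$1; print }' log.log > out.csv   # reassigning $1 makes awk rebuild each record with commas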
Can you please help me?
Thanks in advance a.k.a

---------- Post updated at 02:25 AM ---------- Previous update was at 02:13 AM ----------

What I have tried is a silly shell script, but it's damn slow :wink:

[root@site a]# cat log.log |while read a b c d e f g h i l;
do
PIPPO=`echo $g |cut -f1 -d"/"`;
PLUTO=`echo $a |cut -f1 -d"."`;
echo "$PLUTO,$b,$c,$d,$e,$f,$PIPPO,$h,$i,$l" >>sasha.csv
done
[root@site a]#

I don't understand.

Why is your output field separator sometimes a comma and sometimes a space and a comma? Why aren't all of the spaces removed from the output?

There is no "Pippo.com - 404 File Not Found" in your input file (not in field 7 nor anywhere else). Are you just saying that you want to remove the last slash character ( / ) and everything that follows it from field 7?

Furthermore, your sample code removes the period and everything following it from the 1st field; but your desired output shows no change at all to field 1 AND your description of your problem says nothing about changing field 1.

Please be more clear in your explanation of what you are trying to do.

Making some wild guesses, here are a couple of ways to do what you seem to want: one using an awk script and one just using shell built-ins (assuming that you are using a shell that performs basic POSIX shell parameter expansions):

#!/bin/ksh
awk -v OFS=, '
{	sub("[.][^.]*$", "", $1)	# strip the last "." and what follows from field 1
	sub("/[^/]*$", "", $7)	# strip the last "/" and what follows from field 7
	$1=$1			# touch a field so awk rebuilds the record with OFS=,
}
# the always-true pattern 1 prints every (modified) record
1' log.log >>sasha.csv

while read a b c d e f g h i j
do	# ${a%.*} drops from the last "." onward; ${g%/*} drops from the last "/" onward
	printf '%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n' "${a%.*}" "$b" "$c" "$d" "$e" \
	    "$f" "${g%/*}" "$h" "$i" "$j"
done < log.log >> sasha.csv

As always, if you want to use awk on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

With your sample input, both of these produce the output:

1442814667,76,4.3.2.1,TCP_MISS/200,31845,GET,http://pippo.com,-,DIRECT/1.2.3.4,text/css
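
If you prefer sed, something along these lines should also work with a sed that supports -E (a sketch; verify it against your real data before trusting it):

sed -E 's/^([0-9]+)\.[0-9]+/\1/; s#^(([^ ]+ +){6})([^ ]+)/[^/ ]*#\1\3#; s/ +/,/g' log.log >> sasha.csv

The first expression trims the subseconds from field 1, the second removes the last "/" and everything after it from field 7, and the third turns each run of spaces into a single comma. Whichever you pick, a single awk or sed process should be far faster than your loop, which forks echo and cut for every input line.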

Sorry, I will try to explain it better.

  1. What I have is a common Squid log format.
  2. What I need is to load this log file into a database.

This is my flow:

  1. Convert the log file, which is separated by spaces, into a CSV file.
  2. Trim the destination URL field, because I'm not interested in the full URL but only in the destination; http://www.blablabla.com/bla.html has to be transformed into
    www.blablabla.com.
    (BTW, sorry, I forgot to uncheck "Automatically parse links in text".)

What I'm saying is that creating a CSV file and trimming it with shell code is simple for me, but the shell code is very, very slow. What I need is to understand how to make a sed/awk script that will "clean" only field 7 (the URL).

Thanks in advance, and sorry for the poor and confusing information in the first post.

Did you look at post #3 in this thread? It shows you how to use awk and how to use a shell script to trim fields 1 & 7 in ways that should be considerably faster than your current shell script. It should be easy for you to remove a sub() statement from the awk script or remove a parameter expansion from the shell script if you don't want to modify field 1.
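
For example, dropping the first sub() from the awk script in post #3 leaves a version that builds the CSV and cleans only field 7:

awk -v OFS=, '
{	sub("/[^/]*$", "", $7)	# drop the last "/" and what follows from the URL
	$1=$1			# force awk to rebuild the record with commas
}
1' log.log >> sasha.csv

And if, as your www.blablabla.com example suggests, you also want to strip the leading http:// from field 7, one more sub() should do it (a guess at your requirement; your samples don't show this consistently):

	sub("^[^:]*://", "", $7)	# remove the scheme, e.g. "http://"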