Hello Everyone,
I am trying to find a way to take a .csv file with 7 columns and a ton of rows (over 600,000) and remove the entire row if the cell in the fourth column is blank.
Just to give you a little background on why I am doing this (in case there is an easier way): I am pulling information from a PCAP into a .csv file, and I only want to keep the rows that list something in the http.host entry (the fourth column), e.g. google.com. If that entry is blank because the packet has no HTTP host value, I would like to remove the row. This would drastically cut down the number of rows I have to review to make sure my users are not visiting sites they should not be.
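One way to do this without writing an explicit loop is awk, which already reads the file line by line. This is a minimal sketch, assuming the CSV is comma-separated with http.host in the fourth column and that none of the first four fields contains a comma. The function name `keep_rows_with_host` and the file names are placeholders.

```shell
#!/bin/bash
# Copy the header line plus every row whose 4th comma-separated field
# (http.host in this CSV) is non-empty. Rows with a blank 4th field are dropped.
# Assumption: no comma appears inside the first four fields.
keep_rows_with_host() {
    local in_csv="$1" out_csv="$2"
    # NR == 1 keeps the header; $4 != "" keeps rows that list a host.
    awk -F',' 'NR == 1 || $4 != ""' "$in_csv" > "$out_csv"
}
```

Usage: `keep_rows_with_host capture.csv capture_hosts.csv`. Commas in later fields (for example inside a timestamp) do not affect `$4`, since only the fourth field is tested and whole lines are printed unchanged.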
So far my script looks like this:
#!/bin/bash
echo -n "What is the name of your PCAP file? "
read in_pcap
echo -n "What is the name of your CSV file? "
read out_csv
tshark -r "$in_pcap" -T fields -e frame.number -e ip.src -e ip.dst -e http.host -e frame.time -e frame.time_relative -E header=y -E separator=, > "$out_csv"
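Since the CSV comes from tshark, another option is to keep the unwanted packets out of the CSV in the first place by adding a display filter. A minimal sketch, assuming a tshark version that supports `-Y` (apply a display filter while reading a file): a display filter consisting of just a field name matches packets in which that field is present, so only packets carrying an HTTP host value produce rows. The function name `pcap_to_host_csv` is hypothetical.

```shell
#!/bin/bash
# Same export as the original script, but -Y 'http.host' restricts the
# output to packets that actually contain an http.host field, so no
# rows with a blank 4th column are written.
pcap_to_host_csv() {
    local in_pcap="$1" out_csv="$2"
    tshark -r "$in_pcap" -Y 'http.host' -T fields \
        -e frame.number -e ip.src -e ip.dst -e http.host \
        -e frame.time -e frame.time_relative \
        -E header=y -E separator=, > "$out_csv"
}
```

Usage: keep the two `read` prompts and replace the original tshark line with `pcap_to_host_csv "$in_pcap" "$out_csv"`. Filtering at the source avoids writing and then re-reading 600,000+ rows.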
_____
I ran the script on a current PCAP, and it works like a charm for getting the information I need from the PCAP file into a CSV file. Unfortunately, I am running into the blank-cell situation mentioned above, because not every entry lists a value in the http.host cell. In fact, of the more than 600,000 rows, I am guessing only several hundred are ones I need. So adding to the script above (or creating a new script if need be) to remove every row with a blank fourth column would be the perfect solution; however, I am not sure how to do that. If a loop is the solution, it should stop once it reaches a row where all 7 columns are blank, i.e. the row after the last of the 600,000+ entries.
Can anyone help me edit my current script and/or write a new script to loop over the rows (or otherwise remove the blank entries)?
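If a loop is preferred, no blank sentinel row is needed: `read` returns a non-zero status at end of file, which ends a `while read` loop by itself. A sketch under the same assumptions (http.host is the fourth comma-separated field, and the first four fields contain no commas); `filter_rows_loop` is a hypothetical name.

```shell
#!/bin/bash
# Loop-based filter: print each line whose 4th comma-separated field is non-empty.
# The loop ends when read hits end of file; the "|| [ -n "$line" ]" part
# also handles a final line that has no trailing newline.
filter_rows_loop() {
    local in_csv="$1" out_csv="$2"
    local line f1 f2 f3 host rest
    while IFS= read -r line || [ -n "$line" ]; do
        # Split a copy of the line on commas; everything after field 4 goes to rest.
        IFS=, read -r f1 f2 f3 host rest <<< "$line"
        if [ -n "$host" ]; then
            printf '%s\n' "$line"
        fi
    done < "$in_csv" > "$out_csv"
}
```

The header row is kept automatically because its fourth field is the text `http.host`. On several hundred thousand rows, a pure-bash loop like this is typically much slower than awk, so it is mainly useful for seeing how the loop terminates.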
Thanks in advance!