I am new to Unix and slowly learning it. I have a large data set with 8 columns. I want to compare two of the columns and retrieve the data when both columns contain the same number.
I have attached an example. There are two columns (S-Contig and N-Contig). I want to retrieve the data from the rows where these two columns have the same number. For example, both columns contain 3205, and I want to write the data from all the columns for 3205 into a new file.
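A minimal sketch of the same-row case. It assumes the file is whitespace-separated, with S-Contig in column 1 and N-Contig in column 5 (the positions given later in this question). The sample rows and the file names sample.txt and ans.txt are hypothetical, for illustration only:

```shell
#!/bin/sh
# Hypothetical sample input: 8 whitespace-separated columns,
# S-Contig in column 1, N-Contig in column 5.
printf '%s\n' \
  '3205 10 20 30 3205 40 50 60' \
  '3300 11 21 31 4100 41 51 61' > sample.txt

# Print every column of the rows where column 1 equals column 5.
# "NF &&" skips blank lines, whose empty $1 and $5 would otherwise compare equal.
awk 'NF && $1 == $5' sample.txt > ans.txt
cat ans.txt
```

If the real file has a header row you want to keep, change the condition to `NR == 1 || (NF && $1 == $5)`.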
I have been trying to figure this out with awk for a couple of hours now and I am confused. I would really appreciate it if someone could help me with this.
Thank you for your reply. I converted the file to Unix newlines and tried the awk command, but I am having trouble getting the output.
$ file problem.txt
problem.txt: ASCII text, with CR line terminators
$ sed -e 's/\r$//' problem.txt > problem2.txt
$ file problem2.txt
problem2.txt: ASCII text, with CR, LF line terminators
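The `file` output suggests problem.txt uses CR-only line endings (classic Mac OS style), not Windows CR LF. `sed -e 's/\r$//'` is meant for CR LF files: sed splits its input on LF, so a CR-only file reaches it as one long line and the substitution can remove at most the CR at the very end. Translating every CR into LF with `tr` is the usual conversion for this case. A sketch, using a hypothetical two-line sample:

```shell
#!/bin/sh
# Hypothetical two-line file with CR-only line endings.
printf '3205 10 20 30 3205 40 50 60\r3300 11 21 31 4100 41 51 61\r' > problem.txt

# Replace every carriage return with a line feed.
tr '\r' '\n' < problem.txt > problem2.txt

# Counts 2 lines for this sample; running "file problem2.txt" afterwards
# should no longer mention CR line terminators.
wc -l < problem2.txt
```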
$ awk '{ a[$1] = $1 } $5 == a[$5]' problem2.txt > ans.txt
# This gives me an empty ans.txt file. I would like to write all the data from the matching rows to the output.
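Two things may be contributing to the empty output. If problem2.txt still contains CR-only line endings, awk sees far fewer records than expected. And in a single pass, `a[$5]` only holds a value if that number already appeared in column 1 on the same row or an earlier one, so a match that shows up further down the file is missed. A common awk idiom is to read the file twice: the first pass collects every column-1 value, the second prints each row whose column-5 value is in that set. A sketch, with hypothetical sample rows:

```shell
#!/bin/sh
# Hypothetical sample input (whitespace-separated, 8 columns).
printf '%s\n' \
  '3205 10 20 30 4100 40 50 60' \
  '4100 11 21 31 3205 41 51 61' \
  '5000 12 22 32 6000 42 52 62' > problem2.txt

# NR == FNR is true only while reading the first copy of the file:
# record each column-1 value as a key of "seen" and skip to the next line.
# On the second copy, print rows whose column 5 is one of those keys.
awk 'NR == FNR { seen[$1]; next } $5 in seen' problem2.txt problem2.txt > ans.txt
cat ans.txt
```

On this sample the one-pass command prints only the second row, because 4100 first appears in column 1 after it is needed in column 5; the two-pass version prints both the first and second rows. Keys are compared as strings, so `3205` and `3205.0` would not match.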
Thank you for being so helpful. I tried the following code and it didn't work:
awk '$1 == $5' problem.txt
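That command was run on problem.txt, which `file` reported as having CR-only line terminators. awk's default record separator is a line feed, so it likely reads such a file as one long record and compares only fields 1 and 5 of the first row. One option, sketched here with a hypothetical sample, is to tell awk that records end with a carriage return; alternatively, convert the file with `tr '\r' '\n'` first and run the original command on the result:

```shell
#!/bin/sh
# Hypothetical CR-only sample file.
printf '3205 10 20 30 3205 40 50 60\r3300 11 21 31 4100 41 51 61\r' > problem.txt

# Use CR as the record separator; output lines still end in LF (the default ORS).
# "NF &&" skips empty records so they are not treated as matches.
awk 'BEGIN { RS = "\r" } NF && $1 == $5' problem.txt > ans.txt
cat ans.txt
```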
I am not sure I understand your question. All I am trying to do is write a new file with the data where columns 1 and 5 match.
My input file has two sets of data: one set is in columns 1 to 4 and the other is in columns 5 to 8. Columns 1 and 5 contain some of the same numbers. I want to find those shared numbers and get all the other data associated with them from both sets.
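If the two halves of each row are independent data sets, this is a join on column 1 = column 5 rather than a same-row comparison. A minimal awk sketch, assuming whitespace-separated columns and that each value appears at most once in column 5 (if it repeats, only the last occurrence is kept). The sample rows are hypothetical:

```shell
#!/bin/sh
# Hypothetical sample: columns 1-4 are one data set, columns 5-8 another.
printf '%s\n' \
  '3205 a1 a2 a3 4100 b1 b2 b3' \
  '4100 c1 c2 c3 3205 d1 d2 d3' \
  '5000 e1 e2 e3 6000 f1 f2 f3' > sample.txt

# Pass 1: store columns 5-8 of every row, keyed by column 5.
# Pass 2: for each row whose column 1 is a stored key, print its
# columns 1-4 followed by the matching columns 5-8.
awk 'NR == FNR { right[$5] = $5 OFS $6 OFS $7 OFS $8; next }
     $1 in right { print $1, $2, $3, $4, right[$1] }' sample.txt sample.txt > ans.txt
cat ans.txt
```

For this sample, ans.txt pairs the 3205 data from columns 1-4 of the first row with the 3205 data from columns 5-8 of the second row, does the same for 4100, and drops 5000 because it never appears in column 5.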
I hope my explanation is clear. Please let me know if you understand my question.