Hi All
I wanted to know how to efficiently filter out incomplete rows in a large tab-delimited file.
I have a file that contains 5 columns and almost 100,000 rows:
3456 f g t t
3456 g h
456 f h
4567 f g h z
345 f g
567 h j k l
This is a very large, tab-delimited data file.
I need to extract the rows that have values in all 5 columns. At present, several rows contain only 3 values.
Please let me know the best way to extract the rows with all 5 values.
Thanks.
LA
awk 'NF==5' file > newfile
If Franklin52's solution does not work, perhaps it is because there are always five fields (four tabs delimiting possibly empty fields). In that case, you could try:
awk -F'\t' '{for (i=1;i<=NF;i++) if (!length($i)) next; print}' file
Regards,
Alister
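To see how the two suggestions above differ, here is a small sketch run against a made-up sample. The file names (sample.tsv, by_count.tsv, by_content.tsv) and the sample rows are assumptions for illustration only, and the second command adds an NF==5 guard that is not in Alister's one-liner. The last sample row assumes a field may contain a space, which is the case where default whitespace splitting miscounts.

```shell
# Build a small tab-delimited sample (hypothetical data and file names).
printf '3456\tf\tg\tt\tt\n' >  sample.tsv   # complete row
printf '3456\t\tg\th\t\n'   >> sample.tsv   # five fields, two of them empty
printf '456\tf\th\n'        >> sample.tsv   # only three fields
printf '4567\tf\tg\th\tz\n' >> sample.tsv   # complete row
printf '567 x\th\tj\tk\n'   >> sample.tsv   # four fields, first contains a space

# Default whitespace splitting: keep rows with exactly five non-empty tokens.
awk 'NF==5' sample.tsv > by_count.tsv

# Tab splitting: require exactly five fields and none of them empty.
awk -F'\t' 'NF==5 { for (i=1; i<=NF; i++) if (!length($i)) next; print }' sample.tsv > by_content.tsv
```

On this sample, by_count.tsv also keeps the last row, because "567 x" is counted as two fields, while by_content.tsv keeps only the two complete rows. If the real data never has spaces inside a field, the shorter NF==5 form should be enough.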