Help required deleting specific lines from file

Hi,

I have a file with 20 columns of data and hundreds of lines of the same format.

Here is an example line. The data repeats underneath with the same format.

  15 1 4 GLY - 1 65 LYSH 23 N - 24 H - 634 O 0.188 157.552 487 48.70

I have been sorting this data by hand but I was wondering if I can use awk, grep, sed or the like to simplify this task.

Some things I want to do to the data:
1) Remove all lines in which column 20 has a value less than 5.00. (This column is a percentage with two decimal places, and I would like to remove values below 5%.)

2) Remove all lines in which column 3 is NOT in one of the following ranges: 3-4, 18-22, 30-38, 42-49, 52-59, 63-65 (i.e. keep the values 3, 4, 18, 19, 20, 21, 22, 30, 31, 32, 33, 34, 35, 36, 37, 38, etc., and remove all lines in which column 3 has a value such as 1, 2, 5, etc.)

I realise this is quite a complicated request, but if anyone has any ideas as to how I could achieve this it would be gratefully received.

A few questions in response:-

  • What have you tried so far?
  • Where are you stuck?
  • What OS and version are you using?
  • What tools would you like to work with?

Most importantly,

  • What have you tried so far?

Regards,
Robin

Hi,

I'm running Linux and so using bash shell.

I have to admit I haven't tried anything yet, as I am quite new to Linux (I am a new researcher) and can only do very simple manipulations to files, such as basic awk commands.

---------- Post updated at 01:32 PM ---------- Previous update was at 01:07 PM ----------

Hi again!

I've managed to find a solution to point 1) using awk.

awk '$20 > 5.00 {print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20}' filename > filename2

Is this the simplest way of achieving point 1?

I still need help with point 2.

Many thanks!
Liv


I'm not great with awk, but I think you could shorten it to:-

awk '$20 > 5.00 {print $0}' infile > outfile

Robin
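
For point 2, here is a possible sketch, assuming whitespace-separated columns as in the sample line. Awk treats the pattern as a keep condition, so the line of range tests keeps only rows whose column 3 falls in one of the listed ranges. The sample rows piped in here are made up for illustration; only column 3 matters to the filter.

```shell
# Keep only lines whose column 3 falls in one of the allowed ranges.
# The printf rows are hypothetical test data, not real output.
printf '%s\n' '15 1 4 GLY' '15 1 5 GLY' '15 1 20 GLY' '15 1 40 GLY' |
awk '($3>=3  && $3<=4)  || ($3>=18 && $3<=22) || ($3>=30 && $3<=38) ||
     ($3>=42 && $3<=49) || ($3>=52 && $3<=59) || ($3>=63 && $3<=65)'
# prints only the rows with column 3 = 4 and column 3 = 20
```

On the real data you would run just the awk part against the file, e.g. `awk '...' filename > filename2`, and points 1 and 2 can be combined into one pass by joining the column-20 test onto the condition with `&&`.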