Help required deleting specific lines from file

Hi,

I have a file with 20 columns of data and hundreds of lines of the same format.

Here is an example line. The data repeats underneath with the same format.

  15 1 4 GLY - 1 65 LYSH 23 N - 24 H - 634 O 0.188 157.552 487 48.70

I have been sorting this data by hand but I was wondering if I can use awk, grep, sed or the like to simplify this task.

Some things I want to do to the data:
1) Remove all lines in which column 20 has a value less than 5.00. (This column is a percentage with two decimal places, and I would like to remove values below 5%.)

2) Remove all lines in which column 3 is NOT in one of the following ranges: 3-4, 18-22, 30-38, 42-49, 52-59, 63-65 (i.e. keep the values 3, 4, 18, 19, 20, 21, 22, 30, 31, 32, 33, 34, 35, 36, 37, 38, etc., and remove all lines in which column 3 has a value such as 1, 2, 5, etc.)

I realise this is quite a complicated request, but if anyone has any ideas as to how I could achieve this it would be gratefully received.

A few questions in response:-

  • What have you tried so far?
  • Where are you stuck?
  • What OS and version are you using?
  • What tools would you like to work with?

Most importantly,

  • What have you tried so far?

Regards,
Robin

Hi,

I'm running Linux and so using bash shell.

I have to admit I haven't tried anything yet, as I am quite new to Linux (I am a new researcher) and can only do very simple manipulations to files, such as basic awk commands.

---------- Post updated at 01:32 PM ---------- Previous update was at 01:07 PM ----------

Hi again!

I've managed to find a solution to point 1) using awk.

awk '$20 > 5.00 {print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20}' filename > filename2

Is this the simplest way of achieving point 1?

I still need help with point 2.

Many thanks!
Liv


I'm not great with awk, but I think you could shorten it to:-

awk '$20 > 5.00 {print $0}' infile > outfile

Robin
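
For point 2, here is a possible sketch, assuming whitespace-separated columns as in the sample line. Awk treats the pattern as a keep condition, so the line of range tests keeps only rows whose column 3 falls in one of the listed ranges. The sample rows piped in here are made up for illustration; only column 3 matters to the filter.

```shell
# Keep only lines whose column 3 falls in one of the allowed ranges.
# The printf rows are hypothetical test data, not real output.
printf '%s\n' '15 1 4 GLY' '15 1 5 GLY' '15 1 20 GLY' '15 1 40 GLY' |
awk '($3>=3  && $3<=4)  || ($3>=18 && $3<=22) || ($3>=30 && $3<=38) ||
     ($3>=42 && $3<=49) || ($3>=52 && $3<=59) || ($3>=63 && $3<=65)'
# prints only the rows with column 3 = 4 and column 3 = 20
```

On the real data you would run just the awk part against the file, e.g. `awk '...' filename > filename2`, and points 1 and 2 can be combined into one pass by joining the column-20 test onto the condition with `&&`.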