Returning two lines if they both match strings

majormajormajor · October 18, 2013, 11:23am

Hi

I have a large number of files that I need to scan, returning a line and its following line, but only when the following line begins with a given string.

String one - line one must begin with 'Bill'
String two - line two must begin with 'Jones'.

If these two criteria are matched, it returns the two lines. Repeat for the whole file.

ie. original file:

Edith Blue
Edith Green
Edith Red
Bill Blue
Jones Red
Edith Green
Bill Green
Edith Red
Jones Green
Bill Blue

I'd want it to return only:

Bill Blue
Jones Red

Any ideas? I don't know where to begin with this; I only have basic scripting skills with sed, awk, etc. At the moment I am using this to get each matching line and its following line, but it gives me too much useless information that I then have to strip off with other sed commands.

grep -A 1 "^Bill" * > test.txt

I guess there's a far more elegant way of getting only the lines I need. Any help would be lovely!

CarloM · October 18, 2013, 11:35am

$ awk '{if ((lastword=="Bill") && ($1=="Jones")) {print lastline ORS $0} lastword=$1; lastline=$0}' file
Bill Blue
Jones Red

EDIT: Actually, slightly neater (and conforming more closely to your requirements):

awk '{if ((lastline ~ /^Bill/) && ($0 ~ /^Jones/)) {print lastline ORS $0} lastline=$0}' file
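In case it helps to see the logic spelled out, here is a minimal sketch of the same idea run against the sample data from the question. The temporary directory, the file name sample.txt and the variable name prev are just placeholders for illustration.

```shell
# Build the sample file from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'Edith Blue' 'Edith Green' 'Edith Red' 'Bill Blue' 'Jones Red' \
  'Edith Green' 'Bill Green' 'Edith Red' 'Jones Green' 'Bill Blue' > "$dir/sample.txt"

pairs=$(awk '
  # prev holds the previous line; print both lines when each starts with its string
  prev ~ /^Bill/ && /^Jones/ { print prev; print }
  # remember the current line for the next record
  { prev = $0 }
' "$dir/sample.txt")

printf '%s\n' "$pairs"
rm -r "$dir"
```

On the first record prev is still empty, so nothing can match until at least two lines have been read.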

Scrutinizer · October 18, 2013, 11:39am

sed:

sed -n 'N;/^Bill.*\nJones/p;D' file
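For anyone unfamiliar with N and D, the one-liner keeps a sliding two-line window in the pattern space. Below is a commented sketch of the same script on the sample data; the scratch directory and the file name sample.txt are placeholders.

```shell
# Build the sample file from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'Edith Blue' 'Edith Green' 'Edith Red' 'Bill Blue' 'Jones Red' \
  'Edith Green' 'Bill Green' 'Edith Red' 'Jones Green' 'Bill Blue' > "$dir/sample.txt"

sed_out=$(sed -n '
  # N: append the next input line, so the pattern space holds two lines
  N
  # p: print the window only if line 1 starts with Bill and line 2 with Jones
  /^Bill.*\nJones/p
  # D: delete the first line of the window and restart without reading input
  D
' "$dir/sample.txt")

printf '%s\n' "$sed_out"
rm -r "$dir"
```

Because D keeps the second line for the next cycle, every adjacent pair of lines gets tested, not just lines 1-2, 3-4 and so on.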

majormajormajor · October 18, 2013, 12:05pm

beautiful, thank you.

two quick further questions, if you don't mind

1 - how do i run awk recursively on a directory? i have 100s of files in a directory which i need to run this on.
2 - i also need it to dump the filename at the beginning of each line.

grep was handy with this in that the -A flag dumps the filename with the output. not sure with awk...

thanks a million though!

CarloM · October 18, 2013, 12:14pm

print FILENAME ":\n" lastline ORS $0

If all the files are in the same directory you can just pass multiple filenames to awk. If they're nested under sub-directories (or the filename list is just too long) you can use find (& possibly xargs), e.g.

find /home/someuser/adir -name "something.*" -exec awk '{stuff}' {} \;

or

find /home/someuser/adir -name "something.*" | xargs awk '{stuff}'
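As a rough sketch of how the find variant can look in practice, the example below builds a small throwaway tree and lets find hand the files to awk. The directory layout and file names are made up for the demonstration. It uses -exec ... {} + rather than \; so that awk is started once for a batch of files, and unlike a plain pipe into xargs this also copes with spaces in names. The FNR == 1 rule is an addition of mine so that a line from one file is never paired with the first line of the next.

```shell
# Throwaway directory tree: one match in a sub-directory, none at the top level.
top=$(mktemp -d)
mkdir -p "$top/sub dir"
printf '%s\n' 'Bill Blue' 'Jones Red' > "$top/sub dir/b.txt"
printf '%s\n' 'Bill Green' 'Edith Red' > "$top/a.txt"

found=$(find "$top" -type f -name '*.txt' -exec awk '
  # start of each file: forget the line carried over from the previous file
  FNR == 1 { lastline = "" }
  # print the file name as a header, then the matching pair
  lastline ~ /^Bill/ && /^Jones/ { print FILENAME ":\n" lastline ORS $0 }
  { lastline = $0 }
' {} +)

printf '%s\n' "$found"
rm -r "$top"
```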

majormajormajor · October 18, 2013, 12:26pm

apologies if i'm being a nuisance, but i'm a beginner and not really following.

do you mean find the files and pipe it to awk?
ie. this?

find -name "*" | awk '{if ((lastword=="Bill") && ($1=="Jones")) {print lastline ORS $0} lastword=$1; lastline=$0}'

doesn't seem to work, i get:

awk: read error (Is a directory)

i think i posted a confused question to start. as there are about 200 files for it to look through to return the matches, i need it to paste the filename at the beginning of each line, so say it finds a match in the filename 'test2.txt', the return i'd like would be:

test2.txt;Bill Blue
test2.txt;Jones Red

again, sorry to be a pest.

CarloM · October 18, 2013, 12:38pm

find . -type f | xargs awk '{if ((lastline ~ /^Bill/) && ($0 ~ /^Jones/)) {print FILENAME ";" lastline ORS FILENAME ";" $0} lastline=$0}'

In my previous solutions I omitted -type f, which restricts find to just regular files.

EDIT: To clarify, what I meant by 'pass multiple filenames to awk' is just to specify them on the command line, e.g. awk 'stuff' *. However, if you have directories under where you're running the command and your files don't have an easily-globbed set of names (like *.txt) then it's better to use find.
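One caveat when a single awk run reads several files: lastline carries over from the end of one file to the start of the next, so a file ending in a Bill line followed by a file starting with a Jones line would be reported as a pair. A possible guard, sketched here with made-up files a.txt and b.txt, is to clear lastline whenever FNR (the per-file line counter) is 1.

```shell
# a.txt ends with a Bill line, b.txt starts with a Jones line: not a real pair.
dir=$(mktemp -d)
printf '%s\n' 'Edith Red' 'Bill Blue' > "$dir/a.txt"
printf '%s\n' 'Jones Red' 'Bill Green' 'Jones Green' > "$dir/b.txt"

# Run from inside the directory so FILENAME is the bare file name.
result=$(cd "$dir" && awk '
  # first line of each file: drop the line remembered from the previous file
  FNR == 1 { lastline = "" }
  # prefix both lines of a matching pair with the file name and a semicolon
  lastline ~ /^Bill/ && /^Jones/ { print FILENAME ";" lastline ORS FILENAME ";" $0 }
  { lastline = $0 }
' *.txt)

printf '%s\n' "$result"
rm -r "$dir"
```

Only the pair inside b.txt is reported; the Bill line at the end of a.txt is not matched with the Jones line that opens b.txt.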

RudiC · October 18, 2013, 12:58pm

Try also

grep -HA1 Bill file*|grep -B1 Jones