Need some help deleting words from a line which are not my "Keyword"

linuxkid · July 18, 2010, 11:01am

Hi, I'm new to scripting and need some help with a problem, so I'll jump right to it.

I have a text file. It is pretty big, so for the sake of this example let's say this is its content:

John id number is abc34938
Grahams id number is pending
id number abc64334 is Bob's 
abc32432 is the id number for Jim 
Mike's ID number is pending
Michael id number is abc4352

What I would like to do is search each line for the keyword "abc" and, on the lines that contain it, delete every other word, leaving output like this:

abc34938
Grahams id number is pending
abc64334
abc32432
Mike's ID number is pending
abc4352

I've searched far and wide for a solution and got stuck, so I'm really hoping one of you guys can help me out.

Thank you for your time

radoulov · July 18, 2010, 11:31am

perl -nle'
  /abc/ and print /abc\w+/g 
    or print  
  ' infile

linuxkid · July 18, 2010, 12:25pm

Thanks for the reply mate. As I'm new, I'm going to risk angering the gods by asking: would that work even though I'm doing bash scripting and not Perl? (I would try it now, but I don't have the facilities to do so at the moment.)

If the answer is yes, do I need to include anything in my bash script to enable it? I currently have #!bin/bash at the top; would I need to add anything else?

guruprasadpr · July 18, 2010, 12:39pm

Hi

sed  '/abc/s/.*\(abc[0-9][0-9]*\).*/\1/' file

Guru.

linuxkid · July 18, 2010, 12:56pm

Thank you for the reply Guru. Just a quick addition to the question, if I may: what if there were two words I wanted to keep on a line? Would I just use the '|' symbol to specify 'or'?

radoulov · July 18, 2010, 5:48pm

You just need to have the Perl interpreter in your PATH (and it's already there on most systems).
If you have more than one abcxxx on the same line, you'll have all of them printed as one word.
If you want them separated (by a single space, for instance), you should change the code like this:

perl -nle'
  /abc/ and print join " ", /abc\w+/g 
    or print
  ' infile

rdcwayx · July 18, 2010, 8:13pm

awk '{for (i=1;i<=NF;i++) {if ($i~/^abc/) $0=$i}}1' urfile

kurumi · July 18, 2010, 9:10pm

sed '/abc/s/\(.*[ \t]*\)\(abc.[^ \t]*\)\(.*\)/\2/' file

linuxkid · July 19, 2010, 5:45am

Thanks for the reply Radoulov (and others). Because I could not access my documents yesterday, I have only now discovered a slight problem: the solution partially works but doesn't do what is required, which is partly my fault. The ID number is actually of the format:

abcd-p-ssa-a322f-s-4312

with the hyphens, numbers, and letters in that order. Currently the solution only prints up to the abc and not everything else. However, to add to my misery, there are also other types of ID of the format:

abcf-p-ssa-a322f-s-4312 (the abc here still remains)
def-abcf-p-ssa-a322f-s-4312 (addition of def- at the start of the two above)
def-abcd-p-ssa-a322f-s-4312

The first few sets of letters always remain the same (i.e. abcd-p-ssa, abcf-p-ssa, def-abcf-p-ssa, def-abcd-p-ssa). Sorry for the inconvienence caused (I probably spelt that wrong!). Thank you for your time.

---------- Post updated at 10:45 AM ---------- Previous update was at 09:29 AM ----------

Hey, I tried this out but it cuts out the last line of my file for some reason. Also, my requirements have now changed slightly; they are written above, if you could take a look. Thanks

radoulov · July 19, 2010, 5:48am

perl -nle'
   /abc/ and print join " ", /abc[\w-]+/g
     or print
    ' infile

linuxkid · July 19, 2010, 6:05am

That's world class mate! But if I also wanted to search for lines with def, in addition to abc, how would I do that? Is there some sort of OR operator?

radoulov · July 19, 2010, 6:08am

perl -nle'
  /(?:abc|def)/ and print join " ", /(?:abc|def)[\w-]+/g
    or print
  ' infile

linuxkid · July 19, 2010, 6:27am

Brilliant! Thanks mate, really appreciate the help!

linuxkid · August 19, 2010, 3:26am

Sorry to bring this up again, but since this answer there have been a few changes in my input file, mainly the addition of a new set of numbers:

abcf-p-ssa-a322f-s-4312 
def-abcf-p-ssa-a322f-s-4312 
def-abcd-p-ssa-a322f-s-4312
#below is the new addition
ghi/123/xxx/sss/xa2
ghi/3d/cksdi/kff/23 def-abcd-p-ssa-a322f-s-4312

The ghi always remains at the start. The problem is that the code you gave doesn't work because of the slashes in this new set. I tried:

perl -nle'
  /(?:abc|def)/ and print join " ", /(?:abc|def)[\w-]+/g
    or print
  ' infile

but that just ignored anything with slashes! I also tried removing the '+' sign after [\w-], which gave me the ghi but not the rest of the number. Also, as in the last line, some of these may be on the same line. Your code already seems to work for that (except when it's got '/' in the number), so I need to keep that functionality.

Many thanks

radoulov · August 23, 2010, 5:31am

If you still need help, please post sample data and an example of the desired output.