Replacing or removing a long list of patterns using awk or sed

Input:
>abc|123456|def|EXIT|
>abc|203456|def|EXIT2|
>abc|234056|def|EXIT3|
>abc|340056|def|EXIT4|
>abc|456000|def|EXIT5|
.
.
.

Output:
def|EXIT|
def|EXIT2|
def|EXIT3|
def|EXIT4|
def|EXIT5|
.
.

My attempt:

sed 's/>abc\|(\d+)\|//' input_file

Unfortunately, it doesn't work :(
Does anybody have a better idea?
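The attempt uses Perl-style regex syntax, which sed's basic regular expressions do not understand: `\d` is not a digit class there, `(` and `+` are matched literally, and in GNU sed `\|` means alternation rather than a literal pipe. For comparison, here is a minimal Python sketch in which that same pattern is valid (the helper name `strip_prefix` is hypothetical):

```python
import re

# Perl-style pattern, as in the sed attempt: literal ">abc|", one or more
# digits, then a literal "|". In Python's re, "\d" is a digit class and
# "\|" is an escaped (literal) pipe.
PREFIX = re.compile(r"^>abc\|\d+\|")


def strip_prefix(line: str) -> str:
    """Remove the leading '>abc|<digits>|' from one line (hypothetical helper)."""
    return PREFIX.sub("", line.rstrip("\n"), count=1)


if __name__ == "__main__":
    for sample in [">abc|123456|def|EXIT|", ">abc|203456|def|EXIT2|"]:
        print(strip_prefix(sample))
```

In sed itself, a plain `|` is already literal in a basic regex, so a portable equivalent would be `sed 's/^>abc|[0-9][0-9]*|//' input_file`.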

Try this one:

sed 's/.*\(def.*\)/\1/' file

With Python, if you have it:

#!/usr/bin/env python
with open("file") as f:
    for line in f:
        print('|'.join(line.split("|")[2:]).strip())

Output:

# python script.py
def|EXIT|
def|EXIT2|
def|EXIT3|
def|EXIT4|
def|EXIT5|

nawk -F'abc[|][0-9][0-9][0-9][0-9][0-9][0-9][|]' '{print $2}' infile
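The awk approach uses a regular expression as the field separator, so everything after the `abc|NNNNNN|` separator becomes field 2. A minimal Python sketch of the same idea, assuming the separator is always `abc`, a pipe, exactly six digits and a pipe, uses `re.split` with `maxsplit=1` (the function name is illustrative):

```python
import re

# Same separator as the awk -F regex: "abc|", six digits, "|".
# [|] is a bracket expression, so the pipe is matched literally.
SEP = re.compile(r"abc[|][0-9]{6}[|]")


def after_separator(line: str) -> str:
    """Return the text after the first separator, or '' if there is none."""
    parts = SEP.split(line.rstrip("\n"), maxsplit=1)
    return parts[1] if len(parts) > 1 else ""


if __name__ == "__main__":
    for sample in [">abc|123456|def|EXIT|", ">abc|456000|def|EXIT5|"]:
        print(after_separator(sample))
```

One difference: awk's `$2` would stop at a second occurrence of the separator, while `maxsplit=1` keeps the whole remainder; in this data the separator appears only once per line.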

Thanks a lot, Franklin52.
Your code works perfectly as long as the field after ">abc|number|" is always "def".
But what if my input is something like this:
>abc|123456|def|EXIT|
>abc|203456|def|EXIT2|
>abc|234056|def|EXIT3|
>abc|340056|acg|EXIT4|
>abc|456000|hta|EXIT5|

And the output I want is:
def|EXIT|
def|EXIT2|
def|EXIT3|
acg|EXIT4|
hta|EXIT5|

Can I write the code like this?

cat file | sed 's/.*\(def.*\)/\1/' | sed 's/.*\(acg.*\)/\1/' | sed 's/.*\(hta.*\)/\1/'

Or do you have a better suggestion?
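Instead of chaining one sed per keyword, the keywords can go into a single alternation so every line is processed once. A minimal Python sketch, assuming the known third-field values are just `def`, `acg` and `hta` from the example (the tuple and function name are hypothetical; extend the tuple as needed):

```python
import re

# Known values of the third field, taken from the example input.
KEYWORDS = ("def", "acg", "hta")

# Greedy ".*" backs off to the last keyword occurrence on the line,
# mirroring sed 's/.*\(def.*\)/\1/'. re.escape guards against special characters.
PATTERN = re.compile(r".*((?:%s).*)" % "|".join(map(re.escape, KEYWORDS)))


def keep_from_keyword(line: str) -> str:
    """Keep the line from the last keyword onwards; leave other lines unchanged, as sed does."""
    line = line.rstrip("\n")
    m = PATTERN.match(line)
    return m.group(1) if m else line


if __name__ == "__main__":
    for sample in [">abc|123456|def|EXIT|", ">abc|340056|acg|EXIT4|", ">abc|456000|hta|EXIT5|"]:
        print(keep_from_keyword(sample))
```

Like the chained sed version, this keeps the text from the last keyword occurrence, so a later field that happened to contain `def` would cut the line there. With a sed that supports `-E` (GNU and BSD sed do), the same single pass is `sed -E 's/.*((def|acg|hta).*)/\1/' file`.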

If the file always has this fixed format, you can try:

cut -d '|' -f 3- file
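`cut -d '|' -f 3-` prints field 3 and everything after it, whatever the first two fields contain. A Python sketch of the same per-line logic, using `str.split` with `maxsplit=2` so the remainder stays in one piece (the function name is illustrative):

```python
def fields_from_third(line: str, sep: str = "|") -> str:
    """Mimic `cut -d '|' -f 3-` on a single line."""
    line = line.rstrip("\n")
    if sep not in line:
        # cut prints lines without the delimiter unchanged (unless -s is given).
        return line
    parts = line.split(sep, 2)
    # With only one delimiter there is no field 3, so cut prints an empty line.
    return parts[2] if len(parts) == 3 else ""


if __name__ == "__main__":
    for sample in [">abc|234056|def|EXIT3|", ">abc|340056|acg|EXIT4|"]:
        print(fields_from_third(sample))
```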

What's the pattern of the file? Do you want to keep whatever comes after the numbers? Will there always be numbers?

Assuming there are always numbers in the input and you want to keep everything after them:

With sed:

$ echo ">abc|123456|def|EXIT2|" | sed 's/^[^|]*|[0-9]*|//'
def|EXIT2|
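Because `.*` is greedy, a pattern that ends at "a digit plus one character" stops at the last digit on the line, so on input such as `>abc|234056|def|EXIT3|` it would also swallow the `3` in `EXIT3`. Restricting the first field to non-pipe characters avoids that. A small Python comparison of the two patterns:

```python
import re

line = ">abc|234056|def|EXIT3|"

# Greedy: ".*" runs to the LAST digit on the line (the 3 in EXIT3),
# and "." then consumes the trailing "|", so nothing is left.
greedy = re.sub(r"^.*[0-9].", "", line)

# Field-aware: "[^|]*" cannot cross a pipe, so only ">abc|234056|" is removed.
field_aware = re.sub(r"^[^|]*\|[0-9]*\|", "", line)

print(repr(greedy))       # ''
print(repr(field_aware))  # 'def|EXIT3|'
```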

With awk:

$ echo ">abc|123456|def|EXIT|" | awk 'BEGIN { FS = OFS = "|" } { print $3, $4, "" }'
def|EXIT|

If the field data varies, try this:

sed 's/>\([A-Za-z]*\)|\([0-9]*\)|\(.*\)/\3/' filename
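The sed command captures the letters, the digits and the remainder as groups 1 to 3 and keeps only group 3, so it does not matter which letters or how many digits appear. A sketch of the same substitution in Python; here the pipes must be escaped because `|` means alternation in Python regexes (the helper name is hypothetical):

```python
import re

# Same groups as the sed command: 1 = letters, 2 = digits, 3 = rest of line.
LINE_RE = re.compile(r">([A-Za-z]*)\|([0-9]*)\|(.*)")


def keep_third_group(line: str) -> str:
    """Replace the first match with group 3, like sed without the g flag."""
    return LINE_RE.sub(r"\3", line.rstrip("\n"), count=1)


if __name__ == "__main__":
    for sample in [">abc|340056|acg|EXIT4|", ">xyz|7|hta|EXIT5|"]:
        print(keep_third_group(sample))
```

Lines that do not match are returned unchanged, which is also what sed does.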

$ awk -F'|' '{print $3"|"$4"|"}' urfile
def|EXIT|
def|EXIT2|
def|EXIT3|
def|EXIT4|
def|EXIT5|

Thanks a lot, your code works ^^

Hi, you are right, your code works perfectly as well :)
Thanks for your suggestions. Now I have several ways to achieve the same goal ^^