Regular expression with sed

nervous · October 21, 2008, 9:16am

Hi,

I'm trying the following:

echo "test line XA24433 test" | sed 's/.*X\(.*[^ ]\)/X\1/'
XA24433 test

But I want the output to be: XA24433

I want to grab the word starting with the letter X up to the next space; this word can be anywhere in the line.

palsevlohit_123 · October 21, 2008, 9:45am

echo "test line XA24433 test" | sed 's/.*X\(.*[^ ]\)/X\1/'|awk '{print $1}'
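The trailing awk '{print $1}' works here because sed leaves "XA24433 test" and awk's default whitespace splitting then keeps only the first word. A minimal sketch, assuming any POSIX awk, that does the whole job in awk alone. It scans the words and prints the first one starting with X. The function name first_x_word is hypothetical.

```shell
# Hypothetical helper: print the first whitespace-separated word that starts
# with X; input lines without such a word produce no output.
first_x_word() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^X/) { print $i; break }
  }'
}

echo "test line XA24433 test" | first_x_word
```

Unlike a bare sed s/// command, this prints nothing for lines that contain no X-word.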

radoulov · October 21, 2008, 9:58am

% echo "test line XA24433 test" | sed 's/.*\(X[^ ]*\).*/\1/'
XA24433

Or (if you have more than one X on the line and you want the first one):

sed 's/[^X]*\(X[^ ]*\).*/\1/'
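The two sed commands differ only in what precedes the capture group. A leading .* is greedy, so it swallows everything up to the last X that can still start \(X[^ ]*\). By contrast, [^X]* cannot cross an X, so the capture starts at the first one. Here is a quick check on a made-up line containing two X-words:

```shell
line="test XA111 line XG222 test"

# Greedy .* : the capture gets the LAST word starting with X
echo "$line" | sed 's/.*\(X[^ ]*\).*/\1/'

# [^X]* stops at the first X: the capture gets the FIRST such word
echo "$line" | sed 's/[^X]*\(X[^ ]*\).*/\1/'
```

This prints XG222 and then XA111. Both commands print lines containing no X unchanged. Use sed -n with the p flag on the s command if you want those lines dropped.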

With zsh:

% s="test line XA24433 test"
% print ${(M)${(z)s}:#X*}
XA24433
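Outside zsh, a plain POSIX sh loop gives a similar result: let the shell split the string into words and keep those matching X*. This is a sketch, and the function name print_x_words is hypothetical. It differs from the zsh version in two ways. First, zsh's (z) flag splits the way the shell parser would, while this loop splits only on IFS characters, so quoted text is handled differently. Second, matches are printed one per line.

```shell
# Hypothetical helper: print every word of $1 that starts with X.
# The body runs in a subshell ( ... ) so "set -f" does not leak out.
print_x_words() (
  set -f                  # disable globbing so words like X* are not expanded
  for word in $1; do      # unquoted on purpose: split on IFS (spaces, tabs, newlines)
    case $word in
      X*) printf '%s\n' "$word" ;;
    esac
  done
)

s="test line XA24433 test"
print_x_words "$s"
```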

nervous · October 21, 2008, 10:14am

Thanks for your answers; the last solution is perfect. I need some more assistance. The file format is as follows:

ADMDN13EX84447619,"HUMMER H3 SUV, LEATHER SEAT",XAWBG020209,m,Kuwaiti,M,13/05/1969,39,Block Building

I need XAWBG020209 from this line. I can't just use awk to print the third column, because the XA string is only sometimes in the third column. My part numbers start with XA or XG. I want to extract these part numbers, look each one up in another file, acc.csv, which contains the descriptions of these parts, and show the matching description. I wanted to do it by myself, but I've been trying for 3 hours and can't figure anything out.

The second file is properly formatted; here are a few lines from it:

XG96470024,04-05OP WHEEL (ALLOY)
XG96545706,05OP SPOILER A-RR# HB
XG96635210,EPICA SPOILER KIT-V250
XG96635230,EPICA BODY KIT-V250
XG96654234,"AVEO TIP A-EXHAUST,TAIL - T -2"
XG96664463,07EP BLUETOOTH
XG96806104,07EP REAR SPOILER
XG96806220,07EP BODY KIT
XG96816783,07EP EXHAUST TIP
XGCHCAP71,CHROME PACKAGE CAPRICE 08

Your help would be appreciated.

wempy · October 21, 2008, 10:27am

OK, so let's build a regex then:

So long as the part number is never in the first column and is always followed by a comma, it is fairly easy (when you know how <grin>):

/.*,X[AG][a-zA-Z0-9]*,.*/

should match what you want. To pull out the part number, surround it with \( and \) and use \1 as the replacement:

/.*,\(X[AG][a-zA-Z0-9]*\),.*/

echo 'ADMDN13EX84447619,"HUMMER H3 SUV, LEATHER SEAT",XAWBG020209,m,Kuwaiti,M,13/05/1969,39,Block Building' |
sed 's/.*,\(X[AG][a-zA-Z0-9]*\),.*/\1/'
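One caveat with this substitution: some lines have no comma-delimited XA/XG field, for example when the part number is in the first column. On those lines s/// leaves the text untouched, and sed prints the whole line. The sketch below uses sed -n with the p flag so that only lines where the substitution succeeded are printed. The function name extract_part and the second sample line are made up for illustration.

```shell
# Hypothetical helper: -n suppresses automatic printing, and the p flag
# prints the pattern space only when the s/// command actually matched.
extract_part() {
  sed -n 's/.*,\(X[AG][a-zA-Z0-9]*\),.*/\1/p'
}

printf '%s\n' \
  'ADMDN13EX84447619,"HUMMER H3 SUV, LEATHER SEAT",XAWBG020209,m,Kuwaiti,M,13/05/1969,39,Block Building' \
  'ADMDN13EX84447620,"NO PART NUMBER HERE",m,Kuwaiti' |
  extract_part
```

Only XAWBG020209 is printed. A part number in the first column (with no comma before it) is still not found, as noted above.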

radoulov · October 21, 2008, 10:50am

awk -F, 'NR == FNR {
  for (i=1; i<=NF; i++)
    if ($i ~ /^X[AG]/) {
      _[$i]
      break
    }
  next
  }
$1 in _' first_file acc.csv
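Here is a self-contained run of the two-file awk lookup for trying it out. NR == FNR is true only while reading the first file, because FNR restarts at 1 for each input file while NR keeps counting. The next statement ensures first_file lines never reach the $1 in _ test. The wrapper name lookup_parts and the sample files are assumptions. In particular, the acc.csv entry for XAWBG020209 is invented, because the lines posted above only contain XG parts.

```shell
# Hypothetical wrapper: $1 is the data file, $2 the acc.csv lookup table.
lookup_parts() {
  awk -F, 'NR == FNR {            # true only while reading the first file
    for (i = 1; i <= NF; i++)
      if ($i ~ /^X[AG]/) {        # field starts with XA or XG
        _[$i]                     # referencing an element creates the key
        break
      }
    next
  }
  $1 in _' "$1" "$2"              # second file: print lines whose key was stored
}

# Made-up sample files in a scratch directory (removed when the shell exits)
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'ADMDN13EX84447619,"HUMMER H3 SUV, LEATHER SEAT",XAWBG020209,m,Kuwaiti,M,13/05/1969,39,Block Building' > "$dir/first_file"
printf '%s\n' 'XG96470024,04-05OP WHEEL (ALLOY)' 'XAWBG020209,HYPOTHETICAL DESCRIPTION' > "$dir/acc.csv"

lookup_parts "$dir/first_file" "$dir/acc.csv"
```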

With GNU grep and process substitution (if available) you may write something like this:

grep -f  <(grep -o 'X[AG][^,]*' first_file) acc.csv

jim_mcnamara · October 21, 2008, 11:33am

Process substitution is available in ksh, zsh, and bash on OSes that have /dev/fd.
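Where process substitution is not available (for example in a strictly POSIX sh), grep can read its patterns from a temporary file instead. This sketch assumes a grep with -o, which GNU and BSD grep have but POSIX does not require. The extraction regex is tightened to X[AG][A-Za-z0-9]* so trailing words in the same field are not captured. Each part number is also anchored as ^PART, so it cannot match inside a longer part number or elsewhere on an acc.csv line. The sample files are made up.

```shell
# Made-up sample files in a scratch directory (removed when the shell exits);
# mktemp -d is widely available but not required by POSIX.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'ADMDN13EX84447619,"HUMMER H3 SUV, LEATHER SEAT",XAWBG020209,m,Kuwaiti,M,13/05/1969,39,Block Building' > "$dir/first_file"
printf '%s\n' 'XG96470024,04-05OP WHEEL (ALLOY)' 'XAWBG020209,HYPOTHETICAL DESCRIPTION' > "$dir/acc.csv"

# Extract the part numbers and turn each one into an anchored pattern such as
# ^XAWBG020209, so it can only match the whole first field of acc.csv.
grep -o 'X[AG][A-Za-z0-9]*' "$dir/first_file" | sed 's/^/^/; s/$/,/' > "$dir/patterns"

# Read the patterns from a regular file instead of <( ... )
grep -f "$dir/patterns" "$dir/acc.csv"
```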

summer_cherry · October 21, 2008, 9:55pm

Just a slight modification to your code:

echo "test line XA24433 test" | sed 's/.*X\(.*[^ ]\) .*$/X\1/'
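This works for the sample line because exactly one word follows XA24433. The group \(.*[^ ]\) is greedy, so if several words follow the X-word it captures everything up to the last space. It also fails to match when the X-word ends the line, and then the line is printed unchanged. Restricting the group to [^ ]* (no spaces) avoids both problems. The function names and the extra test lines below are made up for comparison.

```shell
# Hypothetical helpers for comparison
greedy_group()   { sed 's/.*X\(.*[^ ]\) .*$/X\1/'; }   # the version above
no_space_group() { sed 's/.*X\([^ ]*\).*$/X\1/'; }     # group cannot contain spaces

for line in "test line XA24433 test" "test line XA24433 foo bar" "test line XA24433"; do
  printf '%s | %s\n' "$(echo "$line" | greedy_group)" "$(echo "$line" | no_space_group)"
done
```

The left column shows XA24433, XA24433 foo, and the unchanged third line. The right column shows XA24433 each time.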

nervous · October 22, 2008, 1:16am

awk -F, 'NR == FNR {
  for (i=1; i<=NF; i++)
    if ($i ~ /^X[AG]/) {
      _[$i]
      break
    }
  }
$1 in _' first_file acc.csv

I think this assumes that the field contains only the part number, but the part number can be anywhere in the field, like "Test XA43223 test". I also want to print "Not Found" if awk doesn't find any string starting with XA or XG.

It would help me figure it out by myself if someone could explain how this code works.

Thanks a lot.

radoulov · October 22, 2008, 3:27am

nervous:

OK, your assumption is correct. Assuming one pattern per field and one pattern per line, you may try something like this:

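awk -F, 'NR == FNR {              # true only while reading first_file
  for (i=1; i<=NF; i++)
    if ($i ~ /X[AG]/) {           # the field contains XA or XG somewhere
      sub(/[^X]*/, "", $i)        # drop everything before the first X
      sub(/ .*$/, "", $i)         # drop everything from the first space on
      _[$i]                       # store the part number as an array key
      break                       # one part number per line
    }
  next                            # do not run the lookup on first_file lines
  }
$1 in _' first_file acc.csv       # acc.csv: print lines whose first field was stored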

With GNU AWK or Perl you'll need less code :).
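The "Not Found" requirement is not handled by the script above. One possible way to add it in POSIX awk, assuming at most one relevant part number per line and no other token containing XA or XG, works in two passes. First, read acc.csv into an array keyed by part number, keeping everything after the first comma as the description so quoted commas survive. Then, for each data line, use match() to find the first XA/XG token anywhere on the line (RSTART and RLENGTH locate it). The helper name describe_parts, the output format, and the sample files are assumptions. A line with no XA/XG string prints Not Found, and a part number missing from acc.csv prints PART,Not Found.

```shell
# Hypothetical helper: $1 is the acc.csv table, $2 the data file.
describe_parts() {
  awk '
  NR == FNR {                                  # first file: the lookup table
    p = index($0, ",")
    if (p) desc[substr($0, 1, p - 1)] = substr($0, p + 1)
    next
  }
  {                                            # second file: the data lines
    if (match($0, /X[AG][A-Za-z0-9]*/)) {
      part = substr($0, RSTART, RLENGTH)
      print part "," ((part in desc) ? desc[part] : "Not Found")
    } else
      print "Not Found"
  }' "$1" "$2"
}

# Made-up sample files in a scratch directory (removed when the shell exits)
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'XG96470024,04-05OP WHEEL (ALLOY)' 'XG96654234,"AVEO TIP A-EXHAUST,TAIL - T -2"' > "$dir/acc.csv"
printf '%s\n' \
  'ADMDN13EX84447619,"HUMMER H3 SUV, LEATHER SEAT",Test XG96654234 test,m' \
  'ADMDN13EX84447619,"HUMMER H3 SUV, LEATHER SEAT",XAWBG020209,m' \
  'no part number on this line' > "$dir/data"

describe_parts "$dir/acc.csv" "$dir/data"
```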