How to reverse cut or read rows of lines

Hi,

My records are like this
BSC403_JAIN03|3153_TropicalFarm_LIMJM1-3_97|
BSC403_JAIN03|3410_PantaiAceh_PCEHM1_4_97|
BSC406_BMIN02|1433_JomHebohTV3_COW7M1_11_97|

I want to extract the value before _97|

This command
BSC_ID=`echo $DATA | cut -f5 -d"_"`
gives me
_97|, 4, 11

and by using the command
echo $DATA | awk -F_ '{print $(NF-1)}'
I get LIMJM1-3, 4, 11.

I want to extract 3,4, and 11 only.

please help.

sed 's/.*[-_]\([^-_][^-_]*\)[-_].*/\1/' myFile

try this:

echo $DATA | awk -F[_-] '{print $(NF-1)}'

When I use BSC_ID=`echo $DATA | awk -F[_-] '{print $(NF-1)}`

i get
BSC403_JAIN03
BSC403_JAIN03|3153_TropicalFarm_LIMJM1-3_97|
BSC403_JAIN03
BSC403_JAIN03|3410_PantaiAceh_PCEHM1_4_97|
BSC406_BMIN02
BSC406_BMIN02|1433_JomHebohTV3_COW7M1_11_97|

which is incorrect

To be more precise, the number of underscores is not fixed in my file.

BSC403_JAIN03|3153_TropicalFarm_LIMJM1-3_97|
BSC403_JAIN03|3410_PantaiAcehPCEHM1_4_97|
BSC406_BMIN02|1433_JomHebohTV3_COW7M1_11_97|

That is why I want to read from the end of the line and get the value before _97|

Can you please explain how this works? :confused:

Firstly, your command is what was originally suggested by ghostdog74 (which does work for your sample input):

echo $DATA | awk -F[_-] '{print $(NF-1)}'

Secondly, have you tried the 'sed' suggestion yet?

If you take a closer look at the previous suggestions, you'll see that there are no assumptions about the number of underscores/dashes in the file. The only assumption (based on your sample file) is that you want to get the 'next to last' field in the underscoreORdash separated record/line.

Is the above a correct description of the objective?

sed 's/.*[-_]\([^-_][^-_]*\)[-_].*/\1/' myFile

from left to right:

.* - any character repeated 0 or more times - greedy - will consume ALL the characters leading up to the LAST run of non-underscore/non-dash chars that has a dashORunderscore on both sides.
[-_] - followed by either a '-' or a '_' char
\([^-_][^-_]*\) - followed by a 'capture' of any character other than '-' or '_' repeated 1 or more times (one such char, then 0 or more of the same).
[-_] - followed by either a '-' or a '_' char
.* - any character repeated 0 or more times - greedy
\1 - replace the 'matched' string with the FIRST 'capture'
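
To see the breakdown above in action, here is a small sketch that feeds the sample records from this thread through the same expression. The function name extract_with_sed is only made up for this illustration; the sed expression itself is unchanged.

```shell
# Hypothetical wrapper around the sed expression explained above.
# It reads records on stdin and prints the 'next to last' field of each.
extract_with_sed() {
    sed 's/.*[-_]\([^-_][^-_]*\)[-_].*/\1/'
}

# Sample records copied from the thread (note the second one has fewer underscores).
printf '%s\n' \
    'BSC403_JAIN03|3153_TropicalFarm_LIMJM1-3_97|' \
    'BSC403_JAIN03|3410_PantaiAcehPCEHM1_4_97|' \
    'BSC406_BMIN02|1433_JomHebohTV3_COW7M1_11_97|' |
    extract_with_sed
```

On these three records this should print 3, 4 and 11, one per line, regardless of how many underscores come earlier in the record.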

I know it might be a bit confusing reading regular expressions at times, but try to think 'pattern matching'....

Yes, your description of the objective is absolutely correct, but how do I use this command

sed 's/.*[-_]\([^-_][^-_]*\)[-_].*/\1/' myFile

in my script below?

Myscript

for DATA in `cat $IN_FILE/a.txt`
do
BSC_ID=`echo $DATA | awk -F[_-] '{print $(NF-1)}`
echo $BSC_ID
done

and the output is

BSC403_JAIN03|3153_TropicalFarm_LIMJM1-3_97|
BSC403_JAIN03|3410_PantaiAcehPCEHM1_4_97|
BSC406_BMIN02|1433_JomHebohTV3_COW7M1_11_97|

which is incorrect.

:slight_smile: Thanks a lot for the explanation....
This is really good work.

.... the same way you've used the 'awk' suggestion, although you've dropped the closing single quote from the original suggestion.
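
As a rough sketch of what that could look like, here is the posted loop with the sed expression dropped in and the closing single quote in place. The scratch directory and the ALL_IDS variable are assumptions added only so the example runs on its own; the 'for' loop is kept as posted and relies on the records containing no whitespace.

```shell
# Scratch copy of the sample records; IN_FILE is assumed to be a directory,
# as in the script posted above.
IN_FILE=$(mktemp -d)
printf '%s\n' \
    'BSC403_JAIN03|3153_TropicalFarm_LIMJM1-3_97|' \
    'BSC403_JAIN03|3410_PantaiAcehPCEHM1_4_97|' \
    'BSC406_BMIN02|1433_JomHebohTV3_COW7M1_11_97|' > "$IN_FILE/a.txt"

ALL_IDS=""   # hypothetical accumulator, only here so the result can be checked
for DATA in $(cat "$IN_FILE/a.txt")
do
    # The single-quoted sed program is closed before the command substitution ends.
    BSC_ID=$(echo "$DATA" | sed 's/.*[-_]\([^-_][^-_]*\)[-_].*/\1/')
    echo "$BSC_ID"
    ALL_IDS="${ALL_IDS}${ALL_IDS:+ }${BSC_ID}"
done

rm -r "$IN_FILE"   # clean up the scratch directory
```

The $( ) form is used here instead of backticks simply because an unbalanced quote is easier to spot in it; backticks would behave the same way.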

Here's the modified 'awk' way with optimized non-UUOC code - what's the purpose of the 'for' loop?:

awk -F[_-] '{print $(NF-1)}' $IN_FILE/a.txt

The same results can be achieved with the similar 'sed' solution; no need for the 'for' loop either:

sed 's/.*[-_]\([^-_][^-_]*\)[-_].*/\1/' $IN_FILE/a.txt

cat filename | sed 's/-/_/' | awk -F"_" '{print $5}'

cat and sed and awk...... why?

Thanks. Your sed suggestion worked, but I still could not get it right with awk. Where did I go wrong?

BSC_ID=`echo $DATA | awk -F[_-] '{print $(NF-1)}'`
Result
BSC403_JAIN03|3153_TropicalFarm_LIMJM1-3_97|
BSC403_JAIN03|3410_PantaiAcehPCEHM1_4_97|
BSC406_BMIN02|1433_JomHebohTV3_COW7M1_11_97|

BSC_ID=`echo $DATA | sed 's/.*[-_]\([^-_][^-_]*\)[-_].*/\1/'`
Result
3
4
11

The reason I loop is that I execute a lot of other commands in this loop using the extracted value.
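
If the value is needed inside a loop anyway, one alternative worth sketching is plain shell parameter expansion, which avoids starting sed or awk for every record. This is not something suggested in the thread; the helper name next_to_last_field and its variables are made up for the illustration, and it assumes a POSIX-style shell (sh, ksh, bash).

```shell
# Hypothetical helper: print the field that sits just before the last
# '-' or '_' separated field of the record passed as $1.
next_to_last_field() {
    rec=$1
    # Drop the shortest suffix that starts with '-' or '_' (removes "_97|").
    head=${rec%[-_]*}
    # Drop the longest prefix that ends with '-' or '_', leaving the last field.
    printf '%s\n' "${head##*[-_]}"
}

next_to_last_field 'BSC403_JAIN03|3153_TropicalFarm_LIMJM1-3_97|'
next_to_last_field 'BSC403_JAIN03|3410_PantaiAcehPCEHM1_4_97|'
next_to_last_field 'BSC406_BMIN02|1433_JomHebohTV3_COW7M1_11_97|'
```

Inside the loop this would be used as BSC_ID=$(next_to_last_field "$DATA"); like the sed version, it does not depend on the number of underscores earlier in the record.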

dunno, this seems to work just dandy:

echo 'BSC403_JAIN03|3153_TropicalFarm_LIMJM1-3_97|' | awk -F[_-] '{print $(NF-1)}'

don't know - try using 'nawk' instead of 'awk' - see if it works....

Speaking of loops:

nawk -F[_-] '{print $(NF-1)}' $IN_FILE/a.txt | while read myValue
do
   echo "here I do more stuff with the extracted value: [${myValue}]"
done

nawk works
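
The thread does not say which awk was installed, but one possible explanation (an assumption, not confirmed here) is that some older awk implementations only accept a literal separator for -F and not a bracket expression such as [_-]. If nawk is not available, a sketch that sticks to a single-character separator is to turn dashes into underscores first. The function name next_to_last_portable is made up for this example.

```shell
# Hypothetical helper: normalise '-' to '_' so awk only needs a
# single-character field separator, then print the next-to-last field.
next_to_last_portable() {
    tr '-' '_' | awk -F_ '{print $(NF-1)}'
}

# Sample records copied from the thread.
printf '%s\n' \
    'BSC403_JAIN03|3153_TropicalFarm_LIMJM1-3_97|' \
    'BSC403_JAIN03|3410_PantaiAcehPCEHM1_4_97|' \
    'BSC406_BMIN02|1433_JomHebohTV3_COW7M1_11_97|' |
    next_to_last_portable
```

This builds on the awk -F_ '{print $(NF-1)}' form from the first post, which already ran there and only went wrong on the record containing a dash.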