How to extract the data?

Hi,

I'm trying to pick out a data field, e.g. from the list below. I need the required field as shown, but it is sometimes filled with odd characters such as \, -, ( or a full stop. How can I accurately extract the 3rd field in shell?

ID IDNO - REQUIRED FIELD

ID 1447 - MAT620BR.
ID 1452 - FGI-LOM3100R \ LOM FGI REPORT (.
ID 1453 - FGI-LOM3101R \ LOM FGI REPORT (.
ID 1512 - SAM05TRR.
ID 1514 - SAM6220R.
ID 1515 - SAM07R.
ID 1516 - SAM07R00.
ID 1517 - SAM10R.
ID 1518 - SAM10R00.
ID 1521 - SAM13R.
ID 1536 - MONJ001R.
ID 1537 - MONJ004R.
ID 1541 - FROLPS.
ID 1542 - FROAPD.
ID 1548 - MOS5610R.
ID 1550 - C009LP \ DAILY INVOICE.
ID 1554 - SAM49R.
ID 1559 - MAT310AR.

You have various ways to extract lines from a text.

You could use head/tail

head -n 3 filename| tail -n 1

You can use sed

sed -n '3p' filename

You can use awk

awk 'NR == 3 {print}' filename

Etc...
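
As a quick check, here is a minimal sketch that runs the three commands against a small sample and shows they pick the same line. The temporary file and the variable names are only for illustration.

```shell
# Build a small sample file in a temporary location (illustrative data from the question).
sample=$(mktemp)
printf '%s\n' \
  'ID 1447 - MAT620BR.' \
  'ID 1452 - FGI-LOM3100R \ LOM FGI REPORT (.' \
  'ID 1453 - FGI-LOM3101R \ LOM FGI REPORT (.' > "$sample"

# Each command prints only line 3 of the file.
via_head=$(head -n 3 "$sample" | tail -n 1)
via_sed=$(sed -n '3p' "$sample")
via_awk=$(awk 'NR == 3' "$sample")

printf '%s\n' "$via_head" "$via_sed" "$via_awk"
rm -f "$sample"
```

All three variables should hold the same text, namely the third line of the sample.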

To remove those chars, you can use tr

redoubtable@Tsunami ~ $ awk 'NR == 3 {print}' filename | tr -d '.(\\'
ID 1453 - FGI-LOM3101R  LOM FGI REPORT 
redoubtable@Tsunami ~ $ 

awk -F"-" '{print $NF}' file

Can you show us what you have tried so far and where you are stuck?

Regards

Hi Ghost,

I was checking the recommendation you gave, but there was a problem. The output with your recommendation was:

LOM3100R \ LOM FGI REPORT (.

The correct output should be:

FGI-LOM3100R \ LOM FGI REPORT (

without the full stop, but including the FGI- prefix.

ID 1452 - FGI-LOM3100R \ LOM FGI REPORT (.

ID 1453 - FGI-LOM3101R \ LOM FGI REPORT (.

-----Post Update-----

Hi Franklin,

I've tried basic awk and cut to print the 3rd field, but they do not work. I used spaces and - as delimiters, but they do not give the full output I require.

awk -F" - " '{print $NF}' file

Are there better ways of making the extraction more accurate? It works with " - " now; however, there could still be mistakes if a field contained something like the line below.

FGI-LOM3100R - LOM FGI REPORT (.

Try this:

awk -F" - " '{gsub(/[(.]/, ""); print $2}' file

You can place the "weird" characters within the brackets []. Most characters are literal inside the brackets; a literal backslash has to be written as \\.
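
A minimal sketch of the bracket approach, assuming the characters to drop are (, . and the backslash. The variable names are only for illustration, and gsub is limited to $2 so the rest of the line is left alone.

```shell
# One sample line from the question.
line='ID 1452 - FGI-LOM3100R \ LOM FGI REPORT (.'

# Split on " - ", then delete "(", "." and "\" from the second field only.
cleaned=$(printf '%s\n' "$line" | awk -F' - ' '{ gsub(/[(.\\]/, "", $2); print $2 }')

printf '[%s]\n' "$cleaned"
```

The square brackets in the final printf are only there to make leading or trailing spaces visible.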

Hi Franklin,

The expected output should be the whole field exactly as it is.

e.g. FGI-LOM3100R \ LOM FGI REPORT (.

With your code, the result is something like
FGI-LOM3100R LOM FGI REPORT
which is inaccurate.

Again, how can we extract the 3rd field in full, without characters inside the field being treated as delimiters?

I've tried using awk with " - ", but it gives inaccurate answers if the field contains a " - ".

e.g.

data: ID 123 - testing
output using awk command: testing
desired output: testing

data: ID 456 - abc-abc.(
output using awk command: abc-abc.(
desired output: abc

data: ID 7111 - abc - def
output using awk command: def
desired output: abc - def

I misread the question; try this:

awk -F" - " '{print $2}' file

Regards

awk '{$1="";$2="";$3=""; print }' input_file.txt | sed 's/^[ ]*//g'
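
Note that assigning to $1, $2 and $3 makes awk rebuild the line, so runs of spaces inside the field are collapsed to single spaces. If that matters, here is a rough sketch that instead cuts everything up to the first " - " with index and substr. The function name after_first_sep is made up for this example.

```shell
# after_first_sep: hypothetical helper, reads lines on stdin and prints
# whatever follows the first " - "; lines without " - " are printed unchanged.
after_first_sep() {
  awk '{ i = index($0, " - "); print (i ? substr($0, i + 3) : $0) }'
}

printf '%s\n' \
  'ID 123 - testing' \
  'ID 7111 - abc - def' \
  'ID 1452 - FGI-LOM3100R \ LOM FGI REPORT (.' | after_first_sep
```

Because the line is never split into fields, later occurrences of " - " and any odd characters in the field are left untouched.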

Hi Guys,

Thanks for the replies so far. We haven't quite hit the nail on the head yet, though.

data: ID 7111 - abc - def
output using the command awk -F" - " '{print $2}' file: abc
desired output: abc - def

It's simple...

cut -d"-" -f2- inputfile
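
One caveat: cut splits on the bare hyphen, so the result keeps the space that followed it. A small sketch that strips that one leading space with sed is below; the function name third_field is just an illustration, and it assumes the first hyphen on each line is always the separator after the ID number.

```shell
# third_field: hypothetical helper, reads lines on stdin.
# cut keeps everything after the first "-"; sed drops the single leading space.
third_field() {
  cut -d'-' -f2- | sed 's/^ //'
}

printf '%s\n' \
  'ID 123 - testing' \
  'ID 456 - abc-abc.(' \
  'ID 7111 - abc - def' | third_field
```

Hyphens and other characters later in the field are kept as they are, since -f2- prints every remaining field together with its delimiters.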

Fantastic. Thanks!