Substring using cut/awk/sed

Hi Gurus, I have a seemingly simple problem but I am struggling with it. It is as follows:
Input string:

ABCDEFGHIJ20100909.txt

Desired output:

AB,DEF,20100909,ABCDEFGHIJ20100909.txt

How can I achieve this? Thanks in advance.

Try:

$ echo ABCDEFGHIJ20100909.txt|sed 's/\(..\).\(...\)....\([^.]*\).*/\1,\2,\3,&/'
AB,DEF,20100909,ABCDEFGHIJ20100909.txt
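For readers wondering how this one-liner works, here is a piece-by-piece sketch of the same command. The function name split_name is only an illustrative helper, not something from the thread.

```shell
# Each piece of the pattern, left to right, for ABCDEFGHIJ20100909.txt:
#   \(..\)      group 1: the first two characters            -> AB
#   .           one character, skipped                       -> C
#   \(...\)     group 2: the next three characters           -> DEF
#   ....        four characters, skipped                     -> GHIJ
#   \([^.]*\)   group 3: everything up to the first dot      -> 20100909
#   .*          the rest of the line                         -> .txt
# In the replacement, \1 \2 \3 are the groups and & is the whole match.
split_name() {   # hypothetical helper name
    printf '%s\n' "$1" | sed 's/\(..\).\(...\)....\([^.]*\).*/\1,\2,\3,&/'
}

split_name ABCDEFGHIJ20100909.txt
```

Because the pattern ends in .* it covers the whole line, so & expands to the complete filename.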

Another way, but more specific to your case, I mean with uppercase letters and digits only:

echo "ABCDEFGHIJ20100909.txt" | sed -ne 's/\([A-Z]\{2\}\).\([A-Z]\{3\}\)[A-Z]\{4\}\([0-9]\{8\}\)\.txt$/\1,\2,\3,&/p'

Thanks a ton !

Scrutinizer, can you explain how it works?

Also, I used the following:

echo ABCDEFGHIJ20100909.txt|awk '{ print substr($0,1,2)","substr($0,3,3)","$0 }'

Output:

AB,CDE,ABCDEFGHIJ20100909.txt

I am unable to get the date. I don't want to depend on the location of the date. I can use the code given by felipe, but I want to know: is it possible using awk?

Thanks !

Does this help?

echo ABCDEFGHIJ20100909.txt|awk '{ print substr($0,1,2)","substr($0,4,3)"," substr($0,11,8) "," $0}'

Yes,

but that works only because in this case I know where the date field starts. I want to make it independent of the date location. Is there any way to check for the occurrence of a date and print it?

$ echo ABCDEFGHIJXXXXXXXXXXXXX20100909XXXXXXXX.txt|sed 's/\(..\).\(...\).*\([0-9]\{8\}\).*/\1,\2,\3,&/'
AB,DEF,20100909,ABCDEFGHIJXXXXXXXXXXXXX20100909XXXXXXXX.txt
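On the earlier question of whether awk can do this without a fixed date position: a minimal sketch, assuming the date is the first run of eight digits in the name. It relies on match(), which sets RSTART and RLENGTH. The function name split_name_awk is only illustrative.

```shell
split_name_awk() {   # hypothetical helper name
    printf '%s\n' "$1" | awk '{
        # Eight [0-9] classes are spelled out because some awk versions
        # do not support the {8} interval syntax.
        if (match($0, /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/))
            # match() sets RSTART and RLENGTH to the position and length
            # of the leftmost match.
            print substr($0, 1, 2) "," substr($0, 4, 3) "," substr($0, RSTART, RLENGTH) "," $0
    }'
}

split_name_awk ABCDEFGHIJXXXXXXXXXXXXX20100909XXXXXXXX.txt
```

Note that match() reports the leftmost run of eight digits, whereas the greedy .* in the sed version ends up on the last one. The two agree here because the name contains exactly eight digits.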

Hi, thanks again. I have never used sed this way before. Can you explain the use of "...\" , ".*" & "n\"? Or direct me to a page which explains this usage?

3 useful links about sed:


Hi Scrutinizer/felipe,

I have been going through the tutorials and they are very helpful, thanks a lot!

However, I tried two slightly different statements on the same string and the output varied drastically. It is due to the position of "." in the statement.

Example 1:

echo "ABCDEFGHIJ20100909.txt" | sed -n -e 's/\(..\)\(...\).*\([0-9]\{8\}\).*/\1,\2,\3,&/p'

Output: AB,CDE,20100909,ABCDEFGHIJ20100909.txt (as desired)

Example 2:

echo "ABCDEFGHIJ20100909.txt" | sed -n -e 's/\(..\)\(...\)*\([0-9]\{8\}\).*/\1,\2,\3,&/p'

Output: ABCD,HIJ,20100909,CDEFGHIJ20100909.txt

Can anybody explain how the position of the dot affects the result?
Thanks in advance.

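The difference between the dot (.) and the asterisk (*) is the following:

. matches any single character
(character)* matches arbitrarily many occurrences of (character), including none

In the second statement the * follows the group \(...\) directly, so it repeats that three-character group instead of "any character". Two characters plus a multiple of three characters must then end exactly where the 8 digits begin, which is only possible if the match starts at C. AB is therefore not part of the match and is printed unchanged in front of the result, \2 holds the last repetition (HIJ), and & holds only the matched part. That is why the two results differ.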
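To see the effect side by side, here is a small sketch that runs the two statements from the question on the same string. The function names are only illustrative.

```shell
s=ABCDEFGHIJ20100909.txt

# ".*" after the second group: "." is any character and "*" repeats it,
# so FGHIJ is skipped and the match can start at the first character.
with_dot_star() {     # hypothetical helper name
    printf '%s\n' "$s" | sed -n 's/\(..\)\(...\).*\([0-9]\{8\}\).*/\1,\2,\3,&/p'
}

# "*" directly after the group: the three-character group itself is repeated.
# The match has to start at C, so AB stays untouched in front of the result.
with_group_star() {   # hypothetical helper name
    printf '%s\n' "$s" | sed -n 's/\(..\)\(...\)*\([0-9]\{8\}\).*/\1,\2,\3,&/p'
}

with_dot_star     # AB,CDE,20100909,ABCDEFGHIJ20100909.txt
with_group_star   # ABCD,HIJ,20100909,CDEFGHIJ20100909.txt
```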


The code above searches for the date pattern from right to left, but what if I want to search from left to right?

Please suggest something. I have tried but was unable to do it.

Like this you mean?

$ echo XXXXXXXX20100909XXXXXXXXJIHGEFDABC.txt|sed 's/.*\([0-9]\{8\}\).*\(..\).\(...\)\.txt$/\3,\2,\1,&/'
ABC,EF,20100909,XXXXXXXX20100909XXXXXXXXJIHGEFDABC.txt
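If "left to right" instead means picking the first run of eight digits rather than the last, the choice depends on what precedes the group. A minimal sketch, assuming a name that contains more than eight consecutive digits; the function names are only illustrative.

```shell
name=MPMTR20100706043000.txt

# A greedy ".*" in front swallows as much as it can,
# so the group ends up on the LAST eight digits.
last_run() {    # hypothetical helper name
    printf '%s\n' "$name" | sed 's/.*\([0-9]\{8\}\).*/\1/'
}

# "[^0-9]*" anchored at the start cannot cross a digit,
# so the group is the FIRST eight digits.
first_run() {   # hypothetical helper name
    printf '%s\n' "$name" | sed 's/^[^0-9]*\([0-9]\{8\}\).*/\1/'
}

last_run    # 06043000
first_run   # 20100706
```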

Hi ,

I have been trying various combinations of . and * but I have not yet completely understood how the placement of "." affects the output, so I can't yet use sed for this purpose with confidence. Following are some examples:


echo MPMTR20100706043000.txt|sed -n -e 's/\(..\)\(...\)\([0-9]\{8\}\)/\2,\1,\3,&/p'
MTR,MP,20100706,MPMTR20100706043000.txt

echo MPMTR20100701043000.txt|sed -n -e 's/\(..\)\(...\)\([0-9]\{8\}\)/\1,\2,\3,&/p'
MP,MTR,20100701,MPMTR20100701043000.txt

echo MPMTR20100706043000.txt|sed -n -e 's/\(..\)\(..\)\([0-9]\{8\}\)/\1,\2,\3,&/p'
MPM,TR,20100706,PMTR20100706043000.txt

echo MPMTR20100706043000.txt|sed -n -e 's/\(..\)\(.\)\([0-9]\{8\}\)/\1,\2,\3,&/p'
MPMT,R,20100706,MTR20100706043000.txt

echo MPMTR20100706043000.txt|sed -n -e 's/\(.\)\(..\)\([0-9]\{8\}\)/\1,\2,\3,&/p'
MPM,TR,20100706,MTR20100706043000.txt

echo MPMTR20100706043000.txt|sed -n -e 's/\(\)\(.\)\([0-9]\{8\}\)/\1,\2,\3,&/p'
MPMT,R,20100706,R20100706043000.txt

echo MPMTR20100706043000.txt|sed -n -e 's/\(\)\(\)\([0-9]\{8\}\)/\1,\2,\3,&/p'
MPMTR,,20100706,20100706043000.txt

Can anyone please explain why the filename is getting clipped from the left and why the first field is not 2 characters long?

Thanks!

Sorry, but I didn't understand your problem! In which of the statements above is your problem?

Sorry for the late response.

Please ignore the first two outputs.

  1. In the 3rd output, even though I selected only the first two characters, it is picking up 3 characters.

  2. If I remove one more dot from the second group, it picks up 4 characters for the 1st field, and an equal number of characters is clipped from the filename in the output.

Can you explain how this happens?

Thanks,
sumoka.

You can use cut:

main="ABCDEFGHIJ20100909.txt"
# characters 1-2, 4-6 and 11-18 give AB, DEF and the date
first=$(echo "$main" | cut -c1-2)
second=$(echo "$main" | cut -c4-6)
third=$(echo "$main" | cut -c11-18)
echo "$first,$second,$third,$main"

Best of luck.

In the third example this is because the first character is not part of the search-and-replace operation (and neither are the last 10 characters). Look at what happens when we do this:

$ echo MPMTR20100706043000.txt|sed -n -e 's/\(..\)\(..\)\([0-9]\{8\}\)//p'
M043000.txt
$ echo MPMTR20100706043000.txt|sed -n -e 's/\(..\)\(..\)\([0-9]\{8\}\).*//p'
M
$ echo MPMTR20100706043000.txt|sed -n -e 's/.\(..\)\(..\)\([0-9]\{8\}\).*//p'

$ echo MPMTR20100706043000.txt|sed -n -e 's/.\(..\)\(..\)\([0-9]\{8\}\).*/\1,\2,\3,&/p'
PM,TR,20100706,MPMTR20100706043000.txt

It is because your code contains a mistake:

echo MPMTR20100706043000.txt|sed -n -e 's/\(..\)\(..\)\([0-9]\{8\}\)/\1,\2,\3,&/p'
MPM,TR,20100706,PMTR20100706043000.txt

Your pattern is "two chars + two chars + 8 digits",
but in the string there is one more character after the second "two chars", before the 8 digits.

So your code should be as below:

$ echo MPMTR20100706043000.txt|sed -n -e 's/\(..\)\(..\).\([0-9]\{8\}\)/\1,\2,\3,&/p'
MP,MT,20100706,MPMTR20100706043000.txt
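One way to avoid the clipped output altogether is to anchor the pattern, so that it has to cover the whole line. A minimal sketch, assuming the layout "two chars + three chars + 8 digits + anything"; the function name split_anchored is only illustrative.

```shell
# ^ pins the match to the start of the line and .*$ extends it to the end,
# so & is always the complete filename. A name that does not fit the layout
# prints nothing instead of a shifted result.
split_anchored() {   # hypothetical helper name
    printf '%s\n' "$1" |
        sed -n 's/^\(..\)\(...\)\([0-9]\{8\}\).*$/\1,\2,\3,&/p'
}

split_anchored MPMTR20100706043000.txt    # MP,MTR,20100706,MPMTR20100706043000.txt
split_anchored XMPMTR20100706043000.txt   # prints nothing: six letters before the digits
```

Without the anchors, sed is free to start the match at whichever position makes the pattern fit, which is what produced the shifted fields in the examples above.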