Help with controlling string elements

pawannoel · March 13, 2011, 6:15am

Hi All,

I have trouble understanding how to manipulate individual elements within a string. For example, given:

XYZ1234      ABCD5678

My expected output is:

ABCD1234     XYZ5678  (swapping a chosen subset of the string's elements)
XYZ37            ACBD1214 (doing calculations with string elements)

etc.

Could someone shed some light on problems like these, please?

Thanks in advance

PS: I'm a beginner with UNIX.
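The thread below doesn't say which calculation turns XYZ1234 into XYZ37. As a sketch, assume (hypothetically) that 37 means 1+2 followed by 3+4. Under that assumption, you can split the letters from the digits with POSIX parameter expansion, peel off one digit at a time with ${var#?}, and add the digits with $((...)):

```shell
# Hypothetical rule: "37" = (1+2)(3+4); the original question does not specify it.
s="XYZ1234"
letters=${s%%[0-9]*}     # drop the longest suffix that starts with a digit -> XYZ
digits=${s#"$letters"}   # drop the letter prefix -> 1234

# ${rest#?} drops the first character; removing that remainder from the
# end of $rest leaves only the first character.
rest=$digits
d1=${rest%"${rest#?}"}; rest=${rest#?}
d2=${rest%"${rest#?}"}; rest=${rest#?}
d3=${rest%"${rest#?}"}; rest=${rest#?}
d4=$rest

result="$letters$((d1 + d2))$((d3 + d4))"
printf '%s\n' "$result"
```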

ctsgnb · March 13, 2011, 5:43pm

Read the following about string operators and pattern-matching operators:
String Operators (Learning the Korn Shell, 2nd Edition)

Then understand and reproduce the given examples.

$ a=/users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

$ echo $a
/users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

$ echo ${a#*/}
users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

$ echo ${a##*/}
1900_CDCRM_CBF71_13022010_13022010.txt

$ echo ${a%_*}
/users/home/ctsgnb/1900_CDCRM_CBF71_13022010

$ echo ${a%%_*}
/users/home/ctsgnb/1900

$ echo ${a%.*}
/users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010

$ echo ${a#?}
users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

$ echo ${a##?}
users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

$ echo ${a%?}
/users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.tx

$ echo ${a%%?}
/users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.tx
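These operators combine well. The sketch below uses the same $a and splits the path into its parts. The variable names dir, file, ext, stem, first and last are only illustrative:

```shell
a=/users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

dir=${a%/*}        # drop the shortest "/..." suffix     -> /users/home/ctsgnb
file=${a##*/}      # drop the longest ".../" prefix      -> 1900_CDCRM_CBF71_13022010_13022010.txt
ext=${file##*.}    # drop everything up to the last "."  -> txt
stem=${file%.*}    # drop the shortest ".*" suffix       -> 1900_CDCRM_CBF71_13022010_13022010
first=${stem%%_*}  # keep what precedes the first "_"    -> 1900
last=${stem##*_}   # keep what follows the last "_"      -> 13022010

printf '%s\n' "$dir" "$file" "$ext" "$stem" "$first" "$last"
```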

---------- Post updated at 10:42 PM ---------- Previous update was at 10:23 PM ----------

You also need to understand the meaning of metacharacters in regular expressions; see:
Syntax of sed Commands (sed & awk, Second Edition)

---------- Post updated at 10:43 PM ---------- Previous update was at 10:42 PM ----------

Some examples:

[ctsgnb@shell ~]$ echo $a
/users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt
[ctsgnb@shell ~]$ echo $a | sed 's/_/#/'
/users/home/ctsgnb/1900#CDCRM_CBF71_13022010_13022010.txt
[ctsgnb@shell ~]$ echo $a | sed 's/_/#/g'
/users/home/ctsgnb/1900#CDCRM#CBF71#13022010#13022010.txt
[ctsgnb@shell ~]$ echo $a | sed 's/_/#/2'
/users/home/ctsgnb/1900_CDCRM#CBF71_13022010_13022010.txt
[ctsgnb@shell ~]$ echo $a | sed 's/_/#/3'
/users/home/ctsgnb/1900_CDCRM_CBF71#13022010_13022010.txt
[ctsgnb@shell ~]$ echo $a | sed 's/ctsgnb/toto/'
/users/home/toto/1900_CDCRM_CBF71_13022010_13022010.txt
[ctsgnb@shell ~]$ echo $a | sed 's/.*/# &/'
# /users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt
[ctsgnb@shell ~]$ echo $a | sed 's/\(.*\)ctsgnb/ctsgnb\1/'
ctsgnb/users/home//1900_CDCRM_CBF71_13022010_13022010.txt
[ctsgnb@shell ~]$ echo $a | sed 's/.*\(ctsgnb\)/\1/'
ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt
[ctsgnb@shell ~]$
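Here are a few more sketches on the same $a. They show the metacharacters ^ and $ (anchors), \. (an escaped, literal dot) and the POSIX class [[:upper:]]. The names r1, r2 and r3 are only illustrative:

```shell
a=/users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

# ^ anchors at the start of the line; \/ is a literal slash
r1=$(printf '%s\n' "$a" | sed 's/^\//ROOT:/')
# \. is a literal dot (a bare . would match any character); $ anchors at the end
r2=$(printf '%s\n' "$a" | sed 's/\.txt$/.csv/')
# [[:upper:]] matches one uppercase letter, [[:upper:]]* zero or more more of them
r3=$(printf '%s\n' "$a" | sed 's/[[:upper:]][[:upper:]]*/X/g')

printf '%s\n' "$r1" "$r2" "$r3"
```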

pawannoel · March 13, 2011, 7:22pm

Thank you very much ...

It's really exciting, and your examples are helping me understand it nicely.

If you know of anything else basic that would fit my level, please let me know.

Thanks again and have a nice week ahead

---------- Post updated at 06:22 PM ---------- Previous update was at 06:13 PM ----------

Could you please explain the last three sed examples you sent?

I didn't really get how they work!

Cheers

[ctsgnb@shell ~]$ echo $a | sed 's/.*/# &/'
# /users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt
[ctsgnb@shell ~]$ echo $a | sed 's/\(.*\)ctsgnb/ctsgnb\1/'
ctsgnb/users/home//1900_CDCRM_CBF71_13022010_13022010.txt
[ctsgnb@shell ~]$ echo $a | sed 's/.*\(ctsgnb\)/\1/'
ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

mirni · March 13, 2011, 8:16pm

[ctsgnb@shell ~]$ echo $a | sed 's/.*/# &/'
# /users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

replace all characters ('.' is any character, '*' is zero or more of the preceding item) with
a hash sign, a space, and whatever was matched ('&'), i.e. the whole line.

[ctsgnb@shell ~]$ echo $a | sed 's/\(.*\)ctsgnb/ctsgnb\1/'
ctsgnb/users/home//1900_CDCRM_CBF71_13022010_13022010.txt

capture ('\(' and '\)' mark the beginning and end of the group) all characters before the 'ctsgnb' string (since '.*' is greedy, this runs up to the last 'ctsgnb' on the line), and
replace the captured text plus "ctsgnb" with "ctsgnb" followed by whatever was captured ('\1').

[ctsgnb@shell ~]$ echo $a | sed 's/.*\(ctsgnb\)/\1/'
ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

replace all characters before the string "ctsgnb", plus the string itself, with what was captured.
The only text the capture group can match is the string "ctsgnb" itself.
Same as

echo $a | sed 's/.*ctsgnb/ctsgnb/'

minus the capturing parentheses, which are there only for instructional purposes.
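Because .* is greedy, the difference shows up when the string contains "ctsgnb" twice. Here is a sketch with a made-up string s. The non-greedy workaround with [^c]* only works because the text before the first "ctsgnb" contains no "c", which is an assumption of this toy input:

```shell
s=/x/ctsgnb/y/ctsgnb/z

# .* grabs as much as it can, so the match runs up to the LAST "ctsgnb"
greedy=$(printf '%s\n' "$s" | sed 's/.*ctsgnb/ctsgnb/')
# same greediness inside a capture group: \1 holds everything before the last "ctsgnb"
captured=$(printf '%s\n' "$s" | sed 's/\(.*\)ctsgnb/ctsgnb\1/')
# [^c]* cannot cross a "c", so this stops at the FIRST "ctsgnb"
first=$(printf '%s\n' "$s" | sed 's/^[^c]*ctsgnb/ctsgnb/')

printf '%s\n' "$greedy" "$captured" "$first"
```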

ctsgnb · March 14, 2011, 4:48am

@mirni :

Yes, I used \( ... \) and \1 just for demonstration purposes.

@pawannoel :

when using

s/.../...&.../

the & refers to whatever was matched by the pattern in the first /.../ part.
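For instance, in this sketch on the same $a, & stands for each 8-digit run that [0-9]\{8\} matched. Writing \& gives a literal ampersand instead:

```shell
a=/users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

# & is replaced by the text the pattern matched (here: each run of 8 digits)
dated=$(printf '%s\n' "$a" | sed 's/[0-9]\{8\}/<&>/g')
# \& in the replacement is a literal "&", not the matched text
amp=$(printf '%s\n' "XYZ ABCD" | sed 's/ / \& /')

printf '%s\n' "$dated" "$amp"
```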

You could also play with the cut command and its options; it can also be useful for string manipulation in some cases.

man cut
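Here are a few cut sketches on the thread's strings. The -c option selects character positions, -d sets the field delimiter and -f picks fields. The variable names are only illustrative:

```shell
a=/users/home/ctsgnb/1900_CDCRM_CBF71_13022010_13022010.txt

letters=$(printf '%s\n' XYZ1234 | cut -c1-3)  # characters 1 to 3           -> XYZ
numbers=$(printf '%s\n' XYZ1234 | cut -c4-)   # character 4 to end of line  -> 1234
user=$(printf '%s\n' "$a" | cut -d/ -f4)      # 4th "/"-field (field 1 is empty) -> ctsgnb
code=$(printf '%s\n' "$a" | cut -d_ -f2)      # 2nd "_"-field               -> CDCRM

printf '%s\n' "$letters" "$numbers" "$user" "$code"
```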

mirni · March 14, 2011, 5:12am

@pawannoel:
This kind of substitution may also come in handy:

$ var=kofoo23.txt
$ echo ${var/foo/bar}
kobar23.txt

It also works with a variable as the replacement:

$ str=BAR
$ echo ${var/foo/$str}
koBAR23.txt

For details, look at 'man bash' and search for 'parameter expansion'.
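The / form has a few variants in bash: // replaces every match, /# anchors the pattern at the start and /% anchors it at the end. This sketch is bash-specific (not POSIX sh), and var is a made-up value:

```shell
#!/bin/bash
# bash-only expansions; a POSIX sh such as dash rejects them
var=kofoo23foo.txt

one=${var/foo/bar}      # first match only          -> kobar23foo.txt
all=${var//foo/bar}     # every match               -> kobar23bar.txt
front=${var/#ko/KO}     # match anchored at start   -> KOfoo23foo.txt
back=${var/%.txt/.csv}  # match anchored at end     -> kofoo23foo.csv

printf '%s\n' "$one" "$all" "$front" "$back"
```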

@ctsgnb:
Which shell (and version) are you using? I can't seem to find the
echo ${a##?}
echo ${a#?}
expansions in 'man bash' on my system (GNU bash, version 4.1.5(1)-release (i486-pc-linux-gnu)), although they seem to work on the command line. What do these do?

ctsgnb · March 14, 2011, 6:53am

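That was ksh on FreeBSD (not sure about the version, but I think it was not ksh93). The ? is expanded by the shell as a pattern matching any single character (exactly one), similar to (but not exactly the same as) the dot in a regex. So ${a#?} and ${a##?} are just the ordinary ${parameter#word} and ${parameter##word} expansions with ? as the pattern, which bash supports too.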
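Here is a POSIX sh sketch of ? as a one-character pattern, both in parameter expansion and in a case pattern. The value s and the result names are only illustrative:

```shell
s=abcdef

drop2=${s#??}    # each ? matches exactly one character -> cdef
keep3=${s%???}   # drop the last three characters       -> abc

# in a case pattern, "a?c" needs exactly one character between a and c
case abc in a?c) three=yes ;; *) three=no ;; esac
case ac in a?c) two=yes ;; *) two=no ;; esac

printf '%s\n' "$drop2" "$keep3" "$three" "$two"
```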

---------- Post updated at 11:53 AM ---------- Previous update was at 11:27 AM ----------

@pawannoel :

Still for demonstration purposes:

# echo "XYZ1234      ABCD5678"
XYZ1234      ABCD5678
# echo "XYZ1234      ABCD5678" | sed 's/^.../ABCD/;s/ABCD/XYZ/2'
ABCD1234      XYZ5678
# echo "XYZ1234      ABCD5678" | sed 's/^\(...\)\([^A-Z]*\)\(....\)\(.*\)$/\3\2\1\4/'
ABCD1234      XYZ5678
# echo "XYZ1234      ABCD5678" | sed 's/^\([A-Z]*\)\([0-9]*\)\([[:blank:]]*\)\([A-Z]*\)\([0-9]*\)$/\4\2\3\1\5/'
ABCD1234      XYZ5678
#
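For comparison, here is a sketch of the same swap without sed, using only the parameter expansions from earlier in the thread. It assumes the separator is spaces only (no tabs) and that each word is letters followed by digits:

```shell
line="XYZ1234      ABCD5678"

first=${line%% *}        # drop the longest suffix starting at a space -> XYZ1234
rest=${line#"$first"}    # everything after the first word
gap=${rest%%[! ]*}       # the run of spaces between the two words
second=${rest#"$gap"}    # -> ABCD5678

l1=${first%%[0-9]*};  n1=${first#"$l1"}    # XYZ  / 1234
l2=${second%%[0-9]*}; n2=${second#"$l2"}   # ABCD / 5678

swapped="$l2$n1$gap$l1$n2"
printf '%s\n' "$swapped"
```

Unlike the sed versions, this keeps the original run of spaces in $gap, so the column spacing is preserved exactly.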