Use a case-insensitive variable in ksh shell scripting with sed or awk

johnjs · June 18, 2012, 12:55am

I am using a variable called $variable in a pattern search to print everything from the line that matches the variable up to a line matching a constant pattern. The search for the variable should be case-insensitive.

I tried adding Ip at the end of the command below, but in ksh it does not work.

sed -n "/$variable/,/constant/p" file

I also tried an awk command using the toupper() function, but it works for only one case.

balajesuri · June 18, 2012, 2:06am

perl -ne '(/'"$variable"'/i../constant/) && print' file

Scrutinizer · June 18, 2012, 3:21am

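The I modifier is a GNU sed extension that POSIX sed does not define, so it will not work with other sed implementations. In GNU sed it must follow the address it applies to, e.g. /$variable/I,/constant/p. Try: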

awk 'tolower($0)~tolower(s),/constant/' s="$variable" file

Note: if $variable contains regular-expression metacharacters, this may not work as expected.
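If the search string should be treated as plain text rather than as a regular expression, one option (my own variation, not part of the answer above) is awk's index() function, which does a literal substring search. This is a minimal sketch assuming a POSIX awk; the function name print_range_ci and the sample lines are made up for illustration.

```shell
#!/bin/sh
# print_range_ci STRING ENDRE [FILE...]   (hypothetical helper name)
# Print each range that starts at a line containing STRING, compared
# case-insensitively and literally, and ends at a line matching ENDRE.
print_range_ci() {
    start=$1 end=$2
    shift 2
    # index() is a plain substring search, so "." or "*" in STRING are
    # not regex metacharacters here. Note that -v still processes
    # backslash escapes in the assigned value.
    awk -v s="$start" -v e="$end" '
        index(tolower($0), tolower(s)) > 0, $0 ~ e
    ' "$@"
}

# With no FILE arguments, awk reads standard input.
printf '%s\n' 'a.b start' one end 'axb start' two end | print_range_ci 'A.B' end
```

Called with A.B, it prints only the range beginning at "a.b start"; the regex-based one-liner would also start a range at "axb start", because "." matches any character.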

johnjs · June 18, 2012, 8:01am

Neither solution works.

I tried

awk 'tolower($0) ~ /'$variable'/,/endpattern/' filename

But it works only if I give the input variable in lowercase, even though the actual pattern in the file is uppercase. I want to search the file for both uppercase and lowercase matches, regardless of the case of the input I give.

Scrutinizer · June 18, 2012, 8:17am

Could you post a sample of your input, the desired output, the content of $variable, and your OS and version?

jawsnnn · June 18, 2012, 8:25am

Can you please let me know if this works:

sed -n "/$(awk -v v="$variable" 'index(toupper($0), toupper(v)) > 0 {print; exit}' file1)/,/constant/p" file1

bakunin · June 18, 2012, 9:15am

I have some questions, because your objective wasn't completely clear to me:

You want to search case-insensitively, but what about the output? Does it matter which case it is in, or should the original capitalization be preserved?

If you don't care about the capitalization, it is relatively easy: first transform your search string to lowercase (the shell has a built-in mechanism for this, typeset -l), then convert the pattern space to lowercase before searching. Note that the following scripts are just sketches; no effort was spent on parameter validation, error handling, etc.:

#!/usr/bin/ksh

typeset -l search="$1"                  # get commandline argument and convert to lower-case

sed -n 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
        /'"$search"'/,/constant/p' /path/to/inputfile

exit 0

The first sed command converts the line to lowercase; the second line then searches the resulting pattern space and prints it if it is within the range.
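To see what this does to the output, here is a runnable sketch of the same sed script wrapped in a function and fed sample lines on stdin. It assumes POSIX tr in place of ksh's typeset -l so that it also runs under a plain sh; the helper name lower_range and the end pattern end: are illustrative only.

```shell
#!/bin/sh
# lower_range SEARCH: read stdin and print, in lowercase, the range that
# starts at a line containing SEARCH and ends at a line containing "end:".
# (Hypothetical helper; "end:" is just this demo's end pattern.)
lower_range() {
    # ksh users can write: typeset -l search="$1"
    # tr keeps this sketch runnable under a plain POSIX sh.
    search=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')
    # y/// lowercases the pattern space first, so both addresses are
    # tested against lowercase text, and lowercase text is printed.
    sed -n 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
            /'"$search"'/,/end:/p'
}

printf '%s\n' 'FIRST:\' one 'end:' 'second:\' two 'end:' | lower_range FIRST
```

The matching block comes out as first:\ rather than FIRST:\, because the pattern space was lowercased before it was printed.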

If you want the search to be case-insensitive but the output to keep the original capitalization, it gets a little more complex, because we have to use the hold space as a temporary buffer. We copy the pattern space to the hold space (h), lowercase the pattern space, and only then compare. If we have a match, we copy the hold space back to the pattern space (g), so that the line is printed in its originally read form.

#!/usr/bin/ksh

typeset -l search="$1"                  # get commandline argument and convert to lower-case

sed -n 'h
        y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
        /'"$search"'/,/constant/ {
             g
             p
        }' /path/to/inputfile

exit 0
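Running the same kind of sample through the hold-space version shows the original capitalization surviving. Again a sketch: keep_case_range is a hypothetical name, tr stands in for typeset -l, and end: is the demo's end pattern.

```shell
#!/bin/sh
# keep_case_range SEARCH: case-insensitive range search on stdin that
# prints the lines in their original case. (Hypothetical helper name.)
keep_case_range() {
    search=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')
    # h saves the original line in the hold space before y/// lowercases
    # the pattern space; g restores it just before printing.
    sed -n 'h
            y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
            /'"$search"'/,/end:/ {
                g
                p
            }'
}

printf '%s\n' 'FIRST:\' one 'end:' 'second:\' two 'end:' | keep_case_range first
```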

Now for my second question: is there always only one range in your file, or could there be several? If there might be several, the solution above should work; but if there can be only one, the script can be optimized:

You see, "sed" works this way: it reads the first line of input, then applies the commands of the script one after another (subject to branching commands, of course) until it reaches the end of the script. Then the next line of input is read and the process is repeated. This continues until the last line has been read and the script applied to it; then sed finishes.

Now, suppose you have a file with 100 lines and your range (the only one) is in lines 20-30. If you could just quit after line 30, you would save the work of processing the remaining 70 lines, yes? OK, let's do exactly that:

#!/usr/bin/ksh

typeset -l search="$1"                  # get commandline argument and convert to lower-case

sed -n 'h
        y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
        /'"$search"'/,/constant/ {
             /constant/ {
                 g
                 p
                 q
             }
             g
             p
        }' /path/to/inputfile

exit 0

The difference is that inside our printing range we add a nested address that matches precisely the last line of the range. When we encounter this line, we do the same as with the other lines in the range (copy the hold space to the pattern space, then print), but then we quit immediately, so the rest of the file is never processed.
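A quick way to check the early exit is to feed in two ranges that both start with a match: with the q in place, only the first range is printed and sed stops reading there. The helper name first_range_only, the sample data, and the end pattern end: are illustrative assumptions; tr again stands in for typeset -l.

```shell
#!/bin/sh
# first_range_only SEARCH: print only the first matching range from stdin,
# in its original case, then stop reading. (Hypothetical helper name.)
first_range_only() {
    search=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')
    sed -n 'h
            y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
            /'"$search"'/,/end:/ {
                /end:/ {
                    g
                    p
                    q
                }
                g
                p
            }'
}

# Two ranges start with "first"; only the first one is printed.
printf '%s\n' 'FIRST:\' one 'end:' 'First:\' two 'end:' | first_range_only first
```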

I hope this helps.

bakunin

ygemici · June 18, 2012, 9:39am

johnjs:

Neither solution works.

I tried
awk 'tolower($0) ~ /'$variable'/,/endpattern/' filename
But it works only if I give the input variable in lowercase, even though the actual pattern in the file is uppercase. I want to search the file for both uppercase and lowercase matches, regardless of the case of the input I give.

What output do you get?

@Scrutinizer's code looks correct to me, because it converts both the input line and the search string to lowercase (or uppercase), no matter which case they were in, and prints the range when they match.

# cat file
Test
first
1
2
3
endpattern
Test1
Test2
FIRST
4
5
6
endpattern
Test3
Test4
FiRsT
1
2
3
endpattern
XXXXXX
XXXXXXXX

For example, with the input variable "first":

# awk 'toupper($0)~toupper(s),/endpattern/' s="first" file
first
1
2
3
endpattern
FIRST
4
5
6
endpattern
FiRsT
1
2
3
endpattern

For example, with the input variable "FIRST":

# awk 'tolower($0)~tolower(s),/endpattern/' s="FIRST" file
first
1
2
3
endpattern
FIRST
4
5
6
endpattern
FiRsT
1
2
3
endpattern

If you are on Solaris, use nawk or /usr/xpg4/bin/awk instead of awk.

methyl · June 18, 2012, 10:02am

Lateral-thinking method: find the line numbers of all case-insensitive matches of $variable, then use sed to print from each of those line numbers up to the next occurrence of endpattern. This does not work if $variable contains only a number, because grep would then also match the line numbers that cat -n adds.

cat -n file | grep -i "${variable}" | awk '{print $1}' | while read n
do
     sed -n "$n,/endpattern/ p" file
done

Grossly inefficient for large files and/or where there are many hits.
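A variant of the same idea (my own adaptation, not the exact pipeline above) lets grep -n produce the line numbers. grep then matches only the real line text, so a purely numeric $variable no longer hits the numbers that cat -n prepends. It still runs one sed pass per hit, so the efficiency caveat applies unchanged. The helper name print_from_matches and the demo file are hypothetical.

```shell
#!/bin/sh
# print_from_matches SEARCH FILE ENDRE   (hypothetical helper name)
print_from_matches() {
    # grep -n prefixes each hit with "NUM:"; cut keeps only the number.
    # SEARCH is matched (as a BRE, ignoring case) against the line text
    # only, so it can never match a line number.
    grep -in -- "$1" "$2" | cut -d: -f1 | while read -r n
    do
        # Print from line n up to the next line matching ENDRE.
        sed -n "$n,/$3/p" "$2"
    done
}

demo=${TMPDIR:-/tmp}/ci_demo.$$
printf '%s\n' a b 'x2 start' c end > "$demo"
print_from_matches 2 "$demo" end    # prints: x2 start, c, end
rm -f "$demo"
```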

johnjs · June 18, 2012, 11:16am

Thanks, bakunin. Here is a sample case. Suppose I have a file like this:

FIRST:\
one
two
three
end:

second:\
one
two
three
end:

Case 1: If I give "first" as the search variable, it should match and print the following output:

FIRST:\
one
two
three
end:

Case 2: If I give "SECOND" as the search variable, it should also match and print the following output:

second:\
one
two
three
end:

Case 3: If I give "FIRST" as the search variable, it should match and print the following output:

FIRST:\
one
two
three
end:

Case 4: If I give "second" as the search variable, it should also match and print the following output:

second:\
one
two
three
end:

Scrutinizer · June 18, 2012, 11:27am

Hi, I get:

$ variable=first; awk 'tolower($0)~tolower(s),/end:/' s="$variable" infile
FIRST:\
one
two
three
end:
$ variable=SECOND; awk 'tolower($0)~tolower(s),/end:/' s="$variable" infile
second:\
one
two
three
end:
$ variable=FIRST; awk 'tolower($0)~tolower(s),/end:/' s="$variable" infile
FIRST:\
one
two
three
end:
$ variable=second; awk 'tolower($0)~tolower(s),/end:/' s="$variable" infile
second:\
one
two
three
end:

What do you get?

--
On Solaris use /usr/xpg4/bin/awk rather than awk
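In a script that has to run on Solaris as well as elsewhere, the awk choice can be made once up front. This is a sketch assuming that on other systems the awk on the PATH is POSIX-conforming; the variable name AWK and the function find_block are illustrative, and end: is the end pattern from the sample above.

```shell
#!/bin/sh
# Prefer the XPG4 (POSIX) awk where it is installed (Solaris);
# otherwise fall back to whatever "awk" is on the PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
else
    AWK=awk
fi

# find_block VARIABLE [FILE...]: the one-liner above, run with $AWK.
find_block() {
    v=$1
    shift
    # s=... is an operand assignment; with no FILE operands, awk applies
    # it and then reads standard input.
    "$AWK" 'tolower($0) ~ tolower(s), /end:/' s="$v" "$@"
}

printf '%s\n' 'FIRST:\' one 'end:' 'second:\' two 'end:' | find_block first
```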

johnjs · June 18, 2012, 11:38am

awk 'tolower($0)~tolower(s),/endpattern/' s="FIRST" file

The above command works.
Thanks for all your help.