Help in shell script

Hi,

I have a requirement in which I have strings in a flat file, separated by the pipe character (|) as delimiter.
All strings are enclosed in double quotes.

The delimiter character can also appear within a data string.
I need to remove the double quotes from every string except those that contain the delimiter character themselves.

For example, if the flat file has strings like
"abc"|"xyz"|"def|ghi"

I want to keep the last string, i.e. "def|ghi", intact
and remove the double quotes from the remaining strings in the file, which do not have a delimiter inside them,
so my output will now look like

abc|xyz|"def|ghi"

Thanks in advance.

Try

#! /bin/sh
awk 'BEGIN { FS="|"; OFS="|" }
{
gsub( /"/ , ""  , $1 ); gsub( /"/, "" , $2 )
print $1, $2, $3, $4
}' testfile

Hi shereenmotor,

My flat file has an arbitrary number of strings, so how do I generalise it?

Try this:

#! /bin/sh

awk 'BEGIN { FS="|" }
{
col1=length($1); col2=length($2)
wid=col1+col2+1
col=substr($0,1,wid)
gsub( /"/, "", col )
# concatenate without a comma so that no OFS space is inserted
print col substr($0,wid+1)
}' testfile

awk '
{
n = split($0,parts,"\"")
for (k = 1; k <= n; k++)
{
	# index() does a literal search; a bare "|" used as a regex is not portable
	if( index( parts[k] , "|" ) && length( parts[k] ) >1 )
		printf("\"%s\"",parts[k]);
	else
		printf("%s",parts[k]);
}
printf("\n");
}' file

Just an alternative, in Python:

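# read the whole file and split it on double quotes
data = open("input.txt").read()
t = data.split('"')
t.pop(0) #remove first element
t.pop(-1) #remove last element
for num,i in enumerate(t):
    # re-quote only the pieces that contain a pipe but are not a bare delimiter
    if "|" in i and i != "|":
        t[num] = '"' + i + '"'
print(''.join(t))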

Input.txt:
"abc"|"xyz"|"def|ghi"
"abc"|"def|opq"|"def|xyz"

Output:

c:\> python test.py
abc|xyz|"def|ghi"
abc|"def|opq"|"def|xyz"

Thanks anbu23, your solution works perfectly fine.

Hi

If I now have to treat another character, for example the double quote ("), as a delimiter in addition to "|",
how do I include it in the code?

My requirement is now:

"stri"n"g" should give
stri"n"g

The following code works fine for one delimiter but does not work for the " character:

awk '
{
n = split($0,parts,"\"")
for (k = 1; k <= n; k++)
{
if(( match ( parts[k] , "|" ) && length( parts[k] ) >1 ) || ( match ( parts[k] , "\"" ) && length( parts[k] ) >1))
printf("\"%s\"",parts[k]);
else
printf("%s",parts[k]);
}
printf("\n");
}' <file_name>
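
A likely reason the extra test never fires: split() on the quote character consumes every quote, so no element of parts can ever contain one, and the match on "\"" is always false. The same split in Python, as a small illustration only (the helper name is made up for this example):

```python
def pieces_between_quotes(line):
    # Same idea as split($0, parts, "\"") in the awk code:
    # every double quote is used up as a separator.
    return line.split('"')


pieces = pieces_between_quotes('"stri"n"g"')
print(pieces)
# None of the pieces can contain a quote, so a test for one never succeeds.
print(any('"' in piece for piece in pieces))
```

So the information about which quotes were inner ones is already gone after the split; the field boundaries have to be found some other way.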

sed 's/"|/ /g' f1 | sed 's/"\([^ ]* \)/\1/g' | sed 's/ /|/g'|sed 's/|"\([^"|]*\)"/|\1/'

input:
"abc"|"xyz"|"def|ghi"
"abc"|"def|opq"|"def|xyz"
"stri"n"g"|"anb"

output:
abc|xyz|"def|ghi"
abc|def|opq|"def|xyz"
stri"n"g|anb

Hi Anbu23,

If the string
"stri"n"g" should be left unchanged, what change is required?

All quotes within quotes should stay intact as per the requirement.

Secondly, the sed command has broken the first basic requirement,
i.e. a pipe (|) enclosed within a string should be untouched:
"abc|def" should be unchanged.

Check the above output. That is what you needed, right? If you have problems with the sed command, show its input and output.

Hi,

The output of the sed command is

abc|xyz|abc|xyz|xyz|abc|abc|xyz|tt"t"tt|pppppp
abc|xyz|abc|xyz|xyz|abc|abc|xyz|ttttt|pppppp

input was

"abc"|"xyz"|"abc|xyz"|"xyz"|"abc"|"abc|xyz"|"tt"t"tt"|"pppppp"
"abc"|"xyz"|"abc|xyz"|"xyz"|"abc"|"abc|xyz"|"ttttt"|"pppppp"

required output

abc|xyz|"abc|xyz"|xyz|abc|"abc|xyz"|"tt"t"tt"|pppppp
abc|xyz|"abc|xyz"|xyz|abc|"abc|xyz"|ttttt|pppppp

Check this:

sed 's/"|"/" "/g' f1 | 
awk ' BEGIN { OFS="|"}
{
for(i=1;i<=NF;++i)
 if(match($i,/^\"[^|"]*\"$/)) 
	gsub(/\"/,"",$i)
print
}'

Hi Anbu23,

The solution fails in the following test case:

input

"abc"|"xyz"|"abc|xyz"|"xyz"|"abc"|"abc|xyz"|"tt"t"tt"|"ppp ppp"

o/p

"abc"|"xyz"|"abc|xyz"|"xyz"|"abc"|"abc|xyz"|"tt"t"tt"|"ppp|ppp"

required o/p

"abc"|"xyz"|"abc|xyz"|"xyz"|"abc"|"abc|xyz"|"tt"t"tt"|ppp ppp

In sed I used a space as the replacement because I did not find any spaces in the input you showed us.
Can you show all the different test cases you have?

A space is valid inside a string, and so are
all special characters,
for example &, *, @, #, $, !, ~ etc.

Try this:

sed 's/ /@#/g' f1|  sed 's/"|"/" "/g' | 
awk '
{
for(i=1;i<NF;++i)
        if(match($i,/^\"[^|"]*\"$/)) 
        {
                gsub(/\"/,"",$i)
                gsub(/@#/," ",$i)
                printf("%s|",$i)
        }
        else
        {
                gsub(/@#/," ",$i)
                printf("%s|",$i)
        }
if(match($NF,/^\"[^|"]*\"$/)) 
{
        gsub(/\"/,"",$NF)
        gsub(/@#/," ",$NF)
        printf("%s\n",$NF)
}
else
{
        gsub(/@#/," ",$NF)
        printf("%s\n",$NF)
}
}'

Make sure the pattern @# does not occur anywhere in your input. If it does, pick some other pattern that does not occur in your input and use it in place of @# in the code above.
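
If a placeholder pattern is a worry, the same idea can be sketched in Python without one. This is only a sketch under the assumption that every field is quoted, so that a field boundary is always a pipe sitting between a closing and an opening quote; the function and pattern names are made up for this example.

```python
import re

# A boundary is a pipe with a quote directly before and directly after it.
FIELD_BOUNDARY = re.compile(r'(?<=")\|(?=")')
# A field quoted exactly once, with no pipe and no quote inside.
PLAIN_QUOTED = re.compile(r'"[^|"]*"')


def unquote_simple_fields(line):
    fields = FIELD_BOUNDARY.split(line)
    # Strip the quotes only from plainly quoted fields; leave the rest alone.
    cleaned = [f[1:-1] if PLAIN_QUOTED.fullmatch(f) else f for f in fields]
    return "|".join(cleaned)


line = '"abc"|"xyz"|"abc|xyz"|"xyz"|"abc"|"abc|xyz"|"tt"t"tt"|"ppp ppp"'
print(unquote_simple_fields(line))
# abc|xyz|"abc|xyz"|xyz|abc|"abc|xyz"|"tt"t"tt"|ppp ppp
```

Because the line is never split on whitespace, spaces and other special characters inside a field need no special handling.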

Consider the following input :

"abc"|"123"|"456"|"def"

How many fields are there?
Four?

  1. abc
  2. 123
  3. 456
  4. def

Or three?

  1. abc
  2. 123"|"456
  3. def

If the answer is 4, how would the last case be encoded?
And if the answer is 3, how do you tell the two cases apart?

Jean-Pierre.

Number of fields is 4.
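
With four as the answer, the rule in use is effectively that every "|" sequence (quote, pipe, quote) ends one field and starts the next. A minimal Python sketch of that rule, with a hypothetical function name, shows that the three-field reading can never come out, so a value that literally contains "|" cannot be represented in this format:

```python
def split_on_quote_pipe_quote(line):
    # Drop the outermost quotes, then cut at every quote-pipe-quote sequence.
    return line[1:-1].split('"|"')


print(split_on_quote_pipe_quote('"abc"|"123"|"456"|"def"'))
# ['abc', '123', '456', 'def']
```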

Do you mean the following case?

Convert space to some pattern and after processing the fields in awk replace that pattern with the space.

Hi Anbu23,

Your solution works fine,

but in another application the requirement is almost the same,

with slight modifications. The delimiter is the same, i.e. pipe (|).

1) The record may have several null columns.
2) A string with just an opening and a closing quote should have its quotes removed
if there is no pipe between the quotes; otherwise the quotes should remain as they are.
3) If there are more than 2 quotes in the string between delimiters, there should be no change; otherwise remove the quotes.

e.g. INPUT STRING

|||||||"USA"||"AB|C"||""XYZ"A"BCD"||||""PPPPP""|"LLLLLL"

REQUIRED OUTPUT STRING

|||||||USA||"AB|C"||""XYZ"A"BCD""||||""PPPPP""|LLLLLL