Identify string field in csv and enclose in double quotes

I have a test file with the below content:

tst.csv

1, Mike, 3000,5000, Mgr
2, 1000, 5000,7000, Sr. Mgr

I need the below o/p using gsub, without any for loop; I don't want to write any shell script either.

Expected o/p

1, "Mike", 3000, 5000, "Mgr"
2, 1000, 5000, 7000,"Sr. Mgr"

Hello @reach2khan, on the forums we encourage users to show the effort they have put in to solve their own problem. So please add your attempts, in the form of code, to your question and let us know.

Thanks,
R. Singh


Tried the below cmd, but no o/p seems to be printed:

awk '{gsub("/[[:alpha:]]+/","&&",$2);gsub("/[[:alpha:]]+/","&&",$5);print $0}' tst.csv

Wonder why that is? What's wrong with using a loop?
The subject of this thread is "Identify string field in csv". How does your code achieve this "identification"?
How do you determine if a field contains ONLY alphabetic chars?
Does Mike 123 qualify? Or Mike!? Or anything with leading/trailing spaces (e.g.)?

awk '{gsub("/[[:alpha:]]+/","&&",$2);gsub("/[[:alpha:]]+/","&&",$5);print $0}'

What's the field separator in your sample?
What's the meaning of "/[[:alpha:]]+/"?
What do you think "&&" is supposed to do?


Wonder why that is? What's wrong with using a loop?
Agree; we can go with a loop to identify whether a field is a string, to make the code more generic.

How do you determine if a field contains ONLY alphabetic chars?
I am trying to check whether the 2nd and 5th fields are alphanumeric or not using [[:alpha:]].

Does Mike 123 qualify? Or Mike!? Or anything with leading/trailing spaces (e.g.)?
I am treating such a field as alphanumeric.

What's the field separator in your sample?
It is a comma - I have updated my command.

What do you think "&&" is supposed to do?
I replaced it with a single & - the objective of adding "&" was to put the field that is read in quotes, as shown in the below cmd (see also the illustration after this post).

awk -v dq='"' '{gsub("/^[[:alpha:]]+/",dq"&"dq,$2);gsub("^/[[:alpha:]]+/",dq"&"dq,$5);print $0}' FS=, OFS=, tst.csv

But when I run the above cmd, the 2nd and 5th fields are not quoted.
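As an aside, here is a minimal illustration (using a made-up one-word string, not the thread's data) of what "&" and "&&" do in the replacement string of gsub(): "&" stands for the whole matched text, so surrounding it with dq quotes the match, while "&&" simply repeats it.

awk -v dq='"' 'BEGIN { s = "Mgr"; gsub(/[[:alpha:]]+/, dq "&" dq, s); print s }'   # prints "Mgr"
awk 'BEGIN { s = "Mgr"; gsub(/[[:alpha:]]+/, "&&", s); print s }'                  # prints MgrMgr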

alpha is not alphanumeric:

       [:alnum:]  Alphanumeric characters.
       [:alpha:]  Alphabetic characters.
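A quick illustrative one-liner (made up, not from the thread) showing the difference:

awk 'BEGIN { s = "Mgr7"; print (s ~ /^[[:alpha:]]+$/), (s ~ /^[[:alnum:]]+$/) }'   # prints 0 1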
awk -v dq='"' '{gsub("/^[[:alpha:]]+/",dq"&"dq,$2);....'

First of all, your field 2 has a leading space ( Mike). Therefore your anchor (^) to the beginning of the field fails to match.
Secondly, you're specifying the pattern to gsub incorrectly. It should be "^[[:alpha:]]+" - lose the slashes - they're taken literally inside the quotes. Or lose the quotes and keep the slashes - but not both.
So the code (the way you're implementing it) should look like:

awk -v dq='"' '{gsub("^[[:alpha:]]+",dq"&"dq,$2);....'

I'll let you consider other potential changes based on some of the questions/concerns raised above.
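For example, one possible adaptation - just a sketch, assuming only fields 2 and 5 ever need quoting and that the leading space should stay outside the quotes (the output spacing then follows the input rather than the expected sample, which is itself inconsistent):

awk -v dq='"' 'BEGIN { FS = OFS = "," }
{
    # quote fields 2 and 5 only when they contain at least one letter;
    # /[^ ].*/ starts the match after any leading blanks, so the ^ anchor is not needed
    if ($2 ~ /[[:alpha:]]/) sub(/[^ ].*/, dq "&" dq, $2)
    if ($5 ~ /[[:alpha:]]/) sub(/[^ ].*/, dq "&" dq, $5)
    print
}' tst.csv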


Hello

Assuming your strings are in file 'file', and using the octal representation to deal with the double quotes (which is easier):

This will need to be adapted a little if you want to preserve the spaces before/after the commas (your sample does not seem to be consistent about the format).

$ awk -F', ' 'BEGIN{OFS=","}{for(i=1;i<=NF;i++){if ($i~/[a-zA-Z]/){$i="\042"$i"\042"}}print}' file

1,"Mike",3000,5000,"Mgr"
2,1000,5000,7000,"Sr. Mgr"

awk '{gsub(/\<[^,]*[^,0-9][^,]*\>/, "\"&\"")}1' tst.csv
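An annotated reading of that one-liner (my comments; note that the \< and \> word-boundary operators are, as far as I know, a GNU regex extension, so this may not work in every awk):

awk '{
       # \<[^,]*[^,0-9][^,]*\> matches, roughly, a comma-free word that contains
       # at least one character which is neither a comma nor a digit
       gsub(/\<[^,]*[^,0-9][^,]*\>/, "\"&\"")   # wrap each such match in quotes
     }
     1' tst.csv                                 # the bare pattern 1 prints every (modified) line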

gawk only:

awk '{printf ($0~/[^0-9]/?q$0q:$0) RT}' RS=', *|\n' q=\" tst.csv
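For context (my summary of the gawk features used here, in case they are unfamiliar): RS is set to the regular expression ', *|\n', so every comma-separated field becomes its own record; RT holds the text that actually matched RS, so printing it after each (possibly quoted) record restores the original separators and line breaks. Both a regexp RS and the RT variable are gawk extensions.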

In awk the better delimiters for a regex are the slashes. Well, you have to \/ escape an embedded /, but fewer escapes are necessary compared to quote delimiters.
In gsub() the first argument is the regex, the 2nd is the substitution string, and strings are always delimited by quotes. (BTW the 3rd argument is the variable that is modified; if it is omitted, gsub() operates on $0, i.e. the whole input line.)
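A minimal side-by-side (illustrative only, with made-up input): both calls below use the same regex, the first written as a /.../ constant, the second as a string, and both omit the 3rd argument, so they modify $0.

echo 'a/b/c' | awk '{ gsub(/\//, "-"); print }'    # regex as a /.../ constant: the embedded / must be escaped
echo 'a/b/c' | awk '{ gsub("/", "-"); print }'     # the same regex written as a "..." string
# both print: a-b-c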
