read string, check string length and cut

ozzy80 · March 20, 2007, 6:34pm

Hello All,

Plz help me with:

I have a csv file with data separated by ',' and optionally enclosed by "". I want to check each of these values to see if they exceed the specified string length, and if they do I want to cut just that value to the max length allowed and keep the csv format as it is.

Example:

csv file:

1,Test Name,"This is a test and is funny",,,1234

Value1: max(10)
Value2: max(8)
Value3: max(21)
Value4: max(5)
Value5: max(5)
Value6: max(5)

and the expected result is:

1,Test Nam,This is a test and is,,,1234

Plz help!

Thnx in advance!
~Ozzy

sb008 · March 20, 2007, 7:09pm

echo '1,Test Name,"This is a test and is funny",,,1234' | sed -e 's/"//g' -e 's/\([^,]\{0,10\}\)[^,]*,\([^,]\{0,8\}\)[^,]*,\([^,]\{0,21\}\)[^,]*,\([^,]\{0,5\}\)[^,]*,\([^,]\{0,5\}\)[^,]*,\([^,]\{0,5\}\)[^,]*/\1,\2,\3,\4,\5,\6/'

sed -e 's/"//g' -e 's/\([^,]\{0,10\}\)[^,]*,\([^,]\{0,8\}\)[^,]*,\([^,]\{0,21\}\)[^,]*,\([^,]\{0,5\}\)[^,]*,\([^,]\{0,5\}\)[^,]*,\([^,]\{0,5\}\)[^,]*/\1,\2,\3,\4,\5,\6/' <file>

ghostdog74 · March 20, 2007, 8:08pm

if you have Python, here's an alternative

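#!/usr/bin/python
# Truncate each comma-separated field to its maximum length.
# Note: a plain split(',') assumes no field contains an embedded comma.
for line in open("csvfile"):
  line = line.strip().split(',')
  print("%s,%s,%s,%s,%s,%s" % (line[0][0:10], line[1][0:8], line[2].strip('"')[0:21], line[3][0:5], line[4][0:5], line[5][0:5]))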

output:

1,Test Nam,This is a test and is,,,1234
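
If a quoted value can itself contain a comma, splitting on ',' will cut it in the wrong place. Below is a minimal sketch of the same idea using Python's standard csv module, assuming Python 3. The names MAX_LENGTHS, truncate_row and truncate_csv are made up for illustration; the lengths are the ones from the question.

```python
import csv
import io

# Maximum lengths from the example in the question (Value1..Value6).
MAX_LENGTHS = [10, 8, 21, 5, 5, 5]

def truncate_row(row, max_lengths=MAX_LENGTHS):
    """Cut each field to its maximum length; extra fields are left untouched."""
    out = []
    for i, value in enumerate(row):
        if i < len(max_lengths):
            value = value[:max_lengths[i]]
        out.append(value)
    return out

def truncate_csv(text, max_lengths=MAX_LENGTHS):
    """Parse CSV text, truncate every row, and return the result as CSV text."""
    result = io.StringIO()
    writer = csv.writer(result, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(truncate_row(row, max_lengths))
    return result.getvalue()

if __name__ == "__main__":
    sample = '1,Test Name,"This is a test and is funny",,,1234\n'
    print(truncate_csv(sample), end="")
```

The csv reader removes the enclosing quotes before the length is checked, and the writer only puts quotes back around a value that still needs them (for example one that still contains a comma after cutting).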

cfajohnson · March 21, 2007, 12:23am

awk -v lengths=10,8,21,5,5,5 '
BEGIN { FS = OFS = ","
        split(lengths,len,FS)
      }
      { n = 0
        while ( ++n <= NF ) $n = substr($n,1,len[n])
        print
      }' "$FILE"

ozzy80 · March 21, 2007, 10:48am

sb008:

Thank you so much! But one last question probably...

my script looks like this....

#!/bin/ksh

echo "started at " $(date);

while read record
do
   echo $record | sed -e 's/"//g' -e 's/\([^,]\{0,12\}\)[^,]*,\([^,]\{0,35\}\)[^,]*,\([^,]\{0,35\}\)[^,]*,\([^,]\{0,35\}\)[^,
]*,\([^,]\{0,20\}\)[^,]*,\([^,]\{0,10\}\)[^,]*,\([^,]\{0,13\}\)[^,]*,\([^,]\{0,5\}\)[^,]*,\([^,]\{0,35\}\)[^,]*,\([^,]\{0,35\
}\)[^,]*,\([^,]\{0,13\}\)[^,]*,\([^,]\{0,5\}\)[^,]*,\([^,]\{0,31\}\)[^,]*,\([^,]\{0,75\}\)[^,]*,\([^,]\{0,180\}\)[^,]*/\1,\2,
\3,\4,\5,\6,\7,\8,\9,\"10",\"11",\"12",\"13",\"14",\"15"/' >> test_data_2.dat

done < test_email

echo "ended at " $(date)

exit;

and the data in the test_email file is ...

97	Metro Packaging	160 Fornelius Ave	Clifton	NJ	7013	(973) 709-9100	289	Jack Bhohj	Steven Neal	(973) 709-9100	218			Call for appt between 0600 and 1400 M-F.  Leave MSG if get voicemail. 973-777-3999 Warehouse direct line for emergencies POC William Toro. Shipping 0800 to 1600 M to F only.
98	Anchor Glass & Container	151 East McCanns Blvd.	Elmira	NY	14903	(607) 737-1933	324	Bill Weston	Mike Sopp	(607) 737-1933	300			Shipping hours 0800 to 2200 M to F.
278	Tate & Lyle #0278	Rt. 4 950 Morning Star Rd.	Houlton	ME	4730	(207) 532-9523								Load Hours: 7AM-2:00PM (EST)      Requires Appt.
509	QUINCY PLANT	4551 SQUIRES ROAD	QUINCY	MI	49082	(517) 689-2391		Ed Loftis	Charlotte Laws	(517) 689-2391			edward.loftis@conagrafoods.com#http://edward.loftis@conagrafoods.com#	Charlotte Laws (2nd Shift Lead Person)  charlotte.laws@conagrafoods.com  (517-689-2391)
786	Tate & Lyle / Specialty Warehouse #0786	333 Blair Bend Dr.	Loudon	TN	37774	(865) 458-9585								Load Hours: 8AM-3:00PM (EST)      First come, first serve.
2243	Tate & Lyle / Distribution Center #2243	4464 E. 350 South	Lafayette	IN	47905	(765) 474-2512		Shipping						Load Hours: 7AM-6:30PM (EST)      First come, first serve.
2247	Tate & Lyle / McLeod Warehouse #2247	4988 Cundiff Circle	Decatur	IL	62526	(217) 877-9626

When I execute it, I get "sed garbled error".. plz help!!!

Thnx
~Ozzy

cfajohnson · March 21, 2007, 11:31am

Even if it worked, that sed command is still garbled. How can you realistically expect to debug something like that?

For complex scripts, do not use sed; see the awk script I provided earlier in the thread.
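
Two things may be worth checking here. The sample data as posted appears to be tab-separated rather than comma-separated, so a pattern built on [^,] would not split it into fields. Also, sed back-references only go from \1 to \9, so fifteen fields cannot be referenced that way in a single substitution. A minimal Python 3 sketch that sidesteps both, assuming the input really is tab-separated and the output should be comma-separated; the widths are copied from the sed command above, and the names WIDTHS, truncate_tsv_line and convert are made up for illustration:

```python
import csv
import io

# Field widths copied from the sed command in the post above (15 fields).
WIDTHS = [12, 35, 35, 35, 20, 10, 13, 5, 35, 35, 13, 5, 31, 75, 180]

def truncate_tsv_line(line, widths=WIDTHS):
    """Split one tab-separated record and cut each field to its width."""
    fields = line.rstrip("\n").split("\t")
    # zip() stops at the shorter list; any fields beyond the known
    # widths are appended unchanged.
    return [f[:w] for f, w in zip(fields, widths)] + fields[len(widths):]

def convert(lines, widths=WIDTHS):
    """Return the truncated records as comma-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        writer.writerow(truncate_tsv_line(line, widths))
    return out.getvalue()

if __name__ == "__main__":
    record = ("2247\tTate & Lyle / McLeod Warehouse #2247\t"
              "4988 Cundiff Circle\tDecatur\tIL\t62526\t(217) 877-9626\n")
    print(convert([record]), end="")
```

To run it over the whole file, something like convert(open("test_email")) should work, since convert only needs an iterable of lines. The csv writer adds quotes around any value that still contains a comma, so the output stays valid CSV.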

ozzy80 · March 21, 2007, 12:20pm

The awk is not working... says syntax error!!!

Percy · March 21, 2007, 1:29pm

Well, I tried out the awk - and thought it might be a little easier to read (just to understand it better) if you try it like this:

1) create "search.awk" and inside, put this:

BEGIN { lengths="10,8,21,5,5,5"
        FS = OFS = ","
        split(lengths,size,FS)
      }
      { n = 0
        while ( ++n <= NF ) $n = substr($n,1,size[n])
        print
      }

2) save and then run like this:
"awk -f search.awk csv.file"

3) output is:
1,Test Nam,"This is a test and i,,,1234

...so the theory works. Except that it includes the " as a char.

Hope this is enough to set your mind off on the right track!
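
To illustrate why the quote matters: if the enclosing quotes are removed before cutting, the full 21 characters of text survive. A small Python 3 sketch of that ordering; cut_field is a made-up helper name, and it assumes a value is quoted only when it both starts and ends with a double quote:

```python
def cut_field(value, width):
    """Drop one pair of enclosing double quotes, then truncate."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return value[:width]

field = '"This is a test and is funny"'
print(field[:21])            # quote counted as a character: "This is a test and i
print(cut_field(field, 21))  # quotes removed first: This is a test and is
```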

cfajohnson · March 21, 2007, 1:55pm

Are you running on Solaris?

If so, use nawk or /usr/xpg4/bin/awk.

ozzy80 · March 21, 2007, 5:56pm

Thnx guyz for all the help!!! will come back soon