sed cannot execute [Argument list too long]

galaxy_rocky · April 15, 2014, 6:09am

Hi All,

This question has been asked many times, but my problem is slightly different.

In my shell script i am connecting to oracle database and loading the results to .dat file. This .dat file is later used to create to .xls file[to be sent to the users] Some times the size of .dat file becomes more than 120000 bytes. In these cases, the sed and other commands used in the script fails with cannot execute [Argument list too long] error.

I tried splitting the .dat file to smaller files, but i am not even able to split the file and it again throws me the same [Argument list too long] error.

The problem is not just with sed, but all commands used in the shell script.

My piece of code :

cat tgt_type_${tmp_file_suffix}i.dat | while read line
do
 
export FETCH=`sqlplus -silent ${USERNAME}/${PASS}@${DATABASE} <<EOF
whenever sqlerror exit 1
whenever oserror exit 1
set pagesize 2000
set linesize 2000
set underline off
set feedback off
set heading off
set trim on
spool missing_$line_${tmp_file_suffix}i.dat 
/*Oracle sql query comes here */;
spool off
EOF`
fi
 
sed  /^$/d  *.dat
cat missing_$line_${tmp_file_suffix}i.dat | while read line
do
#
#coding
#coding
#
done

done

CarloM · April 15, 2014, 7:28am

What does ls -l *.dat return? The same error? If so, that's the number of files, not their size - look at using man find ("posix") and possibly man xargs ("posix"). E.g. something like:

find . -type f -name "*.dat" | xargs sed /^$/d

Incindetally, sed /^$/d *.dat won't do anything to the actual files (unless you've aliased sed). Is that your intention?

ygemici · April 15, 2014, 8:28am

Because of the kernel ARG_MAX value is limited and asterisk character expands the all matching files,directory,links or any other forms.
Therefore your bash(shell) command argument buffer's ( commandline length ) is fill and and then overflow.

You can use

for loop
while loop
or any scriptable methods
split the arguments to meaning values

ls -1 tgt_type_${tmp_file_*}.dat

xargs (to split to the arguments with find is one good method )
ls (use the folder hierarchy with asteriks )

ls -1 yourfolder/

find (use the folder hierarchy without asteriks )

find /yourfolder -name '*'

For example :

while read myfile ; do sed '/^$/d' $myfile ; done | find /yourdolfer -name '*.dat'|xargs -n1

# (use the while loop and read the arguments one by one (-n1) )

find /yourdolfer -name '*.dat' -print0 | xargs -0 -n5 sed '/^$/d'

# (use the spaces instead of newlines with xargs and use getdents to get the file entries with find)

and consider the @CarloM 's notice..
your sed command's default output is your stdout (usually your screen tty,pts,console,...) ,
therefore you must redirect to another file for save the changes or use the '-i' parameter , if you have gnu sed for save to your file same time.

regards
ygemici

galaxy_rocky · April 16, 2014, 1:46am

Hi ygemici, I am trying to implement your suggestions and i have made couple of changes to my script to use while loop and read from file.
But i am stuck in the below section:

cat missing_$line_${tmp_file_suffix}i.dat | while read line
do

echo "<Row>" >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$1F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$2F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$3F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$4F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$5F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$6F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$7F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$8F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$9F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$10F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$11F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$12F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$13F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo "</Row>" >> excel_CT_${tmp_file_suffix}i.dat
done
echo "</Table>" >> excel_${tmp_file_suffix}i.dat
echo "</Worksheet>" >> excel_${tmp_file_suffix}i.dat
done
echo "</Workbook>" >> excel_${tmp_file_suffix}i.dat

the above section basically echoes the contents of file and sets up the excel file to be created. Any suggestions on this?

Regards,

ygemici · April 16, 2014, 6:32am

galaxy_rocky:

Hi ygemici, I am trying to implement your suggestions and i have made couple of changes to my script to use while loop and read from file.
But i am stuck in the below section:

cat missing_$line_${tmp_file_suffix}i.dat | while read line
do

echo "<Row>" >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$1F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$2F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$3F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$4F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$5F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$6F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$7F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$8F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$9F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$10F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$11F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$12F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo $line | awk -F';' '{print("<Cell><Data ss:Type=\"String\">"$13F"</Data></Cell>")}' >> excel_${tmp_file_suffix}i.dat
echo "</Row>" >> excel_CT_${tmp_file_suffix}i.dat
done
echo "</Table>" >> excel_${tmp_file_suffix}i.dat
echo "</Worksheet>" >> excel_${tmp_file_suffix}i.dat
done
echo "</Workbook>" >> excel_${tmp_file_suffix}i.dat

the above section basically echoes the contents of file and sets up the excel file to be created. Any suggestions on this?

Regards,

Hi galaxy_rocky,
if you can write the input file (or sample) and desired output file maybe i can help more

firstly i dont understand exactly what is the desired output ? (what is your $1F , is it fields that in te your input lines plus "F" )

i try some like below.

# cat missing.dat
test1
test2;test3
test4;test5;test6

# awk -F';' -vi=1 'BEGIN{print "<Row>"}{print "<Cell><Data ss:Type=\"String\">"$(i++)"F</Data></Cell>"}
END{print "</Row>\n</Table>\n</Workbook>"}' missing.dat
<Row>
<Cell><Data ss:Type="String">test1F</Data></Cell>
<Cell><Data ss:Type="String">test3F</Data></Cell>
<Cell><Data ss:Type="String">test6F</Data></Cell>
</Row>
</Table>
</Workbook>

galaxy_rocky · April 16, 2014, 6:49am

Hi ygemici,

$1F is the first field from the first line of the input file. Similarly $2F is the the second field from the first line of input file. This should loop for all the lines in the input file. Totally there are 13 fields in the input file. That is the reason i am using 13 awk commands!

My sample input file looks like below:

SG0010001;TRANSACTION;24:.ENTRY;31-MAR-14;SG0010001;123456789.020001;TRANSACTION;123;Metal XYZ Xy;N/A;N/A;N/A;CD_VALUE_TST;
SG0010001;TRANSACTION;24:.ENTRY;31-MAR-14;SG0010001;789456123.020001;TRANSACTION;123;Metal XYZ Xy;N/A;N/A;N/A;CD_VALUE_TST;
SG0010001;TRANSACTION;24:.ENTRY;31-MAR-14;SG0010001;147852369.040001;TRANSACTION;123;Metal XYZ Xy;N/A;N/A;N/A;CD_VALUE_TST;
SG0010001;TRANSACTION;24:.ENTRY;31-MAR-14;SG0010001;159753456.020001;TRANSACTION;123;Metal XYZ Xy;N/A;N/A;N/A;CD_VALUE_TST;

And my sample expected output is :

<Worksheet ss:Name="CD_VALUE_TST">
<Table>
<Row>
<Cell><Data ss:Type="String">EC</Data></Cell>
<Cell><Data ss:Type="String">TARGET_TRANSLATION</Data></Cell>
<Cell><Data ss:Type="String">SOURCE_FILE_NAME</Data></Cell>
<Cell><Data ss:Type="String">SOURCE_EFF_FROM</Data></Cell>
<Cell><Data ss:Type="String">CO_CODE</Data></Cell>
<Cell><Data ss:Type="String">SOURCE_ID</Data></Cell>
<Cell><Data ss:Type="String">SOURCE_CODE_SET_1</Data></Cell>
<Cell><Data ss:Type="String">SOURCE_FIELD_1</Data></Cell>
<Cell><Data ss:Type="String">SOURCE_FIELD_1_DESC</Data></Cell>
<Cell><Data ss:Type="String">SOURCE_CODE_SET_2</Data></Cell>
<Cell><Data ss:Type="String">SOURCE_FIELD_2</Data></Cell>
<Cell><Data ss:Type="String">SOURCE_FIELD_2_DESC</Data></Cell>
<Cell><Data ss:Type="String">TARGET_CODE_SET</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">SG0010001</Data></Cell>
<Cell><Data ss:Type="String">TRANSACTION</Data></Cell>
<Cell><Data ss:Type="String">24:.ENTRY</Data></Cell>
<Cell><Data ss:Type="String">31-MAR-14</Data></Cell>
<Cell><Data ss:Type="String">SG0010001</Data></Cell>
<Cell><Data ss:Type="String">123456789.020001</Data></Cell>
<Cell><Data ss:Type="String">TRANSACTION</Data></Cell>
<Cell><Data ss:Type="String">123</Data></Cell>
<Cell><Data ss:Type="String">Metal XYZ Xy</Data></Cell>
<Cell><Data ss:Type="String">N/A</Data></Cell>
<Cell><Data ss:Type="String">N/A</Data></Cell>
<Cell><Data ss:Type="String">N/A</Data></Cell>
<Cell><Data ss:Type="String">CD_VALUE_TST</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">SG0010001</Data></Cell>
<Cell><Data ss:Type="String">TRANSACTION</Data></Cell>
<Cell><Data ss:Type="String">24:.ENTRY</Data></Cell>
<Cell><Data ss:Type="String">31-MAR-14</Data></Cell>
<Cell><Data ss:Type="String">SG0010001</Data></Cell>
<Cell><Data ss:Type="String">789456123.020001</Data></Cell>
<Cell><Data ss:Type="String">TRANSACTION</Data></Cell>
<Cell><Data ss:Type="String">123</Data></Cell>
<Cell><Data ss:Type="String">Metal XYZ Xy</Data></Cell>
<Cell><Data ss:Type="String">N/A</Data></Cell>
<Cell><Data ss:Type="String">N/A</Data></Cell>
<Cell><Data ss:Type="String">N/A</Data></Cell>
<Cell><Data ss:Type="String">CD_VALUE_TST</Data></Cell>
</Row>
</Table>
</Worksheet>

Basically the fields in the input value are seperated by ";". Based on the number of records in the input file, there should be corresponing entries in the output file.
Please help on this issue.

Regards,

CarloM · April 16, 2014, 7:22am

If every field is treated the same then you can do the Cell tags fairly simply in one awk command. Something like:

echo "$line" | awk -F';' '{ for (i=1;i<NF;i++) { print "<Cell><Data ss:Type=\"String\">" $i "</Data></Cell>"}}' >> excel_${tmp_file_suffix}i.dat

EDIT: Actually, you can do away with the while as well:

awk -F';' '{ printf "<Row>\n"; for (i=1;i<NF;i++) { print " <Cell><Data ss:Type=\"String\">" $i "</Data></Cell>"} printf "</Row>\n"}' missing_$line_${tmp_file_suffix}i.dat >> excel_${tmp_file_suffix}i.dat

ygemici · April 17, 2014, 4:12am

Did you try Carlom's solution
or try this.. good lucks

awk -F';' 'BEGIN{print "<Row>"}{for(i=1;i<=NF;)print "<Cell><Data ss:Type=\"String\">"$(i++)"</Data></Cell>"}
END{print "<Row>\n<Table>\n</Workbook>"}' missing_$line_${tmp_file_suffix}i.dat

regards
ygemici

galaxy_rocky · April 18, 2014, 5:33am

Hi CarloM and ygemici !! You guys are amazing geniuses !!! Your suggestions worked for me... thanks a lot both of u !!

Regards,