Change value for POSIX

Hi,

I have a VM with the following configuration:

3.10.0-693.1.1.el7.x86_64 #1 SMP Thu Aug 3 08:15:31 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

My current POSIX limits are:

Your environment variables take up 2011 bytes
POSIX upper limit on argument length (this system): 2093093
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2091082
Size of command buffer we are actually using: 131072

I want to change the value of "POSIX smallest allowable upper limit on argument length (all systems)" from 4096 to 1 MB.

How do we get this changed?

Regards.

Why do you want to change this reasonable limit?

You come up with a convincing argument explaining why no system conforming to the standard should be allowed to have an ARG_MAX limit less than 1048576. You come up with a convincing argument explaining why small-memory-model applications should not be allowed to run on standards-conforming systems on hardware using x86-compatible CPU architectures. You come up with convincing arguments why any other systems and features affected by your changes, on systems that currently conform to the standards, should not be allowed to still be considered standard-conforming when a revision of the standard is approved that includes your desired changes. And then you file a change request for the standard at the POSIX Standards bug reporting site, asking for the _POSIX_ARG_MAX limit in <limits.h> in the Base Definitions and Headers category to be changed to whatever limit you think will be approved by all of: the member companies of the Open Group's Base Working Group, who will vote on the next revision of the Single UNIX Specification; the member countries of ISO, who will vote on the next revision of the ISO 9945 standard; AND the individual members of the IEEE Standards Association, who will vote on the next revision of the IEEE 1003.1 standard.

The next revision of these three linked standards is expected to be approved sometime around 2020 or 2021. Then you will need to wait for companies that build operating systems to release updates that conform to the new standard, and you will need to use one of those conforming products. Other operating systems might or might not increase the _POSIX_ARG_MAX limit even though they do not conform to other requirements of that standard. (Note that no Linux distribution has yet claimed to conform to any version of the POSIX standards.)

Actually, I have 500K arguments to pass. Each argument is roughly 23 characters long, so the best I can pass is roughly 175 arguments in each loop. This is making the execution very slow, as I was planning to pass roughly 2K per loop. Hence, I wanted to bypass the POSIX limit.

You have not given any reason why _POSIX_ARG_MAX should be changed in the standards. Why are you trying to restrict your code to the smallest argument list size guaranteed to be available on every system, instead of using the resources that your system tells you are available on your system? There is no reason to restrict yourself to the value of _POSIX_ARG_MAX in <limits.h>. Just restrict yourself to the value of ARG_MAX in <limits.h> on your system (which you have shown us is 2093093 bytes, approximately 2 MB minus 4 KB). Or, assuming that there aren't huge variations in the size of each argument, use roughly 87,000 arguments on each of about 6 invocations (about 2 MB divided by 24 bytes per argument) instead of 175 arguments on a little under 3000 invocations.
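For example, a minimal shell sketch (untested, and assuming roughly 24 bytes per argument: 23 characters plus the terminating NUL) that sizes its batches from the system's real limit instead of from _POSIX_ARG_MAX:

# Ask the system for the actual per-exec() argument-space limit.
arg_max=$(getconf ARG_MAX)

# Leave headroom for the environment and the utility name/options,
# then divide by the per-argument cost (23 chars + NUL = 24 bytes).
headroom=4096
batchsize=$(( (arg_max - headroom) / 24 ))
echo "up to $batchsize arguments per invocation"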

Or, if you're using shell instead of C to invoke commands with varying numbers of arguments based on the limits of your system, find ... -exec utility initial_arg... {} + and xargs are perfectly capable of performing these calculations without your script needing to care about the limits.
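For example (a sketch, assuming one ID per line in a file named ids.txt and a hypothetical utility named process_ids):

# xargs packs as many arguments into each invocation as the system allows:
xargs process_ids < ids.txt

# or cap each batch explicitly at 2000 arguments per invocation:
xargs -n 2000 process_ids < ids.txt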

Hi, the reason why I am posting it under "POSIX smallest allowable upper limit on argument length (all systems): 4096" is because I am trying to run a mongo query from the shell. When I run the query directly in Mongo it works fine, but when I run it from a shell script it takes a maximum of 4096 characters in the argument list. So I am forced to send it in batches of 175. I checked with the Mongo team and they confirmed it is a POSIX limitation of the shell that is causing it and nothing to do with Mongo.

What shell are you using? If you were using a POSIX-conforming shell (such as ksh or bash), you would not be limited to an argument list maximum of _POSIX_ARG_MAX on a system where ARG_MAX is about 500 times larger.
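You can check that for yourself, e.g. (a quick sanity test, assuming Linux with an external /bin/echo):

getconf ARG_MAX                                   # the real per-exec limit (~2 MB on your VM)
/bin/echo $(printf 'x %.0s' {1..5000}) | wc -c    # ~10000 bytes of arguments, well past 4096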

How are you constructing the argument list to be passed to Mongo? What are the arguments to Mongo? Please show us the shell script you are using to gather arguments and invoke Mongo.

Hi,

I am using the Bash shell.

I have a set of IDs stored in a file, and I am copying them into an array in batches.

# Store distinct EmployeeIds in an array:
# join all lines of the file into one space-separated line, then word-split it into the array.
distinct_array=$(sed ':a;N;$!ba;s/\n/ /g' output/EmployeeId_distinct.txt)
declare -a arr=($distinct_array)
echo "Total Number of Distinct Ids Stored in Array: ${#arr[@]}"
batchsize=200

This is how I am calling mongo:

for ((i=0; i<${#arr[@]}; i+=batchsize))
do
   IFS=,                                # so "${part[*]}" joins the IDs with commas
   part=( "${arr[@]:i:batchsize}" )

   sed -i "2i permissibleCars = [ ${part[*]} ]" query/employee_Id_count.js
   mongo localhost:27045/employee_db -u user_user -p password123 < query/employee_Id_count.js >> output/employee_Id_count.txt
   cat query/employee_Id_count.js >> query/employee_Id_count_total.js
done

My sample employee id would be like "XYZ:16772767:586748411"

The employee_Id_count.js file:

DBQuery.shellBatchSize = 224361901;
permissibleCars = ["XYZ:16772767:58675748411" ..... more 200 ids ]
db.getCollection('employee_contracts_nrt').aggregate([
    { $match: {
          employeeClass: "ownload",
          EmployeeId: { "$in": permissibleCars },
          "methods.name": "image",
          "methods.status": "ACTIVE"
    } },
    { "$group": { _id: "$EmployeeId", count: { $sum: 1 } } }
],
{ allowDiskUse: true }
);

It works perfectly fine if I pass, say, a batchsize of 150, but once it crosses 4096 bytes it starts failing.

I'm not sure I fully understand what you're doing, but with IDs in a file to be transferred, unaltered as far as I can see, into a .js script (which, by the way, seems to be growing on every loop?), why do you do this via shell variables, arrays, and operations, and not with simple text tools and operations?

The file size is not an issue. Can you explain more about the text tools? I am not much aware of them.

I wasn't talking about file size, but you seem to be adding (NOT replacing!)

permissibleCars = ["XYZ:16772767:58675748411" ..... more 200 ids ] 

in every loop, making employee_Id_count.js look like

DBQuery.shellBatchSize = 224361901 ;
permissibleCars = ["XYZ:16772767:58675748411" ..... more 200 ids ]
.
.
.

in the first run,

DBQuery.shellBatchSize = 224361901 ;
permissibleCars = [ 200 other ids ]
permissibleCars = ["XYZ:16772767:58675748411" ..... more 200 ids ]
.
.
.

in the second, etc.

*nix has a wealth of text tools to manipulate, extract, modify, and adapt text files, to name a few: awk, sed, cut, sort, etc. (see e.g. the coreutils package).
If you post sample data like the EmployeeId_distinct.txt file and show how it should fit into employee_Id_count.js, people in these forums will likely come up with proposals on how to do it.
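For instance, something along these lines (an untested sketch, assuming one ID per line in EmployeeId_distinct.txt) would emit one ready-made permissibleCars line per batch of 200 IDs, with no shell arrays at all:

awk -v batch=200 '
    # accumulate quoted, comma-separated IDs for the current batch
    { ids = ids (NR % batch == 1 ? "" : ", ") "\"" $0 "\"" }
    # at every batch boundary, print one permissibleCars line and reset
    NR % batch == 0 { print "permissibleCars = [ " ids " ]"; ids = "" }
    END { if (ids != "") print "permissibleCars = [ " ids " ]" }
' output/EmployeeId_distinct.txt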

No, if you look, it is not an append, it is an insert. So every time, the new record gets replaced.


Hi,

I have uploaded the EmployeeId_distinct.txt file at the location below:

Dropbox - EmployeeId_distinct.txt

The employee_Id_count.js would be

DBQuery.shellBatchSize = 224361901;
permissibleCars = [ /* this gets passed in from the array */ ]
db.getCollection('employee_details').aggregate([
    { $match: {
          employeeClass: "ownload",
          EmployeeId: { "$in": permissibleCars },
          "methods.name": "image",
          "methods.status": "ACTIVE"
    } },
    { "$group": { _id: "$EmployeeId", count: { $sum: 1 } } }
],
{ allowDiskUse: true }
);

Sure?

sed -i '2i XXX' /tmp/file2.txt
cat /tmp/file2.txt
21 1209
XXX
XXX
XXX
22 1210

after applying the sed command thrice ...
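(If the intent is to replace line 2 on every pass rather than pile up insertions, sed's c (change) command would do that, e.g.:)

sed -i '2c XXX' /tmp/file2.txt    # replaces line 2 in place, no matter how often it is run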

And giving approximate results like "permissibleCars = This will get passed from array" will help only if you yourself know exactly how to make them fit exactly. Do you?

Basically it is something like this

{ echo "DBQuery.shellBatchSize = $employee_count ; "; cat query/employee_cemployeidd_count_tmp.js; } > query/employee_Id_count.js
    sed -i  "3i permissibleCars = [  ${part[*]} ]"  query/mployee_Id_count.js.js

But I have just posted the final one, which is the one having the issue. Sorry for the confusion.

The issue is more related to the final mongo execution.

The entire premise behind this thread seems to be totally unrelated to the description provided. The values of the _POSIX_ARG_MAX and ARG_MAX limits affect the total byte count of the strings that can be passed to a member of the exec() family of system calls.

You said in post #8 (quoted above) that mongo is always invoked in your shell script with the command:

   mongo localhost:27045/employee_db -u user_user -p password123 < query/employee_Id_count.js >> output/employee_Id_count.txt

The arguments passed to the system to invoke mongo from this command line are constant (unless you change exported environment variables between calls to mongo), no matter how large the files query/employee_Id_count.js and output/employee_Id_count.txt are (or, in the case of output/employee_Id_count.txt, how large it will become as a result of running this command).
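One way to see this for yourself on Linux (a quick illustration, assuming a /proc filesystem):

# Start a command with redirections attached, then inspect its argument list:
sleep 60 < /etc/hosts >> /tmp/out &
tr '\0' ' ' < /proc/$!/cmdline; echo    # prints just: sleep 60
kill $!
# The redirections are set up by the shell before exec() and never appear in argv.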

I have no idea what mongo is or what it is trying to do, but the argument that the value of _POSIX_ARG_MAX is keeping mongo from doing what you want it to do indicates a gross misunderstanding of how the _POSIX_ARG_MAX limit is intended to be used.

When you say mongo starts failing with larger lists, what exactly are the diagnostic messages produced? What are the symptoms of the failure?

What is mongo trying to do? Is it trying to send a database query to a system across a network where the system running that database has less than a megabyte of RAM?


Hi Dan,

I think you are right. I did a lot more analysis in the last 24 hours, and it seems the issue is entirely with the array. It seems that, due to its huge size, the array fails to store the values, and this causes a failure at the mongo end.

2017-08-31T18:48:03.286+0000 E QUERY    [thread1] SyntaxError: unterminated string literal @(shell):2:4077
2017-08-31T18:48:03.317+0000 E QUERY    [thread1] SyntaxError: missing ; before statement @(shell):1:5
2017-08-31T18:48:03.348+0000 E QUERY    [thread1] SyntaxError: missing ; before statement @(shell):1:10
2017-08-31T18:48:03.379+0000 E QUERY    [thread1] SyntaxError: missing ; before statement @(shell):1:3
2017-08-31T18:48:03.409+0000 E QUERY    [thread1] SyntaxError: missing ; before statement @(shell):1:3
assert: command failed: {
        "ok" : 0,
        "errmsg" : "bad query: BadValue: $in needs an array",
        "code" : 16810
} : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:16:14
assert.commandWorked@src/mongo/shell/assert.js:370:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1319:5
@(shell):1:1

I did try to change ulimit -s to unlimited so that it stores the value, but I feel that is also not working. I checked my virtual memory to see if that could help, but couldn't figure out anything concrete.

              total        used        free      shared  buff/cache   available
Mem:        3878024      347264     2409500       25108     1121260     3161284
Swap:             0           0           0

Regards.