Large Variable content size caveats?

Hi,

I wrote a shell script that lets me manage DNS records through an API.

The raw core command looks roughly like this:

output="$(curl -X GET https://mgt.myserver.de:8081/api/v1/servers/localhost/zones)"

The output contains a list of all zones with all records and is about 800 kilobytes of JSON data.

I ran into a first issue when I used this (incorrect) attempt to remove a leading 000 from the 800K variable:

# remove leading 000 from output
output="${output#000*}"

I think the correct expression should be ${output#000}. Maybe this is an error I typed in without being completely aware of what I was doing. The resulting pattern caused the command to never finish. I assume matching it required a huge amount of computation on that 800K variable.
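For illustration, here is how the variants behave on a small toy string (not the real 800K data):

s="000abc000def"
echo "${s#000}"      # abc000def  - removes exactly the literal prefix 000
echo "${s#000*}"     # abc000def  - shortest match, so * matches nothing here
echo "${s##000*}"    # (empty)    - longest match, * swallows the rest

So the * isn't wrong as such, but glob matching against a value that large can become very slow when the shell has to try a huge number of candidate prefix lengths, which might be where the time went.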

My question is:

Are there, in your experience, other general caveats why one should refrain from using such big variable contents? As far as I've read, there are no relevant size limits in Linux regarding variables (for data sizes < 100 MB). Or would it generally be better to use files for data above a certain size?

Environment: Linux, bash

I always thought the size of a variable was limited by the stack size...


That may be true.

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 773193
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 773193
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
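
Though in a quick test here (my own box, your mileage may vary), a value well past that 8192 kB stack limit still worked fine, so the value itself does not seem to live on the shell's stack:

# build a ~10 MB value, larger than the 8192 kB stack limit shown above
big="$(head -c 10000000 /dev/zero | tr '\0' 'x')"
echo "${#big}"    # prints 10000000 if the shell stored it without complaint

Where it does reliably blow up is passing such a value as a command-line argument or exported variable to an external program, since execve() enforces ARG_MAX (typically a few MB on Linux).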
 

Main caveat: it's almost always a dumb idea. Ad-hoc data, "dump-and-fix-later", is the modern UUOC (Useless Use Of Cat): wasteful and pointless. If you know what you actually want, you can either deal with it in a structured way or avoid storing it entirely.

Another caveat: most shells don't let you keep binary data in variables (bash, for one, cannot hold NUL bytes). Mass dumping unchecked data into variables can occasionally surprise you.
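
For instance, if all you actually need from that call is the zone names, you can stream the response straight into the consumer and never park 800K in a variable at all (the jq filter here is only a guess at your data):

# stream the API response directly into jq instead of a variable
curl -s https://mgt.myserver.de:8081/api/v1/servers/localhost/zones | jq -r '.[].name'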

Hi Corona,

thanks for your answer, even if it isn't worded very nicely.

What I take from it: be aware of what unknown data can possibly do, in the worst case, especially in a heavily interpreted shell environment.

---

The method was not, as you assumed, to pull the full data out of the API, have it locally, and then extract the needed bits locally. Looking at the API, it was, in short, the only way I saw to get what I need. Digging deeper into the API documentation has now revealed other ways to do the same thing more efficiently.
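
For example, if I read the PowerDNS API docs correctly, a single zone can be fetched directly instead of pulling the whole listing (the zone name and the API key variable are just examples here):

# fetch only one zone instead of the full 800K listing
output="$(curl -s -H "X-API-Key: $APIKEY" https://mgt.myserver.de:8081/api/v1/servers/localhost/zones/bla.com.)"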

Maximum efficiency was not my goal. I started with a shell script, and at about half completion I realized that the task might be too complex for a shell script and it would have been better to use a scripting language. Now I have a 500-line Bash script that is fairly well structured and working quite well. I will not rewrite it if there are no serious issues.

My intention with this question was to get feedback on whether there are any serious problems I was not aware of.

Since all of that data is JSON that gets fed into jq for extraction, I do not see much potential trouble ahead.

If your data is JSON, why are you processing it with curl and shell scripts when there are tools better suited for processing JSON data?

I process a lot of JSON data on Linux and never use a shell script to process this JSON data. There are many other languages, libs and tools built to process JSON. Why use a tool suboptimal for JSON processing?

Just curious ...

As I said, I process reams of JSON data on Linux and do not use shell scripts to process any JSON objects.

Hi Neo,

basically it boils down to jq. Without it I would never have considered doing this task with bash. All JSON handling is done with jq; Bash is the code around it.

Even if I might have decided differently in hindsight, because the task got a little more complex than expected, I think I got the task done very quickly. It's working more than sufficiently well, the code is maintainable, and the speed is acceptable. For smaller tasks I still consider this an excellent choice.

Example: get an IP address from zone data

Assuming this zone data...

JSON zone data from the PowerDNS API (GitHub)

...this would be the command to extract the bare IP address of the A record "ftp.bla.com":

jq -r '.rrsets[] | select(.name=="ftp.bla.com." and .type=="A") | .records[0].content' <<<"$json"

#output

1.2.3.5
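
In the script itself this kind of lookup sits in small helper functions, roughly like this (simplified sketch; the names are made up for this post, and $json holds the zone data from the API):

# hypothetical helper around the jq call above
get_a_record() {
    local name="$1"
    jq -r --arg n "$name" \
        '.rrsets[] | select(.name==$n and .type=="A") | .records[0].content' \
        <<<"$json"
}

ip="$(get_a_record "ftp.bla.com.")"   # -> 1.2.3.5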

Thanks for explaining.

Maybe move away from jq and use something more mainstream for processing JSON?

What are you planning to do with the JSON object after you download and process it?

Push it into a MySQL DB? Push it to Firebase? Save it to a flat file on your box? Push it to another server?

 What are you planning to do with the JSON object after you download and process it?

JSON is the result data format for the PowerDNS-API.

I'm using the API to query/add/modify/delete DNS-Records - not to store the retrieved data.

The script is a command-line tool that's used for various purposes (creating/deleting records for Let's Encrypt certificate validation, automatic configuration of MX record changes, setup of e-mail autoconfiguration, ...).
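
A record change, e.g. the Let's Encrypt TXT record, is essentially a PATCH against the zone (sketch only; the payload format is how I understand the PowerDNS API docs, and the zone name, token and API key variable are placeholders):

# hypothetical example: REPLACE a TXT record used for ACME validation
payload='{"rrsets":[{"name":"_acme-challenge.bla.com.","type":"TXT","ttl":60,"changetype":"REPLACE","records":[{"content":"\"<token>\"","disabled":false}]}]}'
curl -s -X PATCH -H "X-API-Key: $APIKEY" -H "Content-Type: application/json" \
    -d "$payload" https://mgt.myserver.de:8081/api/v1/servers/localhost/zones/bla.com.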

I have no plans to do so, since it's working fine.

Yes, I understand working with JSON. I do it nearly daily (read, modify, update, write, between client and server).

FWIW,

If I were going to use the same API to modify and update JSON as you are doing, I would write a quick app with either PHP (if I did not need any UI), or I would develop it with Vue.js if I needed a web UI.

PHP processes JSON much better than bash and can easily update it as well, with very simple built-in tools to pull and push files over the net, obviously.

Vue.js with extensions like Axios and Vuex is so rich in features for processing JSON across the net that comparing it to bash would be like comparing the Starship Enterprise to a flat-bottomed wooden boat (at worst) or, at best (and being generous), an antique car.

Anyway, I realize a lot of people like to use these old-tech tools from decades past and build an infrastructure around them to make things work; but honestly, when working with JSON across the net as you are doing, there are much better tools than command-line jq, curl, wget and bash, I promise :)

Anyway, I think I understand the reason you are using bash, curl and jq... you are comfortable with those tools, and that's cool too :) I used to use those tools (excessively) between 15 and 5 years ago, so I understand and might use them again if I were forced to. However, I have noticed a seismic shift in JSON processing, including the use of Firebase and JSON-based NoSQL repos.

PS: At least use Postman to analyze your API calls. Postman is one of the most productivity-enhancing tools out there when working with JSON APIs.