@rDamascena , welcome. The forum is a collaboration: you post your challenge AND your attempts, and the team responds with fixes, alternatives, etc. (some of which might be complete solutions).
You specify awk. Why?
It can be any language, I didn’t think that would be an issue. In fact, the purpose is to reduce the sample space to generate a draw among the remaining dozens. Almost none, to be quite honest.
@rDamascena , please answer all questions; do not be vague.
PS: I have written a working example, but as you appear to be guarded in responding to basic questions, it remains hidden until you are open and frank with regard to your request.
Otherwise, try asking one of the LLMs to help you out; you may make some progress.
The shuf command is often used to produce a selection of random numbers. You could use these to control which columns and which rows to delete (or which to keep). You can look that up with the command man -s 1 shuf.
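For example, a sketch along those lines (the counts 2-of-6 and 4-of-10 are just illustrative):

```shell
# Pick 2 of the 6 row numbers and 4 of the 10 column numbers at random;
# sorting keeps the later edits from having to reorder anything.
rows=$(shuf -i 1-6 -n 2 | sort -n)
cols=$(shuf -i 1-10 -n 4 | sort -n)
printf 'row to delete: %s\n' $rows
printf 'col to delete: %s\n' $cols
```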
The cut command can be used to select columns (fields) by number.
Selecting lines would be quite difficult in some commands like sed. That is a line-based editor, so each line number would need a separate command (but you could generate those commands quite easily from a random sequence).
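A sketch of that idea, turning a random sequence into one d (delete) command per chosen line (the counts here are illustrative):

```shell
# Turn random line numbers into sed code like "2d;5d", then apply it.
script=$(shuf -i 1-6 -n 2 | sort -n | sed 's/$/d/' | paste -s -d ';')
echo "sed script: $script"
seq 6 | sed "$script"   # the 4 surviving lines remain
```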
But the random part is most of the fun!! OK, I sort the random choices for clarity (and so as not to attempt to rearrange the order of lines or columns). In fact, how you come up with the lists of numbers (random or user-declared) has no real impact on doing the edits.
I can generate these variables through simple pipelines, for use by the obvious commands. The user can do the same by hand. One catch is that the Cols is the same for every row, but the Rows list can be as long as half of the file (if you are prepared to deal with both "retain" and "delete" lists).
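A sketch of such pipelines (the names Rows and Cols, the delimiters, and the keep-counts are just illustrative):

```shell
# Keep 4 of 6 rows and 6 of 10 columns; join the sorted picks with a delimiter.
Rows=$(shuf -i 1-6 -n 4 | sort -n | paste -s -d ';')
Cols=$(shuf -i 1-10 -n 6 | sort -n | paste -s -d ',')
declare -p Rows Cols   # debug view of both variables
```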
Sorry -- the declare is merely debug from the script I was withholding until we saw some effort from the OP. I prefer declare over printf because it shows names, quotes, arrays etc, which user debug tends to skimp on.
As the script is now out of the bag, this was mine (the positive "keep-these" version). I am still clinging to the "random" interpretation, simply because shuf is quite cute for generating bulk test data. The OP did mention "to reduce the sample space" which seems like the kind of thing you might prefer to randomise.
I don't use sed often -- I forgot it will take a comma-separated list of row numbers to avoid those repeated p commands.
The negative "remove-these" version is very similar -- 3 changes. This version would be more compact, if you are removing only a few lines and columns:
.. Change the 'p' print to 'd' delete in the Row-generating sed command.
My bad on the list of line numbers -- I missed your embedded global substitution. For some reason, I expected sed to behave for lines like cut does for fields: 2,4,7 for a list of single lines, 3-6 for a range of lines.
Kudos for the sed '$!s/$/,/'. I suspect I am too old to learn all of sed's syntax.
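For anyone following along: $! addresses every line except the last, so that expression appends a comma to all but the final line; piped through tr, the lines collapse into one comma-separated list (a small demo, not from the thread):

```shell
# Append "," to every line except the last, then remove the newlines.
printf '2\n5\n6\n' | sed '$!s/$/,/' | tr -d '\n'
# the three lines come out as: 2,5,6
```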
You don't need to shuffle the lines and then sort them. Also, there is Bash brace expansion with sequence expressions, and if you give it leading zeros it will pad the output with leading zeros. So making the block of data could be done like:
echo {01..60} | xargs -n 10
I'm not sure what you want to achieve. I assumed you wanted to make a random choice of lines and columns, but your question can also be interpreted as wanting to choose those yourself. I also assumed the numbered table was just an example -- that the real data (the sample space) would be something like personal names, proteins, zipcodes, cities.
@MadeInGermany posted the complete thing -- the one that starts "Implementing the idea from the previous post". It starts by having you specify which columns and rows you want to keep. Then it produces the table with a printf and the expansion of the numbers, cuts out some columns, then cuts out some lines. The three commands are connected in a chain (called a pipeline) and output the cut-down table.
@Paul_Pedant , given that the essence of this forum is collaboration, I had asked the requester to share details/attempts and said that solution(s) would not be forthcoming (at least from me) until they showed at least some attempt ... alas, that request was ignored.
The choice of rows and columns to be deleted will happen completely randomly.
Only the amount of rows and columns that will be deleted is defined by me, but never which ones. They will be deleted completely at random, without my interference.
Only the quantity is pre-defined, but never which ones: between 1 and 6 for rows, and between 1 and 10 for columns.
For example, I want 2 rows and 4 columns to be deleted: any 2 of the 6 rows and any 4 of the 10 columns.
A generalized version; the constants are configured at the beginning.
# Number of columns and rows
nCOLS=10 nROWS=6
# Number of columns and rows to delete
nxCOLS=3 nxROWS=2
xrows="$( shuf -i 1-"$nROWS" -n "$nxROWS" | sort -n | sed 'H; 1h; $!d; x; s/\n/,/g' )"
echo "Delete rows: $xrows"
xcols="$( shuf -i 1-"$nCOLS" -n "$nxCOLS" | sort -n | sed 'H; 1h; $!d; x; s/\n/,/g' )"
echo "Delete columns: $xcols"
# Using the bash/ksh/zsh builtin // modifier
sedcode="${xrows//,/d;}d"
# Generate the input table
seq --format="%02.f" 1 $(( nCOLS * nROWS )) | xargs -n "$nCOLS" |
# Delete rows and columns from the input
sed "$sedcode" | cut --complement -d " " -f "$xcols"
Explanation of the sed code:
H append the pattern space to the hold space, with a newline separator
1h on input line 1, overwrite the hold space instead
$!d unless it's the last line, delete and jump to the next input cycle (nothing is printed)
Only run on the last input line:
x exchange: fetch the hold space
s/\n/,/g substitute the newlines with commas
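The hold-space dance is easier to see on a tiny input (a quick demo):

```shell
# Collect all lines in the hold space, then join them with commas at the end.
printf '3\n5\n9\n' | sed 'H; 1h; $!d; x; s/\n/,/g'
# prints: 3,5,9
```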
declare -p Rows just prints the variable Rows, but it does that in a format that you would use if you were creating the variable (quotes, full syntax for arrays etc).
$ Rows="3;4;5;6;9"
$ declare -p Rows
declare -- Rows="3;4;5;6;9" # <= This is a diagnostic output from the -p.
$ declare -- Rows="Whatever" # <= This is a command that is overwriting Rows
$ declare -p Rows
declare -- Rows="Whatever"
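The same diagnostic is even handier for arrays, where a plain echo would flatten the structure (a quick illustration):

```shell
# declare -p reproduces the array element by element, index and all.
declare -a Rows=(3 4 5)
declare -p Rows
```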
You either want the top 6 lines of code (which create the random values you asked for), or the two specific initialisations just before the table is output (which will always delete the same rows and columns). Not both.
As previously mentioned, using shuf and sort in the same pipeline, along with three sed expressions, is not needed. My best call is echo {01..60} | xargs -n 10.