Editing a text file using shell

Rayj00 · August 13, 2023, 4:07pm

I have a text file that is filled with many lines of data that looks like this:

08/10/23 25,26,28,30,31 07/12/23 2,14,23,27,31

I need to edit the file or create a new file to look like this:

25,26,28,30,31
2,14,23,27,31

Ideas?

munkeHoller · August 13, 2023, 5:34pm

@Rayj00, welcome. The forum is a collaboration: you show what you're attempting, along with any drawbacks or problems, and the team can come back with suggestions, fixes, or solutions (which may be partial or complete).

Please provide your code attempts; this will avoid or minimize ambiguity on our side and make your request clearer.

Are you confident that a single line of your input file comprehensively demonstrates the entire file structure?

Is there a specific reason why the shell must be used? If so, please elaborate and specify which shell.

Please take the time to use MARKDOWN tags when posting code/data.

Thanks

MadeInGermany · August 13, 2023, 6:21pm

Perhaps you can formulate a rule?
Like "delete all dd/mm/yy dates".

spammed · August 13, 2023, 9:42pm

Your lines seem to follow a pattern: a month/day/year date (two-digit year), whitespace, five comma-separated integers, whitespace, and then the same date-plus-integers group once more (up to the end of the line?). A sed substitution command can match this pattern and produce the desired output.
I recommend redirecting the output to a new file rather than substituting in place, because a mistake (or unclean data) could destroy your input file.
If the data file is clean and the fields you want are always in columns 2 and 4, awk would be a one-liner.
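
As a rough sketch of that advice (check that the data is clean, then write to a new file), something like the following could be tried. The file names and the deliberately malformed second line are made up for illustration; only the first sample line comes from the question.

```shell
#!/bin/sh
# Hypothetical sample file: line 1 is the example from the question,
# line 2 is intentionally missing its first date.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '08/10/23 25,26,28,30,31 07/12/23 2,14,23,27,31' \
  '25,26,28,30,31 07/12/23 2,14,23,27,31' > "$tmpdir/inputfile"

# Report every line that does not have exactly four whitespace-separated fields.
bad=$(awk 'NF != 4 { print NR ": " NF " fields" }' "$tmpdir/inputfile")
echo "$bad"

# Write the result to a new file; the input file is never modified.
awk 'NF == 4 { print $2; print $4 }' "$tmpdir/inputfile" > "$tmpdir/outputfile"
cat "$tmpdir/outputfile"
```

Lines reported by the first awk command would need a closer look before trusting any column-based extraction.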

MadeInGermany · August 16, 2023, 6:27am

The following sed deletes the dd/dd/dd strings and their trailing space:

sed 's#\<[[:digit:]]\{2\}/[[:digit:]]\{2\}/[[:digit:]]\{2\}\>  *##g' inputfile

And the remaining separating space can be substituted by a newline:

#!/bin/sh
sed 's#\<[[:digit:]]\{2\}/[[:digit:]]\{2\}/[[:digit:]]\{2\}\>  *##g; s#  *#\
#g' inputfile

(The replacement must contain an actual line break after the backslash; that is the portable way to make sed output a newline.)
The same with pure bash builtins:

#!/bin/bash
ddpat="[[:digit:]][[:digit:]]"; dtpat="$ddpat/$ddpat/$ddpat"
while read -ra words
do
  for w in "${words[@]}"
  do
    if [[ $w != $dtpat ]]
    then
      echo "$w"
    fi
  done
done < inputfile
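
The `\<` and `\>` word-boundary markers in the sed commands are not part of POSIX basic regular expressions (GNU sed accepts them). Where that matters, the same idea as the bash loop (split into words, drop the date words) might be sketched with tr and grep. The temporary file here is only a stand-in for inputfile.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' '08/10/23 25,26,28,30,31 07/12/23 2,14,23,27,31' > "$tmpdir/inputfile"

# Put every space-separated word on its own line (-s also squeezes the
# resulting runs of newlines), then drop the words that are nn/nn/nn dates.
result=$(tr -s ' ' '\n' < "$tmpdir/inputfile" |
  grep -v '^[0-9][0-9]/[0-9][0-9]/[0-9][0-9]$')
echo "$result"
```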

GLHarrison · August 17, 2023, 6:39am

cut -c10-23,33-46 file-in | tr ' ' '\n' > file-out
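
The character positions above fit the sample line, but they shift as soon as a number group has a different length. Assuming the fields are always separated by exactly one space, a delimiter-based variant could look like this (the second input line is invented to show varying widths):

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' \
  '08/10/23 25,26,28,30,31 07/12/23 2,14,23,27,31' \
  '08/11/23 1,2,3,4,5 07/13/23 10,20,30,40,50' > "$tmpdir/file-in"

# Select fields 2 and 4 by delimiter instead of by character position,
# then split the two fields onto separate lines.
cut -d ' ' -f 2,4 "$tmpdir/file-in" | tr ' ' '\n' > "$tmpdir/file-out"
cat "$tmpdir/file-out"
```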

Paul_Pedant · August 17, 2023, 9:14am

There does not seem to be a need to use patterns to identify the fields. Just use awk to print the even-numbered fields.

awk '{ print $2; print $4; }'

Or for the general case where the fields in a row come in pairs, but the number of pairs varies:

awk '{ for (f = 2; f <= NF; f += 2) print $(f); }'

If you really need to specifically eliminate the dates, and the columns are not exactly paired, use the simplest pattern that does the job.

If there are always at least two number values (so there will always be a comma):

awk '{ for (f = 1; f <= NF; ++f) if (index ($(f), ",")) print $(f); }'

Or if dates will always have a /:

awk '{ for (f = 1; f <= NF; ++f) if (! index ($(f), "/")) print $(f); }'
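
To see where these variants differ, they can be run side by side on a line whose columns are not paired. The second input line below is invented for that purpose; the awk programs are the ones shown above.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' \
  '08/10/23 25,26,28,30,31 07/12/23 2,14,23,27,31' \
  '08/11/23 1,2,3,4,5 6,7,8,9,10 07/13/23' > "$tmpdir/inputfile"

# Even-numbered fields: correct only while dates and numbers strictly alternate.
by_even=$(awk '{ for (f = 2; f <= NF; f += 2) print $(f); }' "$tmpdir/inputfile")
# Keep fields that contain a comma.
by_comma=$(awk '{ for (f = 1; f <= NF; ++f) if (index ($(f), ",")) print $(f); }' "$tmpdir/inputfile")
# Keep fields that do not contain a slash.
by_slash=$(awk '{ for (f = 1; f <= NF; ++f) if (! index ($(f), "/")) print $(f); }' "$tmpdir/inputfile")

printf 'even fields:\n%s\n\ncomma test:\n%s\n\nslash test:\n%s\n' \
  "$by_even" "$by_comma" "$by_slash"
```

On the unpaired line, the even-field version prints a date in place of a number group, while the comma and slash tests still return only the number groups.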

HamzaHaider · August 17, 2023, 11:26am

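You can try using a regex to replace the dates with a newline in your data. The following command (GNU sed) can be used:
sed -E -i 's# *[0-9]+(/[0-9]+)+ *#\n#g; s#^\n##' <filename>
Here -E enables the extended regex syntax the pattern relies on, # serves as the delimiter because the pattern itself contains /, and the second substitution drops the empty line left in front of the first group.
Make sure to replace <filename> in the above command, and note that -i overwrites the file in place.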
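
Since in-place editing overwrites the input, a more cautious sketch keeps a backup copy. The attached-suffix form -i.bak is, as far as I know, accepted by both GNU and BSD sed, but -i is not in POSIX, so treat that as an assumption. The file name data.txt is a placeholder.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' '08/10/23 25,26,28,30,31 07/12/23 2,14,23,27,31' > "$tmpdir/data.txt"

# -i.bak edits data.txt in place and keeps the original as data.txt.bak.
# First expression: delete each date and the spaces after it.
# Second expression: turn the remaining separating spaces into newlines.
sed -i.bak -e 's#[0-9][0-9]*/[0-9][0-9]*/[0-9][0-9]*  *##g' -e 's#  *#\
#g' "$tmpdir/data.txt"

cat "$tmpdir/data.txt"       # edited file
cat "$tmpdir/data.txt.bak"   # untouched original
```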

Jeo · August 17, 2023, 3:49pm

OK, this might be TOO simple, but you could try something like this:

grep -Eo '[0-9]+(,[0-9]+)+' file.txt