I have a text file that is filled with many lines of data that looks like this:
08/10/23 25,26,28,30,31 07/12/23 2,14,23,27,31
I need to edit the file or create a new file to look like this:
25,26,28,30,31
2,14,23,27,31
Ideas?
@Rayj00, welcome. The forum is a collaboration: you show what you're attempting, along with any drawbacks or problems, and the team can come back with suggestions/fixes/solutions (which may include partial or complete solutions).
Please provide your code attempts; this will avoid/minimize ambiguity on our side and provide clarity as to your request.
Are you confident that a single line of your input file comprehensively demonstrates the entire file structure?
Are there specific reason(s) why the shell must be used? If so, please elaborate and specify which shell.
Please take the time to use MARKDOWN tags when posting code/data.
Thanks
Perhaps you can formulate a rule?
Like "delete all dd/mm/yy dates".
You seem to have a pattern of a two-digit month/day/year date, whitespace, five comma-separated integers (int1,int2,int3,int4,int5), whitespace, and then the same pattern repeated once more (up to the end of the line?). This pattern can be matched, and the desired output created, with a sed substitution command.
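A minimal sketch of such a substitution, assuming every line holds exactly two date/list pairs with five comma-separated numbers per list, and a sed that accepts -E for extended regexes (GNU and BSD/macOS sed both do). The function name extract_lists is hypothetical. Lines that do not fit the pattern are passed through unchanged, which makes unexpected data easy to spot in the output.

```shell
#!/bin/sh
# extract_lists (hypothetical name): reads the files given as arguments, or stdin.
# Assumes each line is: mm/dd/yy list1 mm/dd/yy list2, where each list holds
# exactly five comma-separated integers. \1 and \3 capture the two lists; the
# backslash followed by a literal newline in the replacement splits them.
# Note: the "\3#'" line must start in column 0, or the indent would be output.
extract_lists() {
    sed -E 's#^[0-9]{2}/[0-9]{2}/[0-9]{2}[[:space:]]+([0-9]+(,[0-9]+){4})[[:space:]]+[0-9]{2}/[0-9]{2}/[0-9]{2}[[:space:]]+([0-9]+(,[0-9]+){4})[[:space:]]*$#\1\
\3#' "$@"
}
```

Usage, writing to a new file rather than editing in place: extract_lists inputfile > outputfile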
I recommend redirecting output to a new file rather than in-place substitution because a mistake (or unclean data) could fatally destroy your input file.
If the data file is clean and the fields you want are always in columns 2 and 4, awk would be a one-liner.
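Combining both suggestions, a rough sketch that first reports any lines that do not look like the sample and only runs the awk one-liner when every line passes. The assumed layout (four whitespace-separated fields, dd/dd/dd dates in fields 1 and 3, comma-separated integers in fields 2 and 4) and the helper name convert_if_clean are assumptions for illustration. Output always goes to a separate file, so the input is never modified.

```shell
#!/bin/sh
# convert_if_clean INPUT OUTPUT (hypothetical helper)
# Assumed clean line: date list date list, where a date is dd/dd/dd
# and a list is one or more comma-separated integers.
convert_if_clean() {
    src=$1 dst=$2
    # Print offending lines (with line numbers) to stderr; awk exits 1 if any.
    if ! awk '
        NF != 4 || $1 !~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ ||
                   $3 !~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ ||
                   $2 !~ /^[0-9]+(,[0-9]+)*$/ || $4 !~ /^[0-9]+(,[0-9]+)*$/ {
            printf "line %d: %s\n", NR, $0 > "/dev/stderr"; bad = 1
        }
        END { exit bad }' "$src"
    then
        echo "unexpected data in $src; nothing written" >&2
        return 1
    fi
    # Data looks clean: write fields 2 and 4, one per line, to the new file.
    awk '{ print $2; print $4 }' "$src" > "$dst"
}
```

Usage: convert_if_clean inputfile outputfile. The regexes avoid {n} intervals because some older awk implementations do not support them.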
The following sed command deletes the dd/dd/dd strings and their trailing spaces (\< and \> are word-boundary anchors supported by GNU sed):
sed 's#\<[[:digit:]]\{2\}/[[:digit:]]\{2\}/[[:digit:]]\{2\}\> *##g' inputfile
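The remaining separating space can then be replaced with a newline:
#!/bin/sh
sed 's#\<[[:digit:]]\{2\}/[[:digit:]]\{2\}/[[:digit:]]\{2\}\> *##g; s# \{1,\}#\
#g' inputfile
(The backslash must be followed by a real newline in the script, because POSIX sed does not treat \n in the replacement text as a newline. The second substitution uses " \{1,\}" rather than " *", since " *" also matches the empty string and would insert a newline between every character.)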
The same with pure bash builtins:
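#!/bin/bash
# Glob pattern for a dd/dd/dd date
ddpat="[[:digit:]][[:digit:]]"; dtpat="$ddpat/$ddpat/$ddpat"
# Split each line into words and print every word that is not a date
while read -ra words
do
    for w in "${words[@]}"
    do
        # The unquoted right-hand side of != is matched as a glob pattern
        if [[ $w != $dtpat ]]
        then
            echo "$w"
        fi
    done
done < inputfile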
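If every line has exactly the same fixed-width layout as the sample, cut can select the two lists by character position and tr can turn the separating space into a newline:
cut -c10-23,33-46 file-in | tr ' ' '\n' > file-out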
There does not seem to be a need to use patterns to identify the fields. Just use awk to print the even-numbered fields.
awk '{ print $2; print $4; }'
Or for the general case where the fields in a row come in pairs, but the number of pairs varies:
awk '{ for (f = 2; f <= NF; f += 2) print $(f); }'
If you really need to specifically eliminate the dates, and the columns are not exactly paired, use the simplest pattern that does the job.
If there are always at least two numbers in each list (so there will be a comma):
awk '{ for (f = 1; f <= NF; ++f) if (index ($(f), ",")) print $(f); }'
Or if dates will always have a /:
awk '{ for (f = 1; f <= NF; ++f) if (! index ($(f), "/")) print $(f); }'
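You can try using a regex to replace the dates with newlines. With GNU sed, the following command can be used:
sed -E -i 's# *[0-9]+(/[0-9]+)+ *#\n#g; s#^\n##' <filename>
Make sure to replace <filename> with the name of your file. The -E option is needed for the + and (...) operators, # is used as the delimiter so the / in the dates does not end the expression, and the second substitution removes the newline left at the start of each line.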
OK, this might be TOO simple, but you could try something like this:
grep -Eo '[0-9]+(,[0-9]+)+' file.txt