data:
8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i
PACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+
8nwR15UzfeZafaf2bGr8akx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjp
lVpOoMLXJ BUNLOES="343433434343mgg3383028383983mgg383827173494" #BGLOFaakx6VwxBX+NafafxJMWX
8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i
PACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPAC
Ib5fafafEU24f3EOOjp
I have a huge data file that contains text similar to the above. The text is one very, very long line that wraps around several times.
What I want to do is grab only a specific piece of text.
In the above, I want a command that will grab the following text:
BUNLOES="343433434343mgg3383028383983mgg383827173494"
and will turn each "mgg" into a newline, so that the final output looks like this:
343433434343
3383028383983
383827173494
I'm looking for something efficient and portable, so naturally that would be sed. I found the following:
sed -n -e 's/^.*\(BUNLOES=.*#BGLOFa\)/\1/p' data.txt
This sed command tries to grab everything from BUNLOES up to #BGLOFa.
The command almost works, but not exactly. I'm trying to do everything with one command instead of piping to awk.
RudiC
March 19, 2017, 6:18am
2
How about
sed -n '/^.*BUNLOES="/{s///;s/".*$//; s/mgg/\n/g;p}' file
343433434343
3383028383983
383827173494
1 Like
I get this error:
sed -n '/^.*BUNLOES="/{s///;s/".*$//; s/mgg/\n/g;p}' data.txt
sed: 1: "/^.*BUNLOES="/{s///; ...": extra characters at the end of p command
RudiC
March 19, 2017, 6:38am
4
Try a semicolon after the p command.
1 Like
Hello SkySmart,
Could you please try the following and let me know if it helps you.
1st code:
awk '{match($0,/BUNLOES.*[^"]/);VAL=substr($0,RSTART,RLENGTH);if(VAL){gsub(/BUNLOES="|" #.*/,"",VAL);gsub(/mgg/,RS,VAL);print VAL}}' Input_file
OR
awk '{
  match($0,/BUNLOES.*[^"]/);
  VAL=substr($0,RSTART,RLENGTH);
  if(VAL){
    gsub(/BUNLOES="|" #.*/,"",VAL);
    gsub(/mgg/,RS,VAL);
    print VAL
  }
}
' Input_file
2nd code:
awk '{match($0,/BUNLOES.*[^"]/);VAL=substr($0,RSTART,RLENGTH);if(VAL){gsub(/BUNLOES="|" #.*/,"",VAL);num=split(VAL, a,"mgg");for(i=1;i<=num;i++){print a[i]}}}' Input_file
OR
awk '{
  match($0,/BUNLOES.*[^"]/);
  VAL=substr($0,RSTART,RLENGTH);
  if(VAL){
    gsub(/BUNLOES="|" #.*/,"",VAL);
    num=split(VAL, a,"mgg");
    for(i=1;i<=num;i++){
      print a[i]
    }
  }
}
' Input_file
Thanks,
R. Singh
1 Like
Is bash portable, under your definition?
regex='BUNLOES=\"([0-9]+)mgg([0-9]+)mgg([0-9]+)\"'
while IFS= read -r line
do
    if [[ $line =~ $regex ]]; then
        echo "${BASH_REMATCH[1]}"
        echo "${BASH_REMATCH[2]}"
        echo "${BASH_REMATCH[3]}"
    fi
done < filename
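If bash itself is not guaranteed, the same idea can be sketched in plain POSIX sh using only parameter expansion. This is a minimal sketch, not something posted in this thread: the function name print_bunloes is made up for illustration, and it assumes the value sits between double quotes on a single line, as in the first sample. Unlike the regex above, it copes with any number of mgg separators.

```shell
# print_bunloes FILE  (hypothetical helper; POSIX sh, no external commands)
print_bunloes() {
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *BUNLOES=\"*)
        v=${line#*BUNLOES=\"}   # drop everything through BUNLOES="
        v=${v%%\"*}             # drop the closing quote and the rest
        while :; do
          case $v in
            *mgg*) printf '%s\n' "${v%%mgg*}"; v=${v#*mgg} ;;
            *)     printf '%s\n' "$v"; break ;;
          esac
        done
        ;;
    esac
  done < "$1"
}
```

Call it as print_bunloes data.txt. Reading a very long line in a shell loop is likely to be slow, so treat this as a fallback rather than the efficient option.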
2 Likes
awk -F\" 'gsub(/mgg/,RS,$2){print $2}' file
or
awk -F\" '/BUNLOES/ && gsub(/mgg/,RS,$2){print $2}' file
---
Note: On Solaris use /usr/xpg4/bin/awk rather than awk
1 Like
Hello SkySmart,
Taking some inspiration from S, this could be helpful too:
awk '{gsub(/.* BUNLOES="|" #.*/,"");gsub(/mgg/,ORS);print}' RS="" Input_file
If you also want to check that both strings, BUNLOES and mgg, are present as shown in Input_file, the following additional check does that:
awk '{n=gsub(/.* BUNLOES="|" #.*/,"");n+=gsub(/mgg/,ORS);if(n==4){print}}' RS="" Input_file
Thanks,
R. Singh
1 Like
This seems to work, but it doesn't put the numbers on new lines. I get them all on one line:
time sed -n '/^.*BUNLOES="/{s///;s/".*$//; s/mgg/\n/g;p;}' data.txt
'269118084457'n'3626086549312632'
I've tried replacing the \n with \\n, but that still didn't work.
RudiC
March 19, 2017, 1:31pm
10
We can't see your terminal, nor do we know the versions of your OS, shell, or sed. So be creative. Look into your sed's man or info page on how to escape control characters (mine accepts \n for a <new line> char). Try bash's (if available) $'string' mechanism. Try entering CTRL-V CTRL-J at the command line, but don't forget to escape this with a backslash.
If nothing helps, use one of the other proposals.
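Another option along the same lines, sketched here as an assumption rather than something tested on your systems: keep a literal newline in a shell variable and splice it into the sed script. The names nl and split_bunloes are made up for illustration; the sed commands are the ones from post #2, minus the $ anchor so the shell has nothing else to expand inside the double quotes.

```shell
# A literal newline kept in a variable (plain POSIX sh, no bash-only quoting needed).
nl='
'

# split_bunloes FILE  (hypothetical helper name)
split_bunloes() {
  # Inside double quotes, \\${nl} becomes a backslash followed by a real
  # newline, which sed treats as a newline in the replacement text.
  sed -n "/BUNLOES=\"/{s/.*BUNLOES=\"//;s/\".*//;s/mgg/\\${nl}/g;p;}" "$1"
}
```

Usage would be split_bunloes data.txt. It avoids relying on \n in the replacement and does not need bash.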
1 Like
drl
March 19, 2017, 2:23pm
11
Hi, SkySmart .
When you ask for portability, I think it's important for you to tell us the class of systems on which you need to run this.
For example, some sed implementations don't seem to run the suggested code. So what would be the earliest accepted version of sed that you have available, etc.?
Best wishes ... cheers, drl
1 Like
Note: the use of \n in the replacement part of sed's substitute command is a GNU extension.
The absence of a semicolon between p and } is a GNU extension. The use of a semicolon (p;}) is a permitted, but not required, extension to the POSIX specification, so even that may not be supported by all implementations.
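One way to sidestep both points is to put every sed command on its own line and use a backslash followed by a real newline in the replacement. The following is only a sketch of that layout, with extract_posix as a made-up function name; the commands themselves are the same ones already posted in this thread.

```shell
# extract_posix FILE  (hypothetical helper name)
# Commands are separated by newlines, so no semicolon is needed before "}",
# and the replacement uses backslash-newline instead of \n.
extract_posix() {
  sed -n '/BUNLOES="/{
s/.*BUNLOES="//
s/".*//
s/mgg/\
/g
p
}' "$1"
}
```

Run it as extract_posix data.txt. It is longer than the one-liner, but it does not depend on either extension.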
2 Likes
RudiC
March 19, 2017, 2:58pm
13
This (escaped explicit <NL> char) works on FreeBSD (which insists on the ; following p):
sed -n '/^.*BUNLOES="/{s///; s/".*$//; s/mgg/\
/g; p; }' file
343433434343
3383028383983
383827173494
1 Like
RudiC's solution above works on my Solaris as well, which I think has non-GNU sed.
$
$ uname -a
SunOS solaris11-3 5.11 11.3 i86pc i386 i86pc
$
$ cat -n data.txt
1 8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i
2 PACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+
3 8nwR15UzfeZafaf2bGr8akx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjp
4 lVpOoMLXJ BUNLOES="343433434343mgg3383028383983mgg383827173494" #BGLOFaakx6VwxBX+NafafxJMWX
5 8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i
6 PACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPAC
7 Ib5fafafEU24f3EOOjp
$
$ /usr/bin/sed -n '/^.*BUNLOES="/{s///;s/".*$//; s/mgg/\
/g;p;}' data.txt
343433434343
3383028383983
383827173494
$
$ /usr/xpg4/bin/sed -n '/^.*BUNLOES="/{s///;s/".*$//; s/mgg/\
/g;p;}' data.txt
343433434343
3383028383983
383827173494
$
$
I agree with drl that the OP should take stock of the systems and tools (s)he will be working with, in order to determine the portability of the solution.
1 Like
This is the command I ended up using, and it works across all necessary platforms:
export LC_ALL=C ; time tr '`' '\n' < data.txt | sed -n '/BUNLOES=/p' | awk -F"BUNLOES=" '{print $2}' | awk '{print $1}' | sed -e 's_"__g' -e 's_ __g' -e '/^$/d' 2>/dev/null | awk '{gsub("mgg","\n");printf"%s",$0}' | sed -e "s/'/ /g" -e 's~ ~~g' | sed '/^$/d'
Can someone please help me combine this into one command, if possible?
The content of data.txt is one very, very long line:
8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i PACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+ 8nwR15UzfeZafaf2bGr8akx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjp lVpOoMLXJ BUNLOES="'269118084457'mgg'3626086549312632'"mgg1659344516312337mgg1659344516304657mgg5851430858050896mgg2968137013313563 #BGLOFaakx6VwxBX+NafafxJMWX 8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i PACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPAC Ib5fafafEU24f3EOOjp8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i PACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i PACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5iPACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxJMWX8iW5i PACIb5fafafEU24f3EOOjpakx6VwxBX+NafafxfafakJMWX8iW5i
When I run the command from this post, I get this:
269118084457
3626086549312632
1659344516312337
1659344516304657
5851430858050896
2968137013313563
Now I just want to combine all the commands into one.
If you have Perl on all your platforms, then here's one way:
$
$ perl -lne '/BUNLOES=(.*?)\s+/ and do{$x=$1; $x=~s/[\x{27}" ]//g; $x=~s/mgg/\n/g; print $x}' data.txt
269118084457
3626086549312632
1659344516312337
1659344516304657
5851430858050896
2968137013313563
$
$
1 Like
Aia
March 19, 2017, 10:33pm
17
perl -nle 'BEGIN{$,="\n"}/BUNLOES=(.+?)\s/ and print $1=~/(\d+)/g' skysmart.file
269118084457
3626086549312632
1659344516312337
1659344516304657
5851430858050896
2968137013313563
or
perl -nle '/BUNLOES=(.+?)\s/ and print join "\n", $1=~/(\d+)/g' skysmart.file
or
perl -nle 'map{s/\D+/\n/g and print} /BUNLOES\D+(.+?)\s/' skysmart.file
1 Like
Perl won't be present on some of the systems I have to run the command on. A lot of them are Docker systems with the bare minimum.
Try :
awk 'p==s{gsub(qr,x); gsub(fs,ORS,$1); print $1}{p=$NF}' s=BUNLOES fs=mgg qr="'|\"" RS== file
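The one-liner above is compact, so here is a spelled-out sketch of the same technique as I read it: use "=" as the record separator, remember the last word of each record, and act on the record that follows the word BUNLOES. The names extract_bunloes, prev and q are chosen here for illustration and are not from the original post.

```shell
# extract_bunloes FILE  (hypothetical helper name)
extract_bunloes() {
  awk -v q="'" '
    BEGIN { RS = "=" }          # split the input on "=" instead of newlines
    prev == "BUNLOES" {         # this record follows the word BUNLOES
      gsub("[\"" q "]", "")     # drop double and single quotes from the record
      gsub(/mgg/, "\n", $1)     # first field holds the mgg-separated list
      print $1
    }
    { prev = $NF }              # last word seen before the next "="
  ' "$1"
}
```

Run it as extract_bunloes data.txt. Splitting on "=" also means awk never has to hold the whole long line as a single record, which may matter on awks with small record limits.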
--
@SkySmart: you just changed the specification in post #15 by introducing random single quotes and double quotes into the data, and by mentioning that your data is just one very long line.
That means that all the people who responded earlier, trying to help you, were not using the right data, were therefore unable to produce adequate code, and were basically working for no purpose.
Please do not do that. Have the complete data specification ready when you create the thread.
3 Likes
Aia
March 20, 2017, 12:36am
20
Out of curiosity. Would any of these work?
grep -Eo '[0-9]{4,}' skysmart.file
grep -Po '\d{4,}' skysmart.file
Output:
269118084457
3626086549312632
1659344516312337
1659344516304657
5851430858050896
2968137013313563
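A caveat on both commands: -P is specific to GNU grep, and -o was not part of the POSIX.1-2008 grep specification either, so on minimal systems neither may be available. A rough awk-only sketch of the same idea (print every run of four or more digits) could look like this; the function name long_digit_runs is made up here.

```shell
# long_digit_runs FILE  (hypothetical helper name)
# Split each line on runs of non-digits and print the pieces that are
# at least four digits long, one per line.
long_digit_runs() {
  awk '{
    n = split($0, part, /[^0-9]+/)
    for (i = 1; i <= n; i++)
      if (length(part[i]) >= 4) print part[i]
  }' "$1"
}
```

Like the grep commands, this ignores BUNLOES entirely, so it would also pick up any other long digit run that happens to appear in the file.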
1 Like