Trimming the data in a file.

Hi, I have a big CSV file with the data below.

file:

La Cage Aux Folles (Widescreen)
Famous Mystics and Psychics
A Passion for Planning Financials  Operations  Marketing  Management  and Ethics
Precious Moments Holy Bible New King James Version Precious Angels Edition  Blue
Practical Recording 2 Pro Tools
Generals at Rest The Grave Sites of the 425 Official Confederate Generals
Getting the Most Out of Teaching With Newspapers Learning-Rich Lessons
Strategies  and Activities That Use the Power of Newspapers to Teach Current
Events and Build Skills in Reading  Writing  Math
Box Of The Blues (4CD)
The McDonaldization Thesis Explorations and Extensions
Film and Literature An Introduction and Reader
The Ecology of Eden
Jaroslav Rossler
The Complete Idiots Guide to Italian Level 1
Fresh Eggs
Energizer Lithium-Ion Digital Camera Battery ER-D210
Leaders That Last How Convenant Friendships Can Help Pastors Thrive
Krishnas Cosmos The Creativity of an Artist  Sculptor  Teacher
Numerical Simulation in Tunneling
Creating Sanctuary A New Approach to Gardening in the Washington Metropolitan
Area
Thelonious Monk Orchestra At Town Hall (Remaster)  The

I need to trim the data file so that each line contains three words or fewer, i.e. 1, 2 or 3 words. The three words can be the first three words of the line or the last three words of the line. The choice should be random: some lines should consist of the first three words and some of the last three words.
An example is shown below.

trimmed file having three words from every line (randomly first three words or last three words):

La Cage Aux 
Mystics and Psychics
Management  and Ethics
Precious Moments Holy
Recording 2 Pro Tools
Official Confederate Generals
Learning-Rich Lessons
to Teach Current
Reading  Writing  Math
Box Of The Blues
The McDonaldization Thesis
Film and Literature
Ecology of Eden
Jaroslav Rossler
The Complete Idiots
Fresh Eggs
Energizer Lithium-Ion
Help Pastors Thrive
Artist  Sculptor  Teacher
Numerical Simulation
the Washington Metropolitan
Area
Thelonious Monk Orchestra

Can someone help me with this?

Any attempts / ideas / thoughts from your side?

When you reply to RudiC and describe your approach, you should also include a reason for the request.
Your request does not seem to have any logical reason, and thus gives the impression that it is some kind of homework/classwork. There is a special process for assisting with school-related requests.


Hello joeyg,
This is not homework/classwork. It is for testing our application and searching for keywords, so I need the data in that format.

Hello RudiC,
I had thought of a script that reads every line and counts the words: if the count is greater than 9, I print the last three words with "awk -F, '{print $7,$8,$9}'"; if it is less than 7, I print with "awk -F, '{print $5,$6,$7}'"; and if it is less than or equal to three, I take the first three, like
"awk -F, '{print $5,$6,$7}' OFS=' '".
So I am working on this long script,
but I thought of getting your help to achieve it in a shorter and simpler way.

My script:

# Read the file line by line; IFS= and -r keep each line intact.
while IFS= read -r line; do

wordcount=$(echo "$line" | wc -w)
echo "word count=$wordcount"
if [ "$wordcount" -le 5 ]
then
   echo "word count is 5 or less so taking first 3 words"
   echo "$line" | awk '{print $1,$2,$3}' >> new.csv
elif [ "$wordcount" -le 8 ]
then
   echo "word count is greater than 5 and at most 8 so taking words 4,5,6"
   echo "$line" | awk '{print $4,$5,$6}' >> new.csv
else
   echo "word count is greater than 8 so taking words 6,7,8"
   echo "$line" | awk '{print $6,$7,$8}' >> new.csv
fi
done < "sdg-catalog-00000001"

The conditions you apply in post #4 deviate severely from your statement in post #1 that the selection should be random. Which one is valid now?

For the random request, try

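awk '
# lines with fewer than four words, or a coin flip: print the first three words
NF < 4  ||
int(10 * rand())%2      {print $1, $2, $3
                         next
                        }
# all remaining lines: print the last three words
                        {print $(NF-2), $(NF-1), $NF
                        }
' file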

and report back.
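
If the choice should differ between runs and short lines should come out without padded blanks, a variation along these lines might work. This is only a sketch under assumptions: the function name trim_first_or_last is made up for illustration, and srand() without an argument seeds from the time of day, so two runs within the same second would make the same choices.

```shell
# Hypothetical helper: keep the first OR the last three words of every line.
# Usage: trim_first_or_last FILE
trim_first_or_last() {
    awk '
    BEGIN { srand() }                     # seed from the time of day
    # Lines with three words or fewer are kept whole; assigning $1 to
    # itself rebuilds the line with single blanks between the words.
    NF <= 3 { $1 = $1; print; next }
    # Roughly half of the longer lines keep their first three words ...
    rand() < 0.5 { print $1, $2, $3; next }
    # ... and the rest keep their last three words.
    { print $(NF-2), $(NF-1), $NF }
    ' "$1"
}
```

Calling it as trim_first_or_last file > new.csv would write the trimmed lines to new.csv.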

It can even be the middle three words too, but the three words should be consecutive.
Can you help me with that?
The script you provided above works only for the first three words or the last three words of the line.

Moving targets don't really help.
Targets defined imprecisely don't really help.

OK, fine, let me go with this.
Thanks for your help, RudiC.

Try

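awk '
        # X is the field number of the last word to print, drawn at random
        # from 3..NF; the three consecutive words ending at X are printed
        {X = 3 + int((NF-2) * rand())
         print $(X-2), $(X-1), $X
        }
' file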
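
The same idea can be written with the start of the three-word window as the random value, which may be easier to follow. The sketch below is an assumption-laden variant, not a replacement: trim_random_window is a made-up name, lines with three words or fewer are passed through whole, and srand() is added so that the picks change between runs (it seeds from the time of day, so runs within the same second repeat).

```shell
# Hypothetical helper: keep three consecutive words from a random position.
# Usage: trim_random_window FILE
trim_random_window() {
    awk '
    BEGIN { srand() }                     # seed from the time of day
    # Nothing to choose on short lines: print them whole, blanks squeezed.
    NF <= 3 { $1 = $1; print; next }
    {
        # rand() is in [0,1), so start falls in 1 .. NF-2 and the
        # window start .. start+2 always stays inside the line.
        start = 1 + int((NF - 2) * rand())
        print $start, $(start + 1), $(start + 2)
    }
    ' "$1"
}
```

For a five-word line such as "a b c d e" the possible results are "a b c", "b c d" and "c d e".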

Let me try it and come back.

--- Post updated at 10:38 AM ---

Superb! This is working as expected. It is nice that a small three-line script does this much.
Thanks, RudiC.