Sunday, September 24, 2017

command line - How to prevent grep from printing the same string multiple times?



If I grep a file containing the following:




These are words
These are words
These are words
These are words


...for the word These, it will print the string These are words four times.



How can I prevent grep from printing recurring strings more than once? Otherwise, how can I manipulate the output of grep to remove duplicate lines?




The Unix philosophy is to have tools that each do one thing and do it well. In this case, grep is the tool that selects text from a file. To bring duplicate lines together, one sorts the text. To remove the duplicates, one uses the -u option to sort. Thus:



grep These filename | sort -u


sort has many options: see man sort. If you want to count duplicates, or need a more complicated scheme for determining what is or is not a duplicate, pipe the sorted output to uniq instead: grep These filename | sort | uniq. See man uniq for options (for example, uniq -c prefixes each line with the number of times it occurred).
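As a rough sketch of both approaches side by side, the snippet below builds a small throwaway file and runs the two pipelines on it. The sample contents and the variable names (tmpfile, unique_lines, counted_lines) are made up for illustration and are not from the question. Note that uniq only collapses adjacent duplicate lines, which is why the output is sorted first.

```shell
# Hypothetical sample data; the temporary file stands in for "filename".
tmpfile=$(mktemp)
printf '%s\n' \
  'These are words' \
  'These are words' \
  'Those are words' \
  'These are other words' \
  'These are words' > "$tmpfile"

# Approach 1: print each distinct matching line once.
unique_lines=$(grep 'These' "$tmpfile" | sort -u)

# Approach 2: print each distinct matching line with its occurrence count.
# sort groups identical lines together so that uniq -c can count them.
counted_lines=$(grep 'These' "$tmpfile" | sort | uniq -c)

printf '%s\n' "$unique_lines"
printf '%s\n' "$counted_lines"

rm -f "$tmpfile"
```

Both pipelines change the order of the lines, since sort reorders them; that is usually acceptable when the goal is only to see which distinct lines matched.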

