I often use grep to find files having a certain entry like this:
grep -R 'MyClassName'
The good thing is that it returns the files and their contents, and marks the found string in red. The bad thing is that I also have huge files where the entire text is written on one single line, so grep outputs far too much when it finds a match in those files. Is there a way to limit the output to, for instance, 5 words to the left and right of the match? Or maybe to 30 characters on each side?
grep itself only has options for context based on lines. An alternative is suggested by this SU post:
A workaround is to enable the option 'only-matching' and then to use RegExp's power to grep a bit more than your text:
grep -o ".\{0,50\}WHAT_I_M_SEARCHING.\{0,50\}" ./filepath
Of course, if you use color highlighting, you can always grep again to only color the real match:
grep -o ".\{0,50\}WHAT_I_M_SEARCHING.\{0,50\}" ./filepath | grep "WHAT_I_M_SEARCHING"
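To tie this back to the recursive search in the question, here is a minimal sketch that wraps the same idea in a small function. The name ctxgrep, its argument order and the demo file are made up for illustration. It assumes a grep that supports -R, -o and -E (GNU grep does), and it pastes the pattern into a larger regex, so regex metacharacters in the pattern keep their special meaning.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep): print each match with up to
# WIDTH characters of context on either side, searching recursively.
# Usage: ctxgrep PATTERN [WIDTH] [DIR]
ctxgrep() {
    pattern=$1
    width=${2:-30}
    dir=${3:-.}
    # -R: recurse into DIR, -o: print only the matched part,
    # -E: extended regex, so the interval is written {0,N} without backslashes
    grep -RoE ".{0,${width}}${pattern}.{0,${width}}" "$dir"
}

# Demo on a throwaway directory holding a file that is one single line.
demo_dir=$(mktemp -d)
printf 'aaaa bbbb cccc MyClassName dddd eeee ffff' > "$demo_dir/big.txt"
ctxgrep 'MyClassName' 5 "$demo_dir"
```

For the demo file this prints the file name, a colon, and then cccc MyClassName dddd, that is, the match plus five characters on each side.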
As another alternative, I'd suggest folding the text and then grepping it, for example:
fold -sw 80 input.txt | grep ...
The -s option makes fold break lines at spaces, pushing a word to the next line instead of splitting it in the middle.
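A small sketch of the fold approach on a made-up one-line file may help to see what comes out. The file contents and the width of 20 are arbitrary choices for the demo.

```shell
#!/bin/sh
# Hypothetical demo file: one long line without any newline characters.
demo_file=$(mktemp)
printf 'alpha beta gamma MyClassName delta epsilon zeta eta theta' > "$demo_file"

# Wrap at 20 columns, breaking only at spaces (-s), then grep the short lines.
fold -sw 20 "$demo_file" | grep 'MyClassName'
```

Only the folded line that contains the match is printed, and it is at most 20 columns wide. Two things to keep in mind: the amount of context on each side depends on where the fold happens to fall, and a search term that itself contains spaces can end up split across two folded lines and be missed. Since grep reads from a pipe here, it also cannot report the file name.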
Or use some other way to split the input into lines based on the structure of your input. (The SU post, for example, dealt with JSON, so using jq etc. to pretty-print and grep ... or just using jq to do the filtering by itself ... would be better than either of the two alternatives given above.)
This GNU awk method might be faster:
gawk -v n=50 -v RS='MyClassName' '
# every record after the first was preceded by a match, so print its context
FNR > 1 { printf "%s: %s\n", FILENAME, p prt substr($0, 1, n) }
# keep the last n characters and the matched text for the next record
{ p = substr($0, length - n + 1); prt = RT }
' input.txt
- Tell awk to split records on the pattern we're interested in (-v RS=...), and set the number of characters of context (-v n=...).
- Each record after the first record (FNR > 1) is one where awk found a match for the pattern.
- So we print n trailing characters from the previous record (p) and n leading characters from the current record (substr($0, 1, n)), along with the matched text that ended the previous record (which is prt).
- We set p and prt after printing, so the values we set are used by the next record.
- RT is a GNUism, that's why this is GNU awk-specific.
For recursive search, maybe:
find . -type f -exec gawk -v n=50 -v RS='MyClassName' 'FNR>1{printf "%s: %s\n",FILENAME, p prt substr($0, 1, n)} {p = substr($0, length-n+1); prt = RT}' {} +
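If gawk is not available, a similar result can be had without RT. The following is a sketch of a different technique, not the method above: it keeps the default record separator and walks through each line with the POSIX match() function and its RSTART and RLENGTH variables. The function name ctxawk and the demo file are made up for illustration. The pattern is passed with -v, so awk processes backslash escapes in it.

```shell
#!/bin/sh
# Hypothetical helper: POSIX awk variant that does not rely on RT.
# Usage: ctxawk PATTERN WIDTH FILE...
ctxawk() {
    pat=$1
    n=$2
    shift 2
    awk -v pat="$pat" -v n="$n" '
    {
        line = $0
        off = 0                          # characters of $0 already consumed
        while (match(line, pat)) {
            if (RLENGTH == 0) break      # an empty match would loop forever
            start = off + RSTART         # 1-based position of the match in $0
            from = start - n
            if (from < 1) from = 1
            # leading context + matched text + trailing context
            printf "%s: %s\n", FILENAME, substr($0, from, start - from + RLENGTH + n)
            line = substr(line, RSTART + RLENGTH)
            off += RSTART + RLENGTH - 1
        }
    }' "$@"
}

# Demo on a throwaway file with two matches on one line.
demo_file=$(mktemp)
printf 'aaaa bbbb cccc MyClassName dddd eeee ffff MyClassName gggg\n' > "$demo_file"
ctxawk 'MyClassName' 5 "$demo_file"
```

For the demo file this prints two lines, cccc MyClassName dddd and ffff MyClassName gggg, each prefixed with the file name. Unlike the gawk version, the trailing context here is not cut off at the next match, so matches that sit close together can show up in each other's context. For a recursive search it could be combined with find in the same way, provided the function body is put into a script that find can execute.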