I want a command-line tool to search documents (including doc, docx, odt) for a string, and to limit the results based on a filename pattern, e.g. "search piano letters" to search for the text "piano" in any file with "letters" in its name.
The tracker search command is good, but it returns hits on all matching indexed files, so I can't see the wood for the trees. I need something more focused, and I don't want to reconfigure tracker by modifying some obscure settings file every time I want to search. If I were searching plain ASCII text, it would be simple using "grep -r pattern directory", but this doesn't work on modern Word documents.
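For comparison, the plain-text case might look like the sketch below. It assumes GNU grep (for the --include option) and builds a throwaway directory with made-up file names; search_text is a hypothetical helper name, not part of any existing tool. Note that --include matches the file's base name only, not the whole path.

```shell
#!/bin/bash
# Sketch (assumes GNU grep): search plain-text files for a string, limited
# to files whose base name contains a second string.
# search_text is a hypothetical helper used only for this illustration.
search_text() {
    # $1 = text to find, $2 = filename fragment, $3 = directory to search
    # -r recurse, -i ignore case in the text, -l list matching files only
    grep -r -i -l --include="*$2*" -- "$1" "$3"
}

# Throwaway demo data with made-up names.
demo=$(mktemp -d)
echo 'The piano needs tuning.' > "$demo/letters-2019.txt"
echo 'Piano lessons on Monday.' > "$demo/diary.txt"

# Both files contain "piano", but only letters-2019.txt has "letters"
# in its name, so only its path is listed.
search_text piano letters "$demo"

rm -r "$demo"
```

This only works because the files are plain text; it finds nothing useful inside doc, docx or odt files, which is why an indexer is needed.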
There are several questions on this subject (and many marked as duplicate) but none with a satisfactory answer (at least for me).
So I wrote a script called "search" to run tracker and filter the results based on filenames that match a given pattern. Using "tracker search piano -l 1000" I get 136 hits, which include too much noise. Using "search piano letters" I get 4 hits, showing the filenames (as clickable links) followed by the relevant line of text, which is great.
#! /bin/bash
#
# Use "tracker" to search files for content matching a pattern.
# (tracker indexes files by content, including text in MS Word documents.)
# Optionally filter on file pathnames matching another pattern.
#
# Synopsis:
# search content-pattern [path-pattern]
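Usage="Usage: ${0##*/} content-pattern [path-pattern]"

case $# in
    (1)
        IfPathPattern=false
        ;;
    (2)
        IfPathPattern=true
        ;;
    (*)
        echo "$Usage" >&2
        exit 2
        ;;
esac

# Ask tracker for up to 1000 hits, then optionally filter them by pathname.
tracker search -l 1000 "$1" |
    if $IfPathPattern
    then
        # Case-insensitive match of the path pattern against each output line.
        # A matching line is printed once, followed by the next two lines.
        awk -v pattern="$2" '
            BEGIN { pattern = tolower(pattern) }
            { text = tolower($0) }
            text ~ pattern { print; lines = 2; next }
            lines > 0 { print; lines-- }'
    else
        cat
    fi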
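The filtering step can be tried without a tracker index. The sketch below wraps a variant of the awk filter in a hypothetical function, filter_hits, and feeds it made-up input laid out as a path line, a snippet line and a blank line. That layout and the file names are assumptions for illustration, not captured tracker output.

```shell
#!/bin/bash
# Sketch: the path filter wrapped in a function so it can be run on canned
# input. filter_hits and sample_hits are hypothetical names.
filter_hits() {
    # $1 = path pattern (a regular expression, matched case-insensitively)
    # The match rule comes first and ends with "next", so a line that is
    # both a match and trailing context is never printed twice.
    awk -v pattern="$1" '
        BEGIN { pattern = tolower(pattern) }
        { text = tolower($0) }
        text ~ pattern { print; lines = 2; next }
        lines > 0 { print; lines-- }'
}

# Made-up sample (assumed layout: path, snippet, blank line).
sample_hits() {
    printf '%s\n' \
        'file:///home/me/Letters/aunt.odt' \
        '  ... sold the piano last year ...' \
        '' \
        'file:///home/me/music/notes.docx' \
        '  ... piano tuning schedule ...' \
        ''
}

# Keeps the first entry (its path contains "Letters") and drops the second.
sample_hits | filter_hits letters
```

Because the pattern is matched against every output line, not just the path lines, a snippet that happens to contain the path pattern will also be kept along with the two lines after it.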