I have a file that looks like this:
AAA_21 PF13304.1 x_00004
AAA_22 PF13401.1 x_00004
SMC_N PF02463.14 x_00004
AAA_29 PF13555.1 x_00004
DUF258 PF03193.11 x_00005
AAA_15 PF13175.1 x_00005
AAA_21 PF13304.1 x_00005
AAA_22 PF13401.1 x_00005
SMC_N PF02463.14 x_00005
AAA_15 PF13175.1 x_00006
AAA_21 PF13304.1 x_00006
AAA_22 PF13401.1 x_00007
SMC_N PF02463.14 x_00007
Now, for each block of lines that share the same string in column 3 (e.g. x_00004), I want to grep only the lines containing specific strings, and only if those strings are all present together in the block.
I know that I can use grep -f to match a list of strings, but I cannot find a way to apply the first step, the grouping by column 3. I guess awk will help me here, but I do not really know how.
I would like to have something like:
AAA_21 PF13304.1 x_00004
AAA_22 PF13401.1 x_00004
AAA_21 PF13304.1 x_00005
AAA_22 PF13401.1 x_00005
So basically grepping the lines containing PF13304.1 or PF13401.1 only if they share field 3.
I use PF13304.1 and PF13401.1 as an example; sometimes I look for the presence of 3 strings in the block.
One problem is that the strings I am looking for are not always on consecutive lines in the file I want to scan.
All the strings I want to grep are listed in a txt file as well. I can organize them however is needed to suit the grep command.
By contrast, the lines
AAA_21 PF13304.1 x_00006
AAA_22 PF13401.1 x_00007
should not be included, because the strings I want to grep do not share field 3: they are not both present in either subgroup x_00006 or x_00007.
So, from a logical point of view, I want to:
- open the file
- divide the lines into groups according to field 3, so that each group has the same string in field 3
- within each group, grep the strings I am looking for, but only if they are all present in that group
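The steps above can be sketched in Python instead of grep or awk. This is only a minimal sketch under some assumptions that are not stated in the question: fields are separated by whitespace, the search strings are compared exactly against field 2, and a group is any set of lines sharing field 3, whether or not those lines are consecutive. The names filter_blocks, sample and wanted are hypothetical and used for illustration only.

```python
from collections import defaultdict


def filter_blocks(lines, patterns):
    """Return the lines whose field 2 is one of `patterns`, restricted to
    groups (same field 3) in which every pattern occurs at least once."""
    patterns = set(patterns)
    found = defaultdict(set)  # field 3 -> wanted strings seen in that group
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: skip blank or malformed lines
        rows.append((fields[1], fields[2], line.rstrip("\n")))
        if fields[1] in patterns:
            found[fields[2]].add(fields[1])
    # Second pass: keep matching lines only from complete groups.
    return [text for acc, group, text in rows
            if acc in patterns and found[group] == patterns]


sample = """\
AAA_21 PF13304.1 x_00004
AAA_22 PF13401.1 x_00004
SMC_N PF02463.14 x_00004
AAA_29 PF13555.1 x_00004
DUF258 PF03193.11 x_00005
AAA_15 PF13175.1 x_00005
AAA_21 PF13304.1 x_00005
AAA_22 PF13401.1 x_00005
SMC_N PF02463.14 x_00005
AAA_15 PF13175.1 x_00006
AAA_21 PF13304.1 x_00006
AAA_22 PF13401.1 x_00007
SMC_N PF02463.14 x_00007
""".splitlines()

wanted = ["PF13304.1", "PF13401.1"]  # could instead be read from the txt file
result = filter_blocks(sample, wanted)
print("\n".join(result))
```

Because the search strings are passed as a plain list, the same function should work unchanged when three strings have to be present together in a group.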