Blog: Grep strings in a subgroup of lines in txt file

Monday, December 28, 2015

Grep strings in a subgroup of lines in txt file

I have a file that looks like this:

AAA_21               PF13304.1  x_00004
AAA_22               PF13401.1  x_00004
SMC_N                PF02463.14 x_00004
AAA_29               PF13555.1  x_00004
DUF258               PF03193.11 x_00005
AAA_15               PF13175.1  x_00005
AAA_21               PF13304.1  x_00005
AAA_22               PF13401.1  x_00005
SMC_N                PF02463.14 x_00005
AAA_15               PF13175.1  x_00006
AAA_21               PF13304.1  x_00006
AAA_22               PF13401.1  x_00007
SMC_N                PF02463.14 x_00007

Now, for each block of lines that share the same string in column 3 (e.g. x_00004), I want to grep only the lines containing specific strings, and only if all of those strings are present together in the block.

I know that I can use
grep -f
to match strings listed in a file, but I cannot find a way to restrict the match to each block first. I guess awk will help me here, but I do not really know how.

I would like to have something like:

AAA_21               PF13304.1  x_00004
AAA_22               PF13401.1  x_00004
AAA_21               PF13304.1  x_00005
AAA_22               PF13401.1  x_00005

So basically, grepping the lines containing PF13304.1 or PF13401.1 only if both strings occur in lines sharing the same field 3.

PF13304.1 and PF13401.1 are only an example; sometimes I look for the presence of 3 strings in the block.
One problem is that the strings I am looking for are not always on consecutive lines in the file I want to scan.

All the strings I want to grep are listed in a txt file as well. I can organize them however is needed to suit the grep command.

By contrast, the lines

AAA_21               PF13304.1  x_00006
AAA_22               PF13401.1  x_00007

should not be included, because the strings I want to grep do not share field 3: they are not both present in subgroup x_00006, nor in subgroup x_00007.

So, from a logical point of view, I want to:

open the file

divide the lines into groups according to field 3, so that each group contains the lines with the same string in field 3

within each group, grep the strings I am looking for, but only if all of them are present in that group
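
The steps above could be sketched with awk roughly as follows. This is a minimal sketch under stated assumptions, not a definitive answer: the helper name filter_groups is hypothetical, the wanted strings are assumed to sit one per line in a non-empty file, and they are compared exactly against column 2 (rather than searched anywhere in the line, as grep would do). Candidate lines are held in memory until the end of the file, so the wanted strings do not need to be on consecutive lines.

```shell
# filter_groups PATTERNS_FILE DATA_FILE   (hypothetical helper name)
# Assumes: one wanted string per line in PATTERNS_FILE (non-empty file),
# whitespace-separated columns in DATA_FILE, wanted strings compared
# exactly against column 2, groups defined by column 3.
filter_groups() {
    awk '
        # First file: remember each distinct wanted string.
        NR == FNR {
            if ($1 != "" && !($1 in want)) { want[$1] = 1; nwant++ }
            next
        }
        # Second file: keep candidate lines and count, per group,
        # how many distinct wanted strings have been seen.
        ($2 in want) {
            if (!(($3, $2) in seen)) { seen[$3, $2] = 1; found[$3]++ }
            kept[++n] = $0
            group[n] = $3
        }
        # Print candidates, in input order, only for complete groups.
        END {
            for (i = 1; i <= n; i++)
                if (found[group[i]] == nwant) print kept[i]
        }
    ' "$1" "$2"
}
```

With the strings saved in patterns.txt and the data in input.txt (both file names are placeholders), calling filter_groups patterns.txt input.txt on the sample above prints the four x_00004 and x_00005 lines and skips x_00006 and x_00007. Adding a third string to the patterns file tightens the condition to groups that contain all three.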
