Monday, December 28, 2015

Grep strings in a subgroup of lines in a txt file

I have a file that looks like this:


AAA_21               PF13304.1  x_00004
AAA_22 PF13401.1 x_00004
SMC_N PF02463.14 x_00004
AAA_29 PF13555.1 x_00004
DUF258 PF03193.11 x_00005
AAA_15 PF13175.1 x_00005
AAA_21 PF13304.1 x_00005
AAA_22 PF13401.1 x_00005
SMC_N PF02463.14 x_00005
AAA_15 PF13175.1 x_00006
AAA_21 PF13304.1 x_00006
AAA_22 PF13401.1 x_00007
SMC_N PF02463.14 x_00007

Now, for each block of lines that share the same string in column 3 (e.g. x_00004), I want to grep the lines containing specific strings, but only if all of those strings are present together in that block.


I know that I can use grep -f to read the strings to match from a file, but I cannot find a way to apply the first step, i.e. splitting the lines into blocks by column 3. I guess awk will help me here, but I do not really know how.


I would like to have something like:


AAA_21               PF13304.1  x_00004
AAA_22 PF13401.1 x_00004
AAA_21 PF13304.1 x_00005
AAA_22 PF13401.1 x_00005

So basically, grepping the lines containing PF13304.1 or PF13401.1 only if both occur in lines sharing the same field 3.


I use PF13304.1 and PF13401.1 only as an example; sometimes I look for the presence of 3 strings in the block.
One problem is that the strings I am looking for are not always on consecutive lines in the file I want to scan.


All the strings I want to grep are listed in a txt file as well. I can organize them in whatever way suits the grep command.


In contrast, the lines


AAA_21               PF13304.1  x_00006
AAA_22 PF13401.1 x_00007

should not be included, because the strings I want to grep do not share field 3: they are not both present in subgroup x_00006, nor in subgroup x_00007.


So, from a logical point of view, I want to:



  1. open the file

  2. divide the lines into groups according to field 3, so that each group contains the lines with the same string in field 3

  3. within each group, grep the strings I am looking for, but only if all of them are present in that group
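
To make the three steps concrete, here is a minimal sketch of the logic in Python rather than grep or awk. It assumes whitespace-separated columns, that the strings to look for are matched exactly against field 2, and that a group qualifies only when every wanted string appears in it at least once. The function name filter_blocks and the in-memory sample list are made up for illustration; in practice the lines and the wanted strings would be read from the two txt files.

```python
def filter_blocks(lines, wanted):
    """Return the lines whose field 2 is in `wanted`, restricted to
    groups (same field 3) that contain every string in `wanted`."""
    wanted = set(wanted)
    hits = {}  # field 3 -> matching lines, in order of first appearance
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines; keep only lines with a wanted string.
        if len(fields) >= 3 and fields[1] in wanted:
            hits.setdefault(fields[2], []).append(line.rstrip("\n"))

    result = []
    for group_lines in hits.values():
        found = {l.split()[1] for l in group_lines}
        # Keep the group only if all wanted strings were seen in it.
        if found == wanted:
            result.extend(group_lines)
    return result


# Hypothetical in-memory copy of the example file from the question.
sample = [
    "AAA_21 PF13304.1 x_00004",
    "AAA_22 PF13401.1 x_00004",
    "SMC_N PF02463.14 x_00004",
    "AAA_29 PF13555.1 x_00004",
    "DUF258 PF03193.11 x_00005",
    "AAA_15 PF13175.1 x_00005",
    "AAA_21 PF13304.1 x_00005",
    "AAA_22 PF13401.1 x_00005",
    "SMC_N PF02463.14 x_00005",
    "AAA_15 PF13175.1 x_00006",
    "AAA_21 PF13304.1 x_00006",
    "AAA_22 PF13401.1 x_00007",
    "SMC_N PF02463.14 x_00007",
]

for kept in filter_blocks(sample, ["PF13304.1", "PF13401.1"]):
    print(kept)
```

Because the matching lines are collected per group before being checked, the lines of a group do not need to be consecutive in the file; the output is ordered by the first appearance of each group rather than by original line position.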
