Issue 229: Add ability to specify a list of peptide sequences to BlibBuild

Assigned To: Guest
Opened: 2013-04-08 by Brendan MacLean
Changed: 2013-05-09 by Brendan MacLean
Resolved: 2013-05-09 by Brendan MacLean
Closed: 2013-05-09 by Brendan MacLean
2013-04-08 Brendan MacLean
Title»Add ability to specify a list of peptide sequences to BlibBuild
Assigned To»
In a meeting with Jake Jaffe, here in Cambridge, he suggested that it would be great to be able to build more targeted libraries from within Skyline. He has potentially hundreds of msms.txt files from MaxQuant from which he would like to accumulate library data for a known set of target peptides, without necessarily building the entire library for all peptides found in these searches.

To achieve this, we would need to adapt BlibBuild to support passing in a list of target peptide sequences (either bare sequences or in modified format should both be supported), and then BlibBuild would simply ignore all spectra matching other peptides.
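
As a rough illustration of the filtering described above, here is a minimal Python sketch (not BlibBuild code). It assumes each match is a (modified sequence, spectrum id) pair and that modifications are written in square brackets; the helper names and the example peptides other than ACPIDQAIGIIVAIFHK are made up for illustration.

```python
import re

# Hypothetical helper: strip bracketed modifications such as "[+57.0]"
# to recover the bare amino-acid sequence.
_MOD = re.compile(r"\[[^\]]*\]")


def bare_sequence(modified):
    return _MOD.sub("", modified)


def filter_psms(psms, bare_targets=(), modified_targets=()):
    """Keep only matches whose peptide is in one of the target sets.

    psms: iterable of (modified_sequence, spectrum_id) tuples (assumed shape).
    """
    bare = set(bare_targets)
    modified = set(modified_targets)
    kept = []
    for seq, spectrum_id in psms:
        # A match is kept if its exact modified form was requested, or if
        # its bare sequence was requested regardless of modifications.
        if seq in modified or bare_sequence(seq) in bare:
            kept.append((seq, spectrum_id))
    return kept


psms = [
    ("ACPIDQAIGIIVAIFHK", 1),
    ("AC[+57.0]GIVASNINIK", 2),
    ("PEPTIDEK", 3),
]
kept = filter_psms(psms, bare_targets=["ACPIDQAIGIIVAIFHK", "ACGIVASNINIK"])
```

Here the third spectrum is dropped because its peptide is not in the target list, while the second is kept because its bare sequence is.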

You can sort of get this in Skyline today, if you first build the complete library, add it to a Skyline document that contains just the peptides of interest, and then use that document to perform a File > Share - Minimal operation. But we'd like to be able to get there directly in the library build step.

2013-04-08 Kaipo
Makes sense, though one thing that I'm unsure about: how should the peptide sequences be passed in (e.g. entered directly on the command line, a file with a single peptide per line, or maybe some other format that could be extended to filter library building using criteria other than sequences)?

2013-04-08 Brendan MacLean
Ideally (for Skyline) it would be possible to pass the list over stdin, as we pass the list of files today, though you would have to come up with a convention for separating the lists in case both are used (as they would be for Skyline), perhaps a blank line?
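
A minimal Python sketch of the blank-line convention suggested above, assuming file names come first and target sequences follow the first blank line. The function name and the order of the two lists are assumptions for illustration, not a description of what BlibBuild does.

```python
import io


def read_two_lists(stream):
    """Split input into (files, sequences) at the first blank line."""
    files, sequences = [], []
    current = files
    for raw in stream:
        line = raw.strip()
        if not line:
            # The first blank line switches from file names to sequences;
            # any later blank lines are simply skipped.
            current = sequences
            continue
        current.append(line)
    return files, sequences


example = io.StringIO("msms1.txt\nmsms2.txt\n\nACPIDQAIGIIVAIFHK\n")
files, sequences = read_two_lists(example)
```

In real use the stream would be sys.stdin. If no blank line is present, everything is treated as a file name, so input written for the existing file-list behavior would still be read the same way.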

You could also allow this list to be specified in a file, but Skyline would never use that.

Thanks, Kaipo.

2013-04-09 Kaipo
I've implemented it so that the sequences are passed in with switches (can change if there is a better way),

-i <sequences> to ignore peptides except those with the specified sequences
-I <sequences> to ignore peptides except those with the specified sequences including modifications

So a sequence could be passed in using:
BlibBuild -i ACPIDQAIGIIVAIFHK msms.txt out.blib

Or multiple sequences could be passed in using one of these:

One thing that might have to change is the way that modified sequences are targeted, depending on the precision. At the moment I consider a target sequence to match a modified sequence if the delta mass matches to one digit after the decimal. So

A[+42.0]C[+56.99]GIVASNINIKPGEC[+57.0]IR will match
A[+42]C[+57]GIVASNINIKPGEC[+57]IR will not match

I'm not sure if this is the right way to handle this?
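
One possible reading of the matching rule above, sketched in Python. This is an assumption that happens to reproduce the two examples, not the actual BlibBuild logic: each delta mass must be written with a decimal part, is rounded to one digit after the decimal, and the resulting strings are compared. The library-side sequence used below is also assumed.

```python
import re

# Matches a bracketed delta mass such as "[+42.0]" or "[-17.03]".
_MOD = re.compile(r"\[([+-]\d+(?:\.\d+)?)\]")


def normalize(modified):
    """Rewrite every delta mass with exactly one digit after the decimal.

    Returns None if any delta mass is written without a decimal part,
    an assumed rule that mirrors the "will not match" example.
    """
    masses = _MOD.findall(modified)
    if any("." not in m for m in masses):
        return None
    return _MOD.sub(lambda m: "[%+.1f]" % float(m.group(1)), modified)


def matches(target, library_sequence):
    t = normalize(target)
    return t is not None and t == normalize(library_sequence)


# Assumed form of the sequence as stored on the library side.
library = "A[+42.0]C[+57.0]GIVASNINIKPGEC[+57.0]IR"
ok = matches("A[+42.0]C[+56.99]GIVASNINIKPGEC[+57.0]IR", library)
not_ok = matches("A[+42]C[+57]GIVASNINIKPGEC[+57]IR", library)
```

Under this reading, ok is True and not_ok is False. A purely numeric comparison would instead accept both targets, which is the kind of design choice the question above is asking about.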

2013-05-09 Brendan MacLean
resolve as Fixed
Assigned»Brendan MacLean
Kaipo has committed a fix that uses standard input (stdin) for the sequences.

2013-05-09 Brendan MacLean
Assigned ToBrendan MacLean»Guest