Questions about iRT calculators

Questions about iRT calculators gabe  2017-09-11 17:09
 
Hi Brendan,

I'm trying to understand iRT predictions: we routinely generate DDA data to populate reference spectral libraries for use in PRM experiments. We use 12 ubiquitous, endogenous "standard" peptides that span the retention-time range that we're interested in, rather than commercial/exogenous standard peptides. I create spectral libraries by exporting filtered SEQUEST results from these DDA runs and use Skyline to Build .blib libraries. Skyline gives the option of specifying "iRT standard peptides" in the 'Build Library' tool, but only allows you to choose from a pre-defined list of options (Biognosys-10, Biognosys-11, etc). This brings me to my first question:

Are iRT values stored in .blib files? If so, how can these be calculated if the pre-defined lists of standard peptides are not used? If the iRT values aren't stored in .blib files then what is the point of selecting a group of iRT standard peptides during the blib build process? How is that information used?

Next question is when I combine reference spectral libraries (either from primary data or by combining .blib files) how are the iRT data stored? If I generate a spectral library from 30-minute runs, and combine it (without keeping redundant spectra) with a spectral library from 180-minute runs, and add the *combined* .blib file to an iRT calculator - how will it find the correct iRT values? Is it necessary to keep the redundant spectra if you want to be able to do this sort of thing?

Thanks

Gabe
 
 
Brendan MacLean responded:  2017-09-11 17:25
Hi Gabe,
On your first question:

"Are iRT values stored in .blib files? If so, how can these be calculated if the pre-defined lists of standard peptides are not used? If the iRT values aren't stored in .blib files then what is the point of selecting a group of iRT standard peptides during the blib build process? How is that information used?"

We made it possible to store iRTs in .blib files to make it easier for people building spectral libraries from DDA who also wanted iRT. Originally, though, we kept the two separate, with spectra going into .blib files and iRT values going into .irtdb files, which required extra steps. That approach is still available. You just build your .blib without specifying iRT standards. Then you use Edit iRT Calculator, as described in the iRT tutorial, to define an iRT calculator in terms of your custom iRT landmark peptides (standards), and finally use Add > Spectral Library... to add all the other peptides in your spectral library relative to the landmark peptides (as described on Page 27 of the iRT tutorial).

I tried to make that pretty easy, but it is obviously less easy than choosing the standards from a list and having the iRTs end up directly in the .blib file, which is what we now allow for known standards, and hope to one day allow even for custom standards like you describe. Not there yet, though.

On your second question:

"Next question is when I combine reference spectral libraries (either from primary data or by combining .blib files) how are the iRT data stored? If I generate a spectral library from 30-minute runs, and combine it (without keeping redundant spectra) with a spectral library from 180-minute runs, and add the *combined* .blib file to an iRT calculator - how will it find the correct iRT values? Is it necessary to keep the redundant spectra if you want to be able to do this sort of thing?"

Skyline always stores all of the actual measured retention times in the non-redundant .blib file. So, even though you lose all of the redundant spectra, a non-redundant .blib file still contains all of the ID times. That means for MS1 data that we can show you where the IDs happened and peak pick appropriately, even if we can't show you all of the MS/MS spectra. And for iRT, it means that we can always normalize all measurements to the right iRT standard times (i.e. the ones measured in the same run) even when there are no redundant spectra.

Hope this helps to clarify. Thanks for posting your questions to the Skyline support board.

--Brendan
 
gabe responded:  2017-09-12 05:02
Thanks for the lightning-quick response, Brendan!

I've dug around in the .blib files a bit trying to understand these features. Now that you've answered my question, I see that there's a 'retentionTime' field in the 'refSpectra' table, and there's also a whole table called 'RetentionTimes' that includes all the redundant RTs and references the original RAW files from whence they came (in SpectrumSourceFiles).

So then presumably when you add a .blib file to an iRT calculator it can calculate the iRT for each peptide relative to the standard peptide retention times in the appropriate file. Thus, I can return results like this from the .blib file:

id     peptideSeq    precursorCharge  copies  retentionTime  RedundantRefSpectraID  retentionTime  fileName
-----  ------------  ---------------  ------  -------------  ---------------------  -------------  -----------------------------------------
27645  LLDLVQQSCNYK  2                125     91.52333       299078                 65.53506       C:\Xcalibur\Raw files\Aug2016\A549_02.raw
27645  LLDLVQQSCNYK  2                125     91.52333       434236                 75.877056      C:\Xcalibur\Raw files\Aug2016\PBMC_02.raw
27645  LLDLVQQSCNYK  2                125     91.52333       434297                 76.304303      C:\Xcalibur\Raw files\Aug2016\PBMC_02.raw
27645  LLDLVQQSCNYK  2                125     91.52333       434298                 76.311382      C:\Xcalibur\Raw files\Aug2016\PBMC_02.raw
27645  LLDLVQQSCNYK  2                125     91.52333       249877                 68.707751      C:\Xcalibur\Raw files\Aug2016\231_03.raw
27645  LLDLVQQSCNYK  2                125     91.52333       249963                 69.062296      C:\Xcalibur\Raw files\Aug2016\231_03.raw

which is great.
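
For anyone who wants to reproduce a listing like this, here is a minimal sketch of the kind of query involved, using Python's built-in sqlite3 module (a .blib file is an SQLite database). The table and column names are the ones visible in the output above; the join column RetentionTimes.RefSpectraID is an assumption that does not appear in this thread, and the function name redundant_rts is just an illustrative helper, so check the schema of your own file before relying on it.

```python
import sqlite3

# Assumption: RetentionTimes.RefSpectraID links each redundant retention time
# back to RefSpectra.id. That column is not shown in the thread; verify it
# against your own .blib file (e.g. with ".schema RetentionTimes" in sqlite3).
QUERY = """
SELECT r.id, r.peptideSeq, r.precursorCharge, r.copies, r.retentionTime,
       t.RedundantRefSpectraID, t.retentionTime, f.fileName
FROM RefSpectra AS r
JOIN RetentionTimes AS t ON t.RefSpectraID = r.id
JOIN SpectrumSourceFiles AS f ON f.id = t.SpectrumSourceID
WHERE r.peptideSeq = ?
ORDER BY f.fileName, t.retentionTime
"""


def redundant_rts(conn, peptide_seq):
    """Return one row per stored ID time for the given unmodified sequence."""
    return conn.execute(QUERY, (peptide_seq,)).fetchall()
```

To use it, open the library with sqlite3.connect() on the path to your .blib file and call redundant_rts(conn, "LLDLVQQSCNYK").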

Do I assume correctly that when multiple spectral-counts are detected for a peptide in a given file that the redundant retention times are *averaged* to calculate the iRT values? And does this mean that it's better to import results into Skyline that include ALL the DDA data so that *average* RTs can be used for iRT calculations? To date, I've been importing .pep.xml result files that include only the best match for each peptide, but it seems like it's probably preferable to include ALL spec counts (not just the best ones) so that Skyline can calculate more-accurate iRT values.

Thanks again for your detailed and fast response!
 
Brendan MacLean responded:  2017-09-12 17:49
Hi Gabe,
Skyline does use "averaged" times. Actually, it now uses the median time. As I write this, I wish I had a version of median that never took the middle of two values, as our median implementation does when there is an even number of values. The problem is that a straight mean, or even the midpoint of two times at which MS/MS spectra were measured, can be far from the elution apex, even if both MS/MS were in the middle of an apex.
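
To illustrate the even-count issue, here is a small sketch in Python, not Skyline's own implementation. It uses the two ID times reported earlier in this thread for LLDLVQQSCNYK in 231_03.raw. Python's statistics.median averages the two middle values when the count is even, while statistics.median_low always returns a value that was actually measured, which is the behavior wished for above.

```python
from statistics import median, median_low

# The two ID times for LLDLVQQSCNYK in 231_03.raw from the earlier post.
id_times = [68.707751, 69.062296]

# Even count: the conventional median is the midpoint of the two middle
# values, a time at which no MS/MS spectrum was actually acquired.
midpoint = median(id_times)

# median_low picks the lower of the two middle values, so the result is
# always one of the measured ID times.
measured = median_low(id_times)

print(f"median     = {midpoint:.6f}")
print(f"median_low = {measured:.6f}")
```

With an odd number of times the two functions agree, so the choice only matters for even counts.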

Suffice it to say that I have thought about it a lot, and I believe Skyline does as well as any other tool building libraries from identified MS/MS spectrum times, which does not extract MS1 chromatograms and find the true peak apexes in them.

I think you are better off having more than just the best ID for each peptide. You would need to have that for at least your iRT standard peptides, I think. Otherwise Skyline wouldn't be able to calibrate each file. To be certain I understand what you are talking about, is that the best ID in each file or across multiple files?

Based on some intense work on this problem, I made Skyline first align all of the files in the .blib by as many peptides as they share with each other, which is usually lots more than just the iRT peptides, but at worst just the iRT peptides. Then Skyline takes the medians of all of those aligned times and calibrates them against the iRT medians.

This produced far less variance than using iRT peptides only to align them all to iRT space and then picking the median in iRT space.
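
The order of operations described above (align runs on shared peptides, take medians, then calibrate to iRT) can be sketched roughly as follows. This is only an illustration under strong assumptions, not Skyline's actual code: it uses plain least-squares lines for both the run-to-run alignment and the iRT calibration, treats the first run as the reference scale, and assumes one time per peptide per run. The function names and the peptide labels in the usage note are hypothetical.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept.

    Needs at least two distinct x values, otherwise it divides by zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def align_and_calibrate(runs, irt_standards):
    """runs: {run name: {peptide: RT}}; irt_standards: {peptide: iRT value}.

    Returns {peptide: estimated iRT} for every peptide seen in any run.
    """
    ref_name = next(iter(runs))  # assumption: first run defines the scale
    ref = runs[ref_name]

    # Step 1: map every run onto the reference run using ALL shared
    # peptides, not just the iRT standards.
    aligned = {}
    for name, times in runs.items():
        if name == ref_name:
            slope, intercept = 1.0, 0.0
        else:
            shared = sorted(set(times) & set(ref))
            slope, intercept = fit_line([times[p] for p in shared],
                                        [ref[p] for p in shared])
        for peptide, rt in times.items():
            aligned.setdefault(peptide, []).append(slope * rt + intercept)

    # Step 2: take the median of the aligned times for each peptide.
    medians = {peptide: median(ts) for peptide, ts in aligned.items()}

    # Step 3: calibrate the aligned medians against the iRT standards.
    standards = sorted(set(irt_standards) & set(medians))
    slope, intercept = fit_line([medians[p] for p in standards],
                                [irt_standards[p] for p in standards])
    return {peptide: slope * t + intercept for peptide, t in medians.items()}
```

For example, with runs = {"A": {"P1": 10, "P2": 20, "P3": 30}, "B": {"P1": 25, "P2": 45, "P3": 65, "P5": 35}} and standards {"P1": 0, "P3": 100}, peptide P5 is seen only in run B but still gets an iRT, because run B is first mapped onto run A through the three shared peptides.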

Well, mileage may vary. Hope this gives you some insight. If you learn more yourself, we are always interested to hear your informed opinions.

Thanks for posting to the Skyline support board and being willing to dig into the details.

--Brendan
 
gabe responded:  2017-09-13 03:47
Thanks Brendan,
I think I understand and it's clear that the "best peptide" strategy is sub-optimal. I'll regenerate my blib files using all the redundant high-quality SEQUEST results as inputs and let Skyline filter out the redundant ones (and, presumably, retain all the redundant RTs for use in iRT calculations).

One final question on this topic. We would like to build larger and larger reference spectral libraries over time. We do DDA runs in new proteomes to collect reference spectra every few weeks/months and it would be nice if we didn't have to worry about which run/file a particular reference spectrum came from.

My original plan ("strategy #1") was to generate a *single* .blib file that contained non-redundant ref spectra from ALL of these DDA runs -- call this MasterRSL.blib. When we did more DDA runs, I'd generate a new .blib file (newRunsA.blib) and then combine MasterRSL.blib and newRunsA.blib to create a new master file: MasterRSL_v2.blib. My plan was to combine these using the 'Build' tool in Skyline and to NOT keep the redundant spectra.

An alternative strategy (strategy #2) would be to store newRunsA.blib, newRunsB.blib, newRunsC.blib, etc... in a folder and import each of them to Skyline as-needed. This strategy is more cumbersome because you have to know/remember which .blib file a particular reference spectrum is in.

Do you have an opinion on which of these is superior? Strategy #2 requires more effort to keep track of file names and so on, so I'd prefer strategy #1. But I worry that I don't totally understand how Skyline is combining all the RT data in each .blib file, so this might have unpredictable results. Specifically, I'm thinking of your comment about how Skyline first aligns all the files in the .blib by as many peptides as they share with each other, and then calibrates them against the iRT medians.

What do you recommend? Is it alright to generate a "master" .blib file and add to it periodically? Or should groups of runs be stored in separate .blib files and loaded individually using the 'Library' in Skyline? Also, does it matter if I combine via sequentially adding new .blibs to the Master.blib file or generating a new master file from all the original .blibs when I want a new master? Will the results be the same?


Thanks!
 
Brendan MacLean responded:  2017-09-13 09:07
I recommend checking carefully as you go, because I can't promise your strategy has been well tested. It is possible you may find a bug, but I would expect #1 to work fine for iRT purposes. Skyline shouldn't be losing any retention times. Remember, it keeps all of the retention time information you give it. So, while not including all of the measured spectra in your initial build will deprive your .blib file of that extra RT information, BlibBuild and BlibFilter should continue preserving whatever you give them, regardless of whether you keep the redundant .blib files.

But, again, one of my software maxims is "If it isn't explicitly tested, it doesn't work." And I am not that confident in our testing of various strategies for combining .blib files with and without .redundant.blib files. So, watch closely, and report any issues you encounter.

Thanks again for your ambitious use of Skyline in your research.

--Brendan
 
gabe responded:  2017-09-18 05:10
Hi Brendan,

I think I may have found a bug in Skyline (or BiblioSpec). You said "Skyline shouldn't be losing any retention times". But check out this behavior: I made two .blib files from search results from two different proteomes, A549 and THP1. Then I combined the .blib files using the 'Build' feature in Skyline. If I query a single peptide from (1) A549, (2) THP1, and (3) combined blib files, here are the results:

A549
id          peptideModSeq         RedundantRefSpectraID  SpectrumSourceID  retentionTime  bestSpectrum  fileName
----------  --------------------  ---------------------  ----------------  -------------  ------------  ---------------------------------------------------------------------------------------------
11089	LLDLVQQSC[+324.2]NYK	65964	3	64.3145	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_03.mzXML
11089	LLDLVQQSC[+324.2]NYK	66037	3	64.6555	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_03.mzXML
11089	LLDLVQQSC[+324.2]NYK	40662	2	65.535	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_02.mzXML
11089	LLDLVQQSC[+324.2]NYK	66135	3	65.001167	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_03.mzXML
11089	LLDLVQQSC[+324.2]NYK	40670	2	65.562833	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_02.mzXML
11089	LLDLVQQSC[+324.2]NYK	40735	2	65.8775	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_02.mzXML
11089	LLDLVQQSC[+324.2]NYK	66243	3	65.3515	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_03.mzXML
11089	LLDLVQQSC[+324.2]NYK	40843	2	66.223333	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_02.mzXML
11089	LLDLVQQSC[+324.2]NYK	66351	3	65.699667	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_03.mzXML
11089	LLDLVQQSC[+324.2]NYK	40954	2	66.5835	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_02.mzXML
11089	LLDLVQQSC[+324.2]NYK	40960	2	66.601333	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_02.mzXML
11089	LLDLVQQSC[+324.2]NYK	41072	2	66.9795	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_02.mzXML
11089	LLDLVQQSC[+324.2]NYK	15037	1	67.013	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_01.mzXML
11089	LLDLVQQSC[+324.2]NYK	15111	1	67.355833	1	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_01.mzXML
11089	LLDLVQQSC[+324.2]NYK	15210	1	67.702333	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_01.mzXML
11089	LLDLVQQSC[+324.2]NYK	15315	1	68.051167	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_01.mzXML
11089	LLDLVQQSC[+324.2]NYK	15427	1	68.419667	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_01.mzXML

THP1
id          peptideModSeq         RedundantRefSpectraID  SpectrumSourceID  retentionTime  bestSpectrum  fileName
----------  --------------------  ---------------------  ----------------  -------------  ------------  ---------------------------------------------------------------------------------------------
7595	LLDLVQQSC[+324.2]NYK	11813	1	58.350667	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_01.mzXML
7595	LLDLVQQSC[+324.2]NYK	11852	1	58.532	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_01.mzXML
7595	LLDLVQQSC[+324.2]NYK	11886	1	58.719	1	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_01.mzXML
7595	LLDLVQQSC[+324.2]NYK	11927	1	58.9045	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_01.mzXML
7595	LLDLVQQSC[+324.2]NYK	11955	1	59.025333	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_01.mzXML
7595	LLDLVQQSC[+324.2]NYK	11957	1	59.029667	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_01.mzXML
7595	LLDLVQQSC[+324.2]NYK	11966	1	59.095833	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_01.mzXML
7595	LLDLVQQSC[+324.2]NYK	35509	2	63.534333	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_02.mzXML
7595	LLDLVQQSC[+324.2]NYK	35527	2	63.612167	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_02.mzXML
7595	LLDLVQQSC[+324.2]NYK	35543	2	63.666833	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_02.mzXML
7595	LLDLVQQSC[+324.2]NYK	35557	2	63.715833	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_02.mzXML
7595	LLDLVQQSC[+324.2]NYK	35595	2	63.9015	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_02.mzXML
7595	LLDLVQQSC[+324.2]NYK	35629	2	64.088667	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_02.mzXML
7595	LLDLVQQSC[+324.2]NYK	35667	2	64.279667	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_02.mzXML
7595	LLDLVQQSC[+324.2]NYK	57725	3	65.108833	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_03.mzXML
7595	LLDLVQQSC[+324.2]NYK	57760	3	65.294333	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_03.mzXML
7595	LLDLVQQSC[+324.2]NYK	57799	3	65.480667	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_03.mzXML
7595	LLDLVQQSC[+324.2]NYK	57805	3	65.5065	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_03.mzXML
7595	LLDLVQQSC[+324.2]NYK	57806	3	65.509333	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_03.mzXML

Combined
id          peptideModSeq         RedundantRefSpectraID  SpectrumSourceID  retentionTime  bestSpectrum  fileName
----------  --------------------  ---------------------  ----------------  -------------  ------------  ---------------------------------------------------------------------------------------------
14777	LLDLVQQSC[+324.2]NYK	27144	4	58.719	1	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\THP1\170830_THP1_legacy2016\20161205_THP1_DDA_longrun_01.mzXML
14777	LLDLVQQSC[+324.2]NYK	11089	1	67.355833	0	P:/Platform_general\Skyline\ReferenceSpectralLibraries_and_DDA_runs\A549\170830_A549_legacy2016\A549_01.mzXML

So, clearly Skyline is losing all the redundant retention time info when two .blib files are combined. Can you confirm that this is NOT the desired/expected behavior? I can provide details if needed.

For now I think I can do what I need by combining ALL the individual .pep.XML results when I need to make a master reference spectral library. But this would be much more convenient if individual .blib files could be combined, too...

Thanks

Gabe
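
A comparison like the one in this post can be automated with a short script. The sketch below counts RetentionTimes rows per source file for one modified peptide in each input library and reports any that are missing from the combined library. It is only an illustration: the join column RetentionTimes.RefSpectraID is an assumption not shown in this thread, and the function names rt_counts_by_file and lost_rts are hypothetical helpers, not part of Skyline or BiblioSpec.

```python
import sqlite3

# Assumption: RetentionTimes.RefSpectraID references RefSpectra.id.
COUNT_QUERY = """
SELECT f.fileName, COUNT(*)
FROM RefSpectra AS r
JOIN RetentionTimes AS t ON t.RefSpectraID = r.id
JOIN SpectrumSourceFiles AS f ON f.id = t.SpectrumSourceID
WHERE r.peptideModSeq = ?
GROUP BY f.fileName
"""


def rt_counts_by_file(conn, peptide_mod_seq):
    """Return {fileName: number of stored ID times} for one peptide."""
    return dict(conn.execute(COUNT_QUERY, (peptide_mod_seq,)).fetchall())


def lost_rts(source_conns, combined_conn, peptide_mod_seq):
    """Return {fileName: number of ID times missing from the combined library}."""
    expected = {}
    for conn in source_conns:
        for name, count in rt_counts_by_file(conn, peptide_mod_seq).items():
            expected[name] = expected.get(name, 0) + count
    actual = rt_counts_by_file(combined_conn, peptide_mod_seq)
    return {name: count - actual.get(name, 0)
            for name, count in expected.items()
            if count > actual.get(name, 0)}
```

Open each library with sqlite3.connect() on its path and pass the connections in; an empty result means every retention time from the inputs is still present in the combined file for that peptide.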
 
Kaipo Tamura responded:  2017-09-18 11:36
Hi Gabe,

You are right, this looks like a bug. Do you mind uploading the blib files to: https://skyline.ms/files.url

Thanks,
Kaipo
 
gabe responded:  2017-09-18 15:43
Yup, thanks Kaipo. I uploaded a zip file called combined.zip containing:

A549.blib
THP1.blib
combined.blib

which should be self-explanatory per the example given above.

Happy to provide any other info. Please let me know if you can/can't recreate this behavior.

Thanks

Gabe
 
Kaipo Tamura responded:  2017-09-25 11:18
Hi Gabe,
I am able to reproduce this bug - BiblioSpec does not properly combine non-redundant .blib files. Sorry for the inconvenience, we will work on creating a fix for it.

Thanks,
Kaipo