Strange error when making Library from mzML files from TimsTOF

mnt

2022-06-07 05:46

Hi Skyline

I am attempting to use Skyline on a dataset of mine, but the following error is generated while creating a library.

---------------------------
Skyline
---------------------------
ERROR: No spectra were found for the new library.

Command-line: C:\Users\au297068\AppData\Local\Apps\2.0\V13GT215.OHN\V48609XA.R6H\skyl..tion_e4141a2a22107248_0015.0002_38ac599f018d3163\BlibBuild -s -A -H -o -c 0.95 -i IMTest -K -S "C:\Users\au297068\AppData\Local\Temp\tmpC0B2.tmp" "C:\Users\au297068\Desktop\IMTest-xiSEARCH\SkylineOUT\Compressed\IMTest.redundant.blib"
Working directory: C:\Users\au297068\Desktop\IMTest-xiSEARCH\Data\Compressed
---------------------------
OK More Info
---------------------------
System.IO.IOException: ERROR: No spectra were found for the new library.

Command-line: C:\Users\au297068\AppData\Local\Apps\2.0\V13GT215.OHN\V48609XA.R6H\skyl..tion_e4141a2a22107248_0015.0002_38ac599f018d3163\BlibBuild -s -A -H -o -c 0.95 -i IMTest -K -S "C:\Users\au297068\AppData\Local\Temp\tmpC0B2.tmp" "C:\Users\au297068\Desktop\IMTest-xiSEARCH\SkylineOUT\Compressed\IMTest.redundant.blib"
Working directory: C:\Users\au297068\Desktop\IMTest-xiSEARCH\Data\Compressed
   ved pwiz.Common.SystemUtil.ProcessRunner.Run(ProcessStartInfo psi, String stdin, IProgressMonitor progress, IProgressStatus& status, TextWriter writer, ProcessPriorityClass priorityClass) i C:\proj\skyline_21_2_x64\pwiz_tools\Shared\Common\SystemUtil\ProcessRunner.cs:linje 149
   ved pwiz.BiblioSpec.BlibBuild.BuildLibrary(LibraryBuildAction libraryBuildAction, IProgressMonitor progressMonitor, IProgressStatus& status, String& commandArgs, String& messageLog, String[]& ambiguous) i C:\proj\skyline_21_2_x64\pwiz_tools\Shared\BiblioSpec\BlibBuild.cs:linje 201
   ved pwiz.Skyline.Model.Lib.BiblioSpecLiteBuilder.BuildLibrary(IProgressMonitor progress) i C:\proj\skyline_21_2_x64\pwiz_tools\Skyline\Model\Lib\BiblioSpecLiteBuilder.cs:linje 157
---------------------------

I am unsure what exactly this means. The purpose of the test is to investigate whether we can use mzML files that are converted like this. The original data is from a TimsTOF, and the mzML file was converted with ScanSumming in order to collapse the IM. My question is whether this is an error, or whether this approach is not compatible with Skyline.

Best
Martin

Nick Shulman responded:	2022-06-07 07:13
The error message is saying that the spectral library that was created has no spectra in it.

It sounds like you are using the "File > Import > Peptide Search" wizard. Are you creating a spectral library from peptide search results that you already have, or have you asked Skyline to do either a DDA or DIA peptide search on your data?

If you have asked Skyline to do a peptide search on your raw data, you might get the "no spectra were found" error if your settings are wrong and, because of this, no peptides could be found. For instance, if you specify a mass accuracy which is much too high, that might cause no peptides to be found.

If you are creating a spectral library from peptide search results that you already have, sometimes that error happens if BiblioSpec is misinterpreting the scores in the search results. Sometimes BiblioSpec gets confused trying to interpret the spectrum identifiers in the peptide search results and cannot find the spectra that it is supposed to find in the .mzML file.

If you send us all of your files, we can figure out what is going wrong. You should send us your Skyline document and your .mzML files. You should also send us some screenshots of the pages of the Import Peptide Search wizard so that we can see what settings you are using.

In Skyline you can use the menu item: File > Share to create a .zip file containing your Skyline document. You should send us that .zip and your .mzML files. Also, if Skyline is doing the peptide search, then you should send us your FASTA file. If you are importing your own peptide search results, you should send us your peptide search result files. This page tells which files are necessary for each peptide search engine: https://skyline.ms/wiki/home/software/BiblioSpec/page.view?name=BlibBuild

If your files are less than 50MB you can attach them to this support request. You can upload larger files here: https://skyline.ms/files.url

-- Nick

mnt responded:	2022-06-08 03:14
Hi Nick

Thanks a lot for the quick reply. When I make the zip file you requested, it is literally empty, so there is not much point in sending it.

Also, I'm unsure how to proceed with this for two reasons: 1) this is unpublished data, so could we continue somewhere other than a public forum? and 2) the files are far too big to upload (20 GB per file).

What I'm trying to do is use IM data instead of non-IM data, as it contains a lot more information and thus increases my odds of actually finding anything. :) Below I'll describe what I'm trying to do.

I'm converting the IM data as shown here (see the section for TimsTOF data): https://fragpipe.nesvilab.org/docs/tutorial_convert.html

The data has been searched with xiSEARCH: https://www.rappsilberlab.org/software/xisearch/

The generated *.ssl file is being read into Skyline and mapped against the mzML files as described here: Dallas, D., & Nielsen, S. D. (2018). Milk Peptidomics to Identify Functional Peptides and for Quality Control of Dairy Products. In Methods in molecular biology (pp. 223–240). https://doi.org/10.1007/978-1-4939-7537-2_15

This entire workflow has worked for me in the past. This time the only difference is that the IM on the TimsTOF was engaged, so this is really a test of whether it is possible to do this workflow using IM data. :)

Also, does Skyline support mzML files written with zlib compression? If yes, the files are only 10 GB each.

So, could we continue off forum, and what do we do about the massive file sizes?

Best
Martin

Nick Shulman responded:	2022-06-08 05:45
I will send you an email so you can send me your files privately.

If you are trying to build a spectral library from a ".ssl" file and you are getting the "No spectra were found for the new library" error, then the problem is probably that the format of what you have in the "scan" column is incorrect. I am not sure exactly what is supposed to go in that "scan" column. Sometimes it's a single integer; sometimes I think it's supposed to be 4 integers separated by dots. I will be able to figure it out after you send me your data files. If you cannot send your data files, I could ask around, and I think someone might be able to tell you what those scan numbers are supposed to look like for the specific type of data that you have.

It is true that .mzML files are very large, but they do compress well. That is, after you have created a .mzML file, you can use a program such as "7-zip" to create a .zip file, and that .zip file will be much smaller. When you choose "Use zlib compression" in msconvert, you do end up with a smaller .mzML file. However, if you are planning on putting the .mzML file into a .zip file, the size of the final .zip file will probably be the same regardless of whether "Use zlib compression" was turned on. Yes, Skyline can handle mzML files which have zlib compression.

In newer versions of msconvert, there is a checkbox "combine ion mobility scans". That checkbox will result in much smaller mzML files for ion mobility data.

You can also send us your Bruker .d folder instead of the .mzML file.

-- Nick

mnt responded:	2022-06-09 00:06
Hi Nick

Thanks a lot, I’ll send you a mail shortly with a link you can use to download the files. There you can find the ssl file as well as two .7z archives; each contains both a .mgf and a *.mzML version of the same file. If you need more information, let me know. Also, just for clarification, this data has nothing to do with reality from a biological point of view, but it is suitable for testing and establishing this pipeline.

The files are converted using the settings shown at this link: https://raw.githubusercontent.com/Nesvilab/FragPipe/gh-pages/images/10.jpg Note that “combine ion mobility scans” is enabled. The only difference between the mgf and mzML files is the file format, and that mgf files do not support zlib compression.

I know that this seems a bit strange, so let me try to elaborate on how this works, or how I want it to work. The goal is to use IM data for the search, as IM data contains a lot more information than non-IM data. xiSEARCH (linked before) is used to search the mgf files; xiSEARCH only supports mgf files, hence the mgf files. The idea is to use Skyline for further validation and to perform some kind of match-between-runs. To do so, an ssl file is generated by xiSEARCH. The ssl file points to the mzML files, which is why these are needed.

The concept behind the use of Skyline is similar to that described in the book chapter I linked before (see section 3.7 in that chapter). The difference is that I’m using an ssl file and not an output from !XTandem. This has the added benefit that Skyline will include matches between runs, which would improve my dataset.

When it comes to the scan column, I have another huge file from xiSEARCH where I have the “native scan ID”, or at least I think I have. So I have the 4 integers that you mention, but I am unsure how they should be inserted in the ssl file. What format is needed? Should it only be something like “merged=366137 frame=47896 scanStart=247 scanEnd=265”, or is something more needed? At this point I don’t really know how to visually inspect mzML files to extract this information.

So, I guess the actual question is: what should be written in the “scan” column? Also, if you need anything else, let me know.

Best
Martin
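On the question of how to inspect an mzML file: every `<spectrum>` element in mzML carries an `index` attribute (0-based) and an `id` attribute (the nativeID), so both can be listed without loading the peak data. The following is only a minimal sketch using the Python standard library; the function name `iter_spectrum_ids` and the `limit` parameter are made up for illustration, and it has not been run against these particular TimsTOF files.

```python
import xml.etree.ElementTree as ET


def iter_spectrum_ids(mzml_path, limit=None):
    """Yield (index, nativeID) for each <spectrum> element in an mzML file.

    Hypothetical helper: streams the file so that multi-gigabyte mzML
    files do not have to fit in memory.
    """
    count = 0
    with open(mzml_path, "rb") as handle:
        for event, elem in ET.iterparse(handle, events=("start", "end")):
            # Strip the XML namespace, e.g. "{http://psi.hupo.org/ms/mzml}spectrum".
            tag = elem.tag.rsplit("}", 1)[-1]
            if event == "start" and tag == "spectrum":
                # Attributes are already available on the "start" event.
                yield int(elem.attrib["index"]), elem.attrib["id"]
                count += 1
                if limit is not None and count >= limit:
                    return
            elif event == "end":
                # Drop parsed content (including binary arrays) to save memory.
                elem.clear()
```

Calling `list(iter_spectrum_ids("IMTest.mzML", limit=5))` (file name assumed) would list the first five index/nativeID pairs, which is enough to see what the nativeID strings look like and how the `merged` field relates to the index.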

Nick Shulman responded:	2022-06-10 10:50
Thank you for sending your .mzML files and your .ssl file. I will have to ask my coworkers to find out how this is supposed to work. It appears that, because these are Bruker TDF files, ProteoWizard may not know how to find a spectrum from its scan number. That is, if you look at the function "translateScanNumberToNativeID": https://github.com/ProteoWizard/pwiz/blob/master/pwiz/data/msdata/MSData.cpp#L537 there is no logic for "MS_Bruker_TDF_nativeID_format". I will try to find someone who can answer your question. -- Nick

Matt Chambers responded:	2022-06-10 12:16
It's not possible to get the original frame and scan number from a single scan number (e.g. the 'merged' field, which corresponds with the spectrum's index attribute) like the 'translateScanNumberToNativeID' function does with several other formats. It also can't handle Sciex WIFF, for example.

If the mzML and MGF are created from the same file with the same settings (or the MGF is created from the mzML) AND the MGF has the nativeID stored in it somehow (like via the titleMaker filter), then it should be possible to map from the MGF to the mzML with only the 'merged' field from the source spectrum's nativeID. The merged field in the MGF's title string will correspond with the spectrum's index in the mzML.

So, to write an SSL that refers to a spectrum's index rather than the 1-based scan number, you can prefix the number with "index=", e.g. "index=123" might map to "merged=123 frame=1 scanStart=100 scanEnd=200". You could also just put the whole nativeID "merged=123 frame=1 scanStart=100 scanEnd=200" in the SSL and that should work too. It should be a little more robust actually, since it would be very unlikely to map to the wrong spectrum in a mismatched mzML file (e.g. created with different settings than the MGF).
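The two forms of the "scan" value described above can be derived from a nativeID string roughly as follows. This is a sketch only: the function name `scan_column_options` and the dictionary keys are invented for illustration, and it assumes the nativeID has exactly the "merged=... frame=... scanStart=... scanEnd=..." layout quoted in this thread.

```python
import re

# Layout assumed from the nativeID examples quoted in this thread.
NATIVE_ID_PATTERN = re.compile(
    r"merged=(\d+) frame=(\d+) scanStart=(\d+) scanEnd=(\d+)"
)


def scan_column_options(native_id):
    """Return both candidate SSL "scan" values for one Bruker TDF nativeID.

    Hypothetical helper: "index_form" uses the 'merged' field as the
    spectrum index; "native_id_form" is the full nativeID unchanged.
    """
    cleaned = native_id.strip()
    match = NATIVE_ID_PATTERN.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"unexpected nativeID layout: {native_id!r}")
    merged = int(match.group(1))
    return {
        "index_form": f"index={merged}",
        "native_id_form": cleaned,
    }
```

For the example above, `scan_column_options("merged=123 frame=1 scanStart=100 scanEnd=200")` gives "index=123" as the index form. Whether 'merged' really equals the mzML index for a given pair of files still depends on the MGF and mzML having been created with the same settings.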

mnt responded:	2022-06-12 23:25
Hi Nick & Matt

Thanks a lot for your help. :) Based on your replies, I made a little test, and I think I found a solution. At least Skyline is not complaining and it seems to be loading something. The solution is to simply put "Index=" in front of the scan numbers in the Scan column of the original .ssl file. I also tried the other solution, with "merged=123 frame=1 scanStart=100 scanEnd=200" in the scan column, but this failed.

In both cases I have uploaded the generated .ssl files to the shared folder mentioned before. The two new files are called "IMTest-Skyline-Index.ssl" and "IMTest-Skyline-Test.ssl" respectively. Would it be possible for you to validate that the one that works ("IMTest-Skyline-Index.ssl") is correct?

Best
Martin

Matt Chambers responded:	2022-06-13 09:30
I'm not sure why the nativeID version isn't working - I'll look into that - but as long as the index you're using is 0-based (scan numbers are usually 1-based), then that should be correct. I couldn't tell from your files which one your number is.

mnt responded:	2022-06-13 23:14
Hi Matt

Good question. I have no clue, nor do I know how to investigate. I did make another ssl file now with the "-1" prefix, and its results are very close, if not identical, to those of the original. Is there a way to investigate this?

Regarding the output, I'm following the book chapter I cited. They refer to settings from "Dallas Lab" found here: http://www.dallaslab.org/s/Peptidomics_standard_output.skyr In that output format, how should I interpret the column "[File name] Identified", where the options are Aligned/True/False? Does this refer to some kind of match between runs, and if yes, what exactly do these options mean?

A totally different question: I sometimes observe (unfortunately not with the files involved here) that Skyline fails to search the files (step 10 in section 3.7 of the chapter I cited). The only way to get Skyline to work through the files is to change the retention time restrictions to "include all matching scans", found in Full-Scan under Transition Settings. Could the solution above (specifying "Index=xxx") maybe resolve this as well? I'm unsure if you follow; I was just wondering if the two could be connected.

Best
Martin

Matt Chambers responded:	2022-06-14 07:46
The retention time restrictions should only affect Skyline importing results (from the mzML files), not Skyline building the peptide library (from the SSL files). It's possible I think that for some files your MS1 peaks could be further than 5 minutes away from your MS2 ids.

I'm not sure what you mean by the "-1" prefix. By 0-based and 1-based I meant the scans are numbered starting from 0 or 1. So:

index=0
index=1
index=2

corresponds line-by-line with:

scan=1
scan=2
scan=3

So if your SSL has scan numbers (without the "scan=" prefix) and you just add the "index=" prefix in front without decrementing the number by 1, you are going to be mapping to the wrong scans.
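The decrement described above could be applied to a whole SSL file along these lines. This is a sketch under assumptions, not a tested recipe for xiSEARCH output: it assumes a tab-separated SSL with a header row containing a "scan" column of plain integers, and the function name `scans_to_index` and its `one_based` flag are made up for illustration.

```python
import csv
import io


def scans_to_index(ssl_text, one_based=True):
    """Rewrite the "scan" column of tab-separated SSL text as "index=N".

    Hypothetical helper. With one_based=True each number is decremented
    by 1 first (scan 1 becomes index=0); with one_based=False the numbers
    are assumed to be 0-based already and only the prefix is added.
    """
    reader = csv.DictReader(io.StringIO(ssl_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        number = int(row["scan"])
        if one_based:
            number -= 1
        row["scan"] = f"index={number}"
        writer.writerow(row)
    return out.getvalue()
```

Which setting of `one_based` is right depends on how the search engine numbered the spectra, which is exactly the open question here; comparing a few rows against the index and nativeID values in the mzML file is one way to check.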

mnt responded:	2022-06-14 09:07
Hi Matt

Sorry for not being clear last time, I was in a hurry. The -1 file I made was an attempt to go from a 1-based count to a 0-based count, simply by subtracting 1 from all the scan values. As the *.ssl file is generated by xiSEARCH (not me), I have asked the developer to clarify whether the value found in that column is 1-based or 0-based.

I tested putting “scan=” in front of the value in the “scan” column as you suggested, and this doesn’t work at all. This error is produced:

---------------------------
Skyline
---------------------------
ERROR: No spectra were found for the new library.

Command-line: C:\Users\marti\AppData\Local\Apps\2.0\82ODZ185.JON\AEKA4571.RW6\skyl..tion_e4141a2a22107248_0015.0002_38ac599f018d3163\BlibBuild -s -A -H -o -c 0.95 -i IMtest_Scan -K -S "C:\Users\marti\AppData\Local\Temp\tmpC9DC.tmp" "E:\IMTest-xiSEARCH\Skyline\Skyline-Scan\IMtest_Scan.redundant.blib"
Working directory: E:\IMTest-xiSEARCH\Compressed
---------------------------
OK More Info
---------------------------
System.IO.IOException: ERROR: No spectra were found for the new library.

Command-line: C:\Users\marti\AppData\Local\Apps\2.0\82ODZ185.JON\AEKA4571.RW6\skyl..tion_e4141a2a22107248_0015.0002_38ac599f018d3163\BlibBuild -s -A -H -o -c 0.95 -i IMtest_Scan -K -S "C:\Users\marti\AppData\Local\Temp\tmpC9DC.tmp" "E:\IMTest-xiSEARCH\Skyline\Skyline-Scan\IMtest_Scan.redundant.blib"
Working directory: E:\IMTest-xiSEARCH\Compressed
   ved pwiz.Common.SystemUtil.ProcessRunner.Run(ProcessStartInfo psi, String stdin, IProgressMonitor progress, IProgressStatus& status, TextWriter writer, ProcessPriorityClass priorityClass) i C:\proj\skyline_21_2_x64\pwiz_tools\Shared\Common\SystemUtil\ProcessRunner.cs:linje 149
   ved pwiz.BiblioSpec.BlibBuild.BuildLibrary(LibraryBuildAction libraryBuildAction, IProgressMonitor progressMonitor, IProgressStatus& status, String& commandArgs, String& messageLog, String[]& ambiguous) i C:\proj\skyline_21_2_x64\pwiz_tools\Shared\BiblioSpec\BlibBuild.cs:linje 201
   ved pwiz.Skyline.Model.Lib.BiblioSpecLiteBuilder.BuildLibrary(IProgressMonitor progress) i C:\proj\skyline_21_2_x64\pwiz_tools\Skyline\Model\Lib\BiblioSpecLiteBuilder.cs:linje 157
---------------------------

So I would guess that it is “index=” that is correct. I’m still awaiting the answer on whether the value is 0-based or 1-based. What are your thoughts?

Now, back to my two other questions.

1) Regarding the output, I'm following the book chapter I cited. They refer to settings from "Dallas Lab" found here: http://www.dallaslab.org/s/Peptidomics_standard_output.skyr In that output format, how should I interpret the column "[File name] Identified", where the options are Aligned/True/False? Does this refer to some kind of match between runs, and if yes, what exactly do these options mean?

2) A totally different question: I sometimes observe (unfortunately not with the files involved here) that Skyline fails to search the files (step 10 in section 3.7 of the chapter I cited above). The only way to get Skyline to work through the files is to change the retention time restrictions to "include all matching scans", found in Full-Scan under Transition Settings. Could the solution above (specifying "Index=xxx") maybe resolve this as well? I'm unsure if you follow; I was just wondering if the two could be connected.

Would you like these questions posted as separate questions? For question 2) I intend to test it on a dataset where this usually happens, when I have time. I’m just wondering what your thoughts are. 😊