Unable to import Spectra from Bruker DDA analysis.tdf or analysis.tdf_bin using BlibBuild

lubwen  2024-01-08 10:31

I ran into the error 'No spectra were found for the new library' when importing DDA data from Bruker's raw analysis.tdf (or analysis.tdf_bin) file.

I am using a tab-delimited .ssl result file (954.ssl, see attached), with the following example lines:

file scan charge sequence
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d/analysis.tdf_bin 177793 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d/analysis.tdf_bin 176745 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d/analysis.tdf_bin 176808 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
I am attaching the error screenshot to this thread. The original tdf and tdf_bin files are too large (~760MB) for this forum.

Could someone help me track down the issue?

Thanks a lot!
Bingwen Lu
Brian Pratt responded:  2024-01-08 10:38

This should work fine if you refer to just HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d instead of HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d/analysis.tdf_bin in your ssl file.

Thanks for using the Skyline support board!

Brian Pratt

lubwen responded:  2024-01-08 10:51

Before I posted, I did try using the .d method. Here is the error:

ERROR: Could not open spectrum file 'HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d' for search results file '954_small.ssl'.

P:\Platform_general\Spectronaut\shared\bruker_DDAs>more 954_small.ssl
file scan charge sequence
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d 177793 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d 176745 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d 176808 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d 176251 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d 176125 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d 177041 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d 176373 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d 176068 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d 177633 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR

P:\Platform_general\Spectronaut\shared\bruker_DDAs>d:\Bngwen\BlibBuild\BlibBuild.exe 954_small.ssl out.blib
Reading results from 954_small.ssl.
ERROR: Could not open spectrum file 'HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d' for search results file '954_small.ssl'.

Matt Chambers responded:  2024-01-08 12:31
Hi Bingwen, Try using the nativeID in the scan column. You can see what that is in the mzML conversion of the Bruker file (with combineIonMobility on). This is not really yet a supported workflow because straight DDA searches from timsTOF spectra are unusual. Usually the data gets further processed into a less native format like MGF, where precursors have been merged across frames (which the pwiz reader does not do unless you use the scanSumming filter, which BiblioSpec does not). Then the SSL file would refer to that MGF, not the raw .d file. If using the nativeID doesn't make things clear, I may be able to help more if you explain more about where the PSMs are coming from and what that scan column refers to.
lubwen responded:  2024-01-08 14:39

Thanks a lot Matt for your reply!

We did a straight DDA search on timsTOF raw .d data using Spectronaut v.18. The scan Number was from the search export. I am attaching a screenshot showing the PSM for scan # 176068.
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d/analysis.tdf_bin 176068 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR

It appears that Spectronaut knows the spectrum for this specific scan number.

Also it appears to me that BlibBuild can read spectra information from the analysis.tdf :-) -- they are just not matched to the scan numbers provided by Spectronaut

Matt Chambers responded:  2024-01-09 07:32

Unfortunately a single monotonic "scan number" doesn't make any sense when talking about timsTOF data. TimsTOF data is addressed by frames and scans. Each frame is a different retention time and might have 1000 (mobility) scans. For DDA data, many of those scans are for the same precursor, so they are merged together. And then scans for the same precursor from adjacent frames are also merged together. But exactly how that merging is done is up to the reader, e.g. Spectronaut. So there's no real "scan number" that points to the raw data.

This is why MSFragger produces an MGF/mzML file when it processes timsTOF data. Does Spectronaut produce a processed spectrum file like MGF or mzML for timsTOF data? If not, that should be added, and you'll have to find another way to export a library from Spectronaut like the OpenSWATH TSV format. Or maybe Spectronaut can export a library in SpectraST or NIST MSP format?

Brian Pratt responded:  2024-01-09 09:29

Matt is correct, this is a meaningless number without knowing how Spectronaut produced it from the actual frame and scan information.

One guess would be that you can get from that "scan number" to the actual data by dividing by frame size and using the remainder. Looking at a few examples of Spectronaut views and confirming the supposed mapping with SeeMS would be helpful.

lubwen responded:  2024-01-10 15:26

Hi Matt,

I managed to use Spectronaut to export a TSV file; please see attached. Do you think this will work? Do I need to convert it to another format (e.g. an OpenSWATH-style TSV) to make it compatible with BlibBuild? I was not able to run BlibBuild on this TSV file directly: it complained that the header 'filename' was not found.

Brian Pratt responded:  2024-01-11 07:14

I don't see anything that looks like a scan identifier. Perhaps there are other Spectronaut reports you can try?

Brian Pratt responded:  2024-01-11 08:00

I should ask though - what's the end goal here? I think that tsv file is a good candidate for Skyline's Import Assay Library, it's just missing a column for ion mobility units (presumably "1/K0", but Skyline can't infer that up front).

lubwen responded:  2024-01-11 09:03

The end goal is to import the transitions into a .blib file for Skyline's Import Assay Library. IonMobility is column #10

Brian Pratt responded:  2024-01-11 09:14

IonMobility is column #10

Yes, the issue is that there's no corresponding column for ion mobility units.

Import Assay Library works by reading such tsv files, so you don't have to do a separate step building a blib file. Try adding an ion mobility units column where all the values are "1/K0" and see how it goes. You might need to clean up the peptide declarations a bit too, what with those "_" characters, but I think this is the direction you need to go.
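A minimal sketch of that clean-up, assuming the report is a tab-delimited file with a header row. The column names `ModifiedPeptide` and `IonMobilityUnits` are hypothetical placeholders (the real Spectronaut header names, and the header Skyline expects for the units column, are not given in this thread), and the example row is made up:

```python
import csv
import io

def add_ion_mobility_units(tsv_text, peptide_column, units="1/K0",
                           units_column="IonMobilityUnits"):
    """Return tsv_text with a constant units column appended and
    leading/trailing '_' characters stripped from the peptide column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + [units_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[peptide_column] = row[peptide_column].strip("_")
        row[units_column] = units
        writer.writerow(row)
    return out.getvalue()

# Made-up example row, only to show the transformation.
example = "ModifiedPeptide\tIonMobility\n_PEPTIDEK_\t0.95\n"
print(add_ion_mobility_units(example, "ModifiedPeptide"))
```

For a real report, read the file's text, pass the actual peptide column name, and write the result back out before trying Import Assay Library.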

lubwen responded:  2024-01-11 09:23

Hi Matt,

This is the reply from Spectronaut people: 'the MS2ScanNumber reported should correspond to the frame-index within the DDA-PASEF file'.

How should I revise the .ssl file to make BlibBuild work? Thanks for any advice.

lubwen responded:  2024-01-11 10:05

I see. Thanks. Perhaps I misunderstood your message.

What we are trying to achieve is to import the transitions into a BLIB file - we have an internal system built upon BLIB files where users can browse and export selected peptides/transitions to be further imported into Skyline to build PRM assays.

Hence, the end goal is to import the data into a BLIB file. (though this might sound like an intermediate goal here)

It would be nice if I can supply correct scan # or frame # in the .ssl file where BlibBuild can then go into .d folder and extract the correct MS2 spectrum for each PSM. Do you think this is possible, @Matt?

Brian Pratt responded:  2024-01-11 11:17

'the MS2ScanNumber reported should correspond to the frame-index within the DDA-PASEF file'.

What do they mean by "frame-index"? I think it's probably the PasefFrameMsMsInfo table, but it's hard to be sure.

lubwen responded:  2024-01-12 16:43

Perhaps it is this 'frame-index', @Matt? See attached, second column of this spectrum table. I am viewing the DDA .d data using AlphaTIMS.

Matt Chambers responded:  2024-01-12 21:43

What MS2ScanNumber are they referring to? Frame index is almost certainly referring to the frame coordinate, which corresponds with all the mobility scans taken at a given retention time in the timsTOF file. But there are usually around a thousand mobility scans per frame. For ddaPASEF, there may be 1-4 precursors (IIRC) per frame. A DDA search is only going to use the scans from a single precursor, and that may be hundreds of scans merged together. And it may also merge the same precursor from different frames together, although it sounds like Spectronaut is not doing that if they are able to report a single "frame-index". But they still must be summing across all the scans for the same precursor.

I suppose we could take the frame index and the precursor m/z and figure out which scans to use, but that's not anything like what BlibBuild is currently set up to do to look up a spectrum. So if you wanted to make this work for SSL, it's going to take some scripting on your part. I suggest you make an msconvert mzML or MGF with combine ion mobility on. Don't do any scanSumming filter because it sounds like Spectronaut doesn't sum across frames. Then you should be able to take the frame index and precursor m/z from the Spectronaut output and look up which spectrum id is for that frame and precursor m/z. MGF output would probably be easier to deal with (you would find the MGF spectrum index with the corresponding frame and PEPMASS).
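The MGF lookup suggested above could be sketched roughly as follows. It assumes each TITLE line carries the spectrum id with a `frame=` field and that PEPMASS holds the precursor m/z. The function name, the 0.01 m/z tolerance, and the two-spectrum MGF fragment are illustrative assumptions, not values from the real file:

```python
import re

def find_mgf_spectrum(mgf_text, frame, precursor_mz, mz_tol=0.01):
    """Return (0-based index, title) of the first MGF entry whose TITLE
    contains frame=<frame> and whose PEPMASS is within mz_tol, else None."""
    index = -1
    title = None
    pepmass = None
    for line in mgf_text.splitlines():
        line = line.strip()
        if line == "BEGIN IONS":
            index += 1
            title, pepmass = None, None
        elif line.startswith("TITLE="):
            title = line[len("TITLE="):]
        elif line.startswith("PEPMASS="):
            # PEPMASS may be "m/z" or "m/z intensity"; keep the m/z only.
            pepmass = float(line[len("PEPMASS="):].split()[0])
        elif line == "END IONS" and title is not None and pepmass is not None:
            m = re.search(r"\bframe=(\d+)", title)
            if (m and int(m.group(1)) == frame
                    and abs(pepmass - precursor_mz) <= mz_tol):
                return index, title
    return None

# Made-up MGF fragment for illustration only.
MGF_EXAMPLE = """BEGIN IONS
TITLE=run.1.1.2 NativeID:"merged=10 frame=5 scanStart=100 scanEnd=120"
PEPMASS=500.25 1000
CHARGE=2+
100.0 1.0
END IONS
BEGIN IONS
TITLE=run.2.2.3 NativeID:"merged=11 frame=5 scanStart=300 scanEnd=320"
PEPMASS=700.40 2000
CHARGE=3+
200.0 5.0
END IONS
"""
print(find_mgf_spectrum(MGF_EXAMPLE, frame=5, precursor_mz=700.40))
```

Matching on both frame and precursor m/z matters because one frame can hold several precursors.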

lubwen responded:  2024-01-13 12:45

Thanks Matt. I will try your recommendation.

lubwen responded:  2024-01-13 21:47
Hi Matt, I seem to be unable to find a frame index of '176068' after converting to MGF with scan summing turned off. Am I missing something? Would you be able to have a look for me?

BTW, scripting is not an issue for me, I just need to know what needs to be scripted (I even have a modified version of BlibBuild custom-tailored for our internal system, where I modified the C++ code).
Matt Chambers responded:  2024-01-15 08:37
It's in the TITLE line: frame=xxx. From command-line msconvert, you can also customize the titleMaker filter so that it also contains the precursor m/z ("<SelectedIonMz>" in the filter's format string). There's only 45k frames though. The last one is frame=45028. If MS2ScanNumber goes up to 176068, then it is not the frame index.
Brian Pratt responded:  2024-01-15 15:12
Have you considered the PasefFrameMsMsInfo table in the tdf file? Possibly "frame-index" is an index into this table, precursor mz is "IsolationMz", and the MS2 scan is the summation of the scans in frame "Frame" from ScanNumBegin to ScanNumEnd.
Matt Chambers responded:  2024-01-16 07:20
That's a reasonable guess, and querying the SQLite would make it fairly simple to come up with the correct id to search. The only tricky part would be figuring out the merged=xxx index. You have to include MS1s among the MS2s to calculate that properly. Bingwen, let me know if that MS2ScanNumber refers to the PasefFrameMsMsInfo table and I'll help you come up with the logic to figure out the full id (merged=xxx frame=xxxx scanStart=xxx scanEnd=xxxx).
lubwen responded:  2024-01-16 09:39

Thanks guys. I think we are getting closer. :-)

Spectronaut people responded with a revised output, with a new report column called "PSM.MS2VendorAPIScanNumber". According to them, this new field provides vendor specific context on how to retrieve the corresponding scan. For DDA PASEF it will report the FrameIDs and PrecursorID that were used to make the identification (quote "from the PasefFrameMsMSInfo table").

Specifically, for the MS2ScanNumber 176068, the "PSM.MS2VendorAPIScanNumber" is: FrameID: 37228;37230 | PrecursorID: 148053

I am currently tracking down what this spectrum is, but perhaps you guys know how to get to it faster. Attached you can also find the revised TSV file containing this "PSM.MS2VendorAPIScanNumber" column.

Matt Chambers responded:  2024-01-16 10:04

OK that's better! And it confirms they are merging frames, so MS2ScanNumber definitely isn't a straight index into PasefFrameMsMSInfo because that table always has one row per MS2 frame/precursor pair. Because they're summing the spectra from same precursor across multiple frames, there's not going to be a perfect way to look at exactly the same peaks that Spectronaut considered: there's no way to tell msconvert to sum spectra in exactly the same way as indicated by the MS2VendorAPIScanNumber column. Msconvert's own scanSumming filter can also merge across frames, but it would probably be impossible to get it to merge in exactly the same way Spectronaut does. But you can probably get pretty close by using just one of the frames (probably the median frame in the range they give) without scanSumming. And you'll still have to look up the PrecursorID in the PasefFrameMsMSInfo table so you can get the scanStart and scanEnd (and to be fair pwiz could probably include PrecursorId in the spectrum id for merged ddaPASEF scans).

In short, you'll have to write a script and query the SQLite to convert from:
FrameID: 37228;37230 | PrecursorID: 148053
to:
merged=xxxx frame=37229 scanStart=xxxx scanEnd=xxxx
where your script figures out the xxx's by querying the PasefFrameMsMSInfo table. Note that the scanStart in pwiz will be +1 and the scanEnd should match what's in PasefFrameMsMSInfo. Rather than trying to calculate it yourself, it'll probably be easier for your script to just grep the mzML or MGF file for the correct "frame=37229 scanStart=xxxx scanEnd=xxxx" string and then set the merged=xxxx according to that.
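A rough sketch of the parsing and SQLite part of such a script. It assumes the precursor column in PasefFrameMsMsInfo is named `Precursor`; that name is not stated in this thread, so check it against the actual analysis.tdf. The in-memory table and its rows are stand-ins chosen for illustration, not values read from the real file:

```python
import re
import sqlite3

def parse_vendor_scan(text):
    """Split 'FrameID: 37228;37230 | PrecursorID: 148053' into ([frames], precursor)."""
    m = re.fullmatch(
        r"\s*FrameID:\s*([\d;]+)\s*\|\s*PrecursorID:\s*(\d+)\s*", text)
    if m is None:
        raise ValueError(f"unrecognized value: {text!r}")
    frames = [int(f) for f in m.group(1).split(";") if f]
    return frames, int(m.group(2))

def partial_native_id(conn, frame, precursor_id):
    """Return 'frame=F scanStart=S scanEnd=E' for one frame/precursor pair,
    or None if the table has no row for that pair."""
    row = conn.execute(
        "SELECT ScanNumBegin, ScanNumEnd FROM PasefFrameMsMsInfo "
        "WHERE Frame = ? AND Precursor = ?",  # 'Precursor' is an assumed column name
        (frame, precursor_id),
    ).fetchone()
    if row is None:
        return None
    begin, end = row
    # Per the note above: pwiz scanStart is ScanNumBegin + 1, scanEnd is unchanged.
    return f"frame={frame} scanStart={begin + 1} scanEnd={end}"

# Stand-in database with illustrative rows; for real data, connect to the
# analysis.tdf file inside the .d folder instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PasefFrameMsMsInfo "
             "(Frame INTEGER, ScanNumBegin INTEGER, ScanNumEnd INTEGER, Precursor INTEGER)")
conn.executemany("INSERT INTO PasefFrameMsMsInfo VALUES (?, ?, ?, ?)",
                 [(37228, 399, 424, 148053), (37230, 399, 424, 148053)])

frames, precursor = parse_vendor_scan("FrameID: 37228;37230 | PrecursorID: 148053")
for frame in frames:
    print(partial_native_id(conn, frame, precursor))
```

Only frames that have a row for the precursor can be resolved this way. The `merged=xxxx` part still has to come from the converted mzML or MGF file.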

Brian Pratt responded:  2024-01-16 10:36

Ideally that logic would be baked in to BlibBuild, I'd think? It sounds like Bingwen is comfortable working in that C++ code already, I'm sure we'd be happy to consider merging those changes into the official code.

Matt Chambers responded:  2024-01-16 10:56

Depends whether you're talking about directly reading Spectronaut's TSV output or not. If so, I could see having the logic to parse the new MS2VendorAPIScanNumber column in BlibBuild's PwizParser, but that would require adding PrecursorId to the ddaPASEF scan ids in core pwiz as well (but as I mentioned before I think that'd be a fair improvement). But reading that TSV would require a whole new class of TSV reader I think, so that's more work than just new spectrum lookup logic. Not difficult work though, just a time investment.

But for just reading SSL input, I'm not sure trying to parse the spectrum id as specific Bruker coordinates makes a lot of sense (rather than 0 or 1 based index, or an exact string match in the matched spectrum file).

lubwen responded:  2024-01-16 16:34

Thanks a TON Matt!

I managed to script the NativeIDs into an SSL file, using information from the MGF TITLE line and Bruker's PasefFrameMsMsInfo table, and successfully ran BlibBuild on the SSL to generate the .blib file I wanted :-)

This is the example SSL file with the spectrum we discussed:

file scan charge sequence
HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d/analysis.tdf merged=301403 frame=37228 scanStart=400 scanEnd=424 3 [+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR
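Because the scan value itself contains spaces, the columns have to be separated by real tab characters. A small sketch of writing such an SSL file with Python's `csv` module (the `write_ssl` helper is a hypothetical name; the row is the one shown above):

```python
import csv
import io

def write_ssl(rows):
    """Return tab-delimited SSL text for (file, scan, charge, sequence) rows."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["file", "scan", "charge", "sequence"])
    writer.writerows(rows)
    return out.getvalue()

SPECTRUM_FILE = "HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d/analysis.tdf"
NATIVE_ID = "merged=301403 frame=37228 scanStart=400 scanEnd=424"
SEQUENCE = "[+42.0]AAAAAAVGPGAGGAGSAVPGGAGPC[+324.2]ATVSVFPGAR"

ssl_text = write_ssl([(SPECTRUM_FILE, NATIVE_ID, 3, SEQUENCE)])
print(ssl_text)
```

The writer only quotes fields that contain the delimiter or a quote character, so the space-separated NativeID is written as a single unquoted field.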

lubwen responded:  2024-01-16 16:37

P.S.: Using average frame # (37229) did not work so I chose to use just one of the frames when there are multiple frames.

lubwen responded:  2024-01-16 16:44

One more note to add: for the NativeID, only the 'merged=XXX' information is missing from the PasefFrameMsMsInfo table (scanStart and scanEnd are already there). The MGF file is used solely to retrieve the 'merged=XXX' information. If this 'merged=XXX' value could be derived or computed some other way, the MGF file would no longer be needed.
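The MGF step described here could look something like the sketch below: index the TITLE lines once, then look up `merged` by (frame, scanStart, scanEnd). It assumes the TITLE line contains the id in exactly the `merged=... frame=... scanStart=... scanEnd=...` order shown in the SSL example. The `TITLE=` prefix text around the id is made up:

```python
import re

NATIVE_ID_PATTERN = re.compile(
    r"merged=(\d+) frame=(\d+) scanStart=(\d+) scanEnd=(\d+)")

def merged_index_by_coordinates(mgf_lines):
    """Map (frame, scanStart, scanEnd) -> merged index using MGF TITLE lines."""
    lookup = {}
    for line in mgf_lines:
        if not line.startswith("TITLE="):
            continue
        m = NATIVE_ID_PATTERN.search(line)
        if m:
            merged, frame, start, end = (int(g) for g in m.groups())
            lookup[(frame, start, end)] = merged
    return lookup

# Illustrative TITLE line built around the id from the SSL example.
title_lines = [
    'TITLE=example NativeID:"merged=301403 frame=37228 scanStart=400 scanEnd=424"',
    "PEPMASS=0",  # non-TITLE lines are ignored
]
lookup = merged_index_by_coordinates(title_lines)
print(lookup[(37228, 400, 424)])
```

Building the dictionary once avoids re-scanning a large MGF file for every PSM.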

lubwen responded:  2024-01-17 12:08

Hi guys,

One more question to ask, for BlibBuild using SSL files, I do not see a column header for 'ionMobility'. Do we plan to allow ion mobility values to be optionally included when importing timsTOF data? (This will be really nice)

I can see that in the 'RefSpectra' SQLite schema, there is an 'ionMobility' column. Hence this sounds like we are almost ready?

Brian Pratt responded:  2024-01-17 12:28

Easy enough. Note that we'd also want a column for "ionMobilityUnits". The value would be "1/K0" in your case. A CCS column while we're at it, too.

lubwen responded:  2024-01-17 12:44

Thanks - this will be really nice. I would love to try it when you have a revised BlibBuild ready

lubwen responded:  2024-01-19 10:20

Matt said:
"Depends whether you're talking about directly reading Spectronaut's TSV output or not. If so, I could see having the logic to parse the new MS2VendorAPIScanNumber column in BlibBuild's PwizParser, but that would require adding PrecursorId to the ddaPASEF scan ids in core pwiz as well (but as I mentioned before I think that'd be a fair improvement). But reading that TSV would require a whole new class of TSV reader I think, so that's more work than just new spectrum lookup logic. Not difficult work though, just a time investment."

It looks like the extracted MS2 spectra do not look as nice -- not as many peaks are annotated as B/Y ions (in fact most spectrum annotations look horrible).

Perhaps this is due to the fact that I chose only one of the frames when converting MS2VendorAPIScanNumber into 'NativeID' for SSL?

Hence, it would be nice if we can directly read Spectronaut TSV output and perform frame-merging using BlibBuild. I know you are pretty busy Matt but I hope you can have some time for this?

Matt Chambers responded:  2024-01-24 13:30

See if you can get the Spectronaut folks to tell you what criteria they use for merging frames. I.e. is there a tolerance of RT for the same precursor m/z that get merged together? It might be possible to use the scanSumming filter to create something pretty close to what they used.

lubwen responded:  2024-01-24 14:48

Thanks Matt. Let me ask them

lubwen responded:  2024-02-02 11:35

Hi Matt,

I did not get a response from the Spectronaut people. However, I derived the RT tolerance from the information provided in the 'MS2VendorAPIScanNumber' column. Data and a plot of the RT tolerance distribution are attached. It looks like the difference between two consecutive frames is 0.10656000000017 seconds, and frames from the same precursor are merged only if they are within 5 frames of each other (hence it looks like the RT tolerance is ~0.6 seconds).
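The arithmetic behind that estimate, as a small sketch. The frame spacing is rounded to 0.10656 s, and `frame_span_seconds` is a hypothetical helper; it assumes evenly spaced frames, which is how the observation above is phrased:

```python
FRAME_SPACING_S = 0.10656  # seconds between consecutive frames (rounded)

def frame_span_seconds(frame_ids, spacing=FRAME_SPACING_S):
    """Time span covered by the merged frames of one PSM, in seconds."""
    return (max(frame_ids) - min(frame_ids)) * spacing

# FrameIDs for the PSM discussed earlier in the thread: two frames apart.
print(frame_span_seconds([37228, 37230]))

# Largest observed gap (5 frames), which stays under a 0.6 s tolerance.
print(5 * FRAME_SPACING_S)
```

Five frames correspond to roughly 0.53 s, so a tolerance of about 0.6 s covers every merge seen in the data with a little headroom.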

Do you think we can make MSConvert work using such information?

lubwen responded:  2024-02-05 15:26

It appears that using an RT tolerance of 0.6 seconds when doing scan summing allows me to generate spectra that are reasonably annotated. Thanks a ton @Matt. This thread can now be considered closed. :-)

lubwen responded:  2024-02-06 09:49

Brian said:
Easy enough. Note that we'd also want a column for "ionMobilityUnits". The value would be "1/K0" in your case. A CCS column while we're at it, too.

@Brian: have you guys revised the SSL parser to import these extra columns? ('ionMobility', 'ionMobilityUnit', 'CCS')

Brian Pratt responded:  2024-02-06 10:00

Yes, the latest builds support these optional columns

for example (kind of a crazy one, with mixed IM units),

file scan charge sequence modifications ion-mobility ion-mobility-units ccs
demo.ms2 136 3 NFLETVELQVGLK NFLETVELQVGLK 20.0 ms 1
demo.ms2 150 3 NFLETVELQVGLK NFLETVELQVGLK 80.0 none 0

lubwen responded:  2024-02-06 10:39

That is awesome thanks a ton Brian!! 😊

Matt Chambers responded:  2024-02-06 10:44

Glad to hear you got it working! I was actually composing a response last week about using scanSumming with the RT tolerance you derived from the data. But I got hung up on how BlibBuild would be able to match spectra in the SSL to the scanSummed spectra, because the summed spectra don't keep the source scan ids even in the scanList element or the output spectra's id attributes. How'd you end up linking them up? Am I correct that you didn't end up making any source code changes to BiblioSpec?

lubwen responded:  2024-02-06 10:58

You are correct, I did not modify the source code for BlibBuild.

Instead I set the RT tolerance when running MSConvert to convert .d to MGF (see next msg for the actual CLI). Then I query the "PasefFrameMsMsInfo" table using PrecursorID and FrameID to figure out the corresponding scan # in the MGF files (which are the results of scan summing). This in turn gives me MS/MS spectra (from the MGF file) that are well annotated when viewed with Lorikeet. (The scan #'s are included in the SSL that is parsed by BlibBuild.)

lubwen responded:  2024-02-06 11:00

"C:\Program Files\ProteoWizard\ProteoWizard 3.0.23286.85202be\msconvert.exe" --mgf --32 --combineIonMobilitySpectra --filter "peakPicking vendor msLevel=1-" --filter "scanSumming precursorTol=0.05 scanTimeTol=0.6 ionMobilityTol=5 sumMs1=0" --filter "msLevel 2-" --filter "threshold count 500 most-intense" --filter "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState> File:"""^<SourcePath^>""", NativeID:"""^<Id^>"""" "P:\Platform_general\Spectronaut\shared\bruker_DDAs\HT_20221220_Vividion_sampleB12_1000ng_Slot1-19_1_954.d"