Issue 800: BiblioSpec MaxQuant reader should look in parent folders for raw data files

Status:open
Assigned To:matt.chambers42@gmail.com
Type:Defect
Area:Skyline
Priority:3
Milestone:21.2
Opened:2021-05-07 by Brendan MacLean
Changed:2021-05-07 by matt.chambers42
Resolved:
Resolution:
Closed:
2021-05-07 Brendan MacLean
In working with a published dataset (Bruderer MCP 2015), I found that the most intuitive way to organize the data left the RAW data files several parent levels above the msms.txt file. The mqpar.xml file also seemed to sit naturally 2 levels up from the msms.txt file (maybe that is already supported). I ended up having to move everything down into the folder with the msms.txt file.

So, I started from this layout:
<root>\*.raw
<root>\MaxQuant Searches\Standard-Profiling-Sample-Set\mqpar.xml
<root>\MaxQuant Searches\Standard-Profiling-Sample-Set\combined\txt\msms.txt

I would have liked this to just work. Having 2 levels (MaxQuant Searches\Standard-Profiling-Sample-Set) for the MaxQuant search may not be completely standard, but I get the sense that the root of what could be considered the MaxQuant search is Standard-Profiling-Sample-Set. You would not normally expect all of the raw data files to live inside this folder hierarchy, while the mqpar.xml file is probably in a fairly normal place.

Please fix the reader to look up the parent hierarchy. You can find the dataset in question at:

brendanx@nexus:data\MacCoss\MProphet\20150501_Bruderer\dda

2021-05-07 matt.chambers42
It currently looks at most 2 levels up (so it would find raw files as siblings of the combined directory). That is the most common way to run MaxQuant (basically it creates the combined directory wherever the RAW files are, although I forget what happens if the RAWs are in different places). I think your layout would require either moving the RAWs afterward or specifically telling MaxQuant to use a different location for the combined output.

It's trivial to change how far up it looks as long as it's a constant depth. Looking up to 3 levels should work for your case. You want it to look up to 4 levels? Or do you want it to look all the way to the drive root?
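
For illustration only, a constant-depth parent lookup of the kind described here could be sketched as follows. This is a Python sketch, not the actual BiblioSpec C++ code; `find_in_parents`, `start_dir`, `file_name` and `max_levels` are made-up names, and the real reader may match files differently.

```python
from pathlib import Path
from typing import Optional


def find_in_parents(start_dir: Path, file_name: str, max_levels: int) -> Optional[Path]:
    """Look for file_name in start_dir, then in up to max_levels parent folders.

    Hypothetical helper: returns the first match, or None if nothing is found.
    """
    current = Path(start_dir).resolve()
    # Level 0 is start_dir itself, so max_levels + 1 folders are checked in total.
    for _ in range(max_levels + 1):
        candidate = current / file_name
        if candidate.exists():
            return candidate
        if current.parent == current:
            # Reached the drive or filesystem root; nothing further to check.
            break
        current = current.parent
    return None
```

With this sketch, changing how far up the search goes is a matter of changing `max_levels`; passing a very large value would effectively search all the way to the drive root.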

2021-05-07 matt.chambers42
Sorry, you'd need 4 levels, not 3.
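
The count can be checked against the layout from the original report with a small Python sketch. `levels_up` is a hypothetical helper, and `C:\data` is only a stand-in for `<root>`, whose real path is not given in this thread.

```python
from pathlib import PureWindowsPath


def levels_up(start_dir: PureWindowsPath, target_dir: PureWindowsPath) -> int:
    """Count how many parent steps lead from start_dir up to target_dir."""
    if start_dir == target_dir:
        return 0
    # parents[0] is one level up, parents[1] is two levels up, and so on.
    # Raises ValueError if target_dir is not an ancestor of start_dir.
    return list(start_dir.parents).index(target_dir) + 1


# Assumed stand-in for <root>; the folder names below come from the report.
root = PureWindowsPath(r"C:\data")
txt_dir = root / "MaxQuant Searches" / "Standard-Profiling-Sample-Set" / "combined" / "txt"

print(levels_up(txt_dir, txt_dir.parent.parent))  # folder with mqpar.xml: 2
print(levels_up(txt_dir, root))                   # folder with *.raw: 4
```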

2021-05-07 Brendan MacLean
I think 4 levels should be plenty. I think maybe we should also improve the message to let the user know it looks in parent folders. 2 levels up is already a pretty crowded folder:

07/20/2019 04:53 PM <DIR> .
07/20/2019 04:53 PM <DIR> ..
03/11/2017 06:51 PM <DIR> B_D140314_SGSDSsample1_R01_MSG_T0
03/11/2017 06:51 PM <DIR> B_D140314_SGSDSsample1_R02_MSG_T0
03/11/2017 06:51 PM <DIR> B_D140314_SGSDSsample1_R03_MSG_T0
03/11/2017 06:51 PM <DIR> B_D140314_SGSDSsample2_R01_MSG_T0
03/11/2017 06:51 PM <DIR> B_D140314_SGSDSsample2_R02_MSG_T0
03/11/2017 06:51 PM <DIR> B_D140314_SGSDSsample2_R03_MSG_T0
03/11/2017 06:52 PM <DIR> B_D140314_SGSDSsample3_R01_MSG_T0
03/11/2017 06:52 PM <DIR> B_D140314_SGSDSsample3_R02_MSG_T0
03/11/2017 06:52 PM <DIR> B_D140314_SGSDSsample3_R03_MSG_T0
03/11/2017 06:52 PM <DIR> B_D140314_SGSDSsample4_R01_MSG_T0
03/11/2017 06:52 PM <DIR> B_D140314_SGSDSsample4_R02_MSG_T0
03/11/2017 06:52 PM <DIR> B_D140314_SGSDSsample4_R03_MSG_T0
03/11/2017 06:53 PM <DIR> B_D140314_SGSDSsample5_R01_MSG_T0
03/11/2017 06:53 PM <DIR> B_D140314_SGSDSsample5_R02_MSG_T0
03/11/2017 06:53 PM <DIR> B_D140314_SGSDSsample5_R03_MSG_T0
03/11/2017 06:53 PM <DIR> B_D140314_SGSDSsample6_R01_MSG_T0
03/11/2017 06:53 PM <DIR> B_D140314_SGSDSsample6_R02_MSG_T0
03/11/2017 06:53 PM <DIR> B_D140314_SGSDSsample6_R03_MSG_T0
03/11/2017 06:53 PM <DIR> B_D140314_SGSDSsample7_R01_MSG_T0
03/11/2017 06:54 PM <DIR> B_D140314_SGSDSsample7_R02_MSG_T0
03/11/2017 06:54 PM <DIR> B_D140314_SGSDSsample7_R03_MSG_T0
03/11/2017 06:54 PM <DIR> B_D140314_SGSDSsample8_R01_MSG_T0
03/11/2017 06:54 PM <DIR> B_D140314_SGSDSsample8_R02_MSG_T0
03/11/2017 06:54 PM <DIR> B_D140314_SGSDSsample8_R03_MSG_T0
03/11/2017 06:55 PM <DIR> combined
02/13/2015 11:50 AM 7,660,806 B_D140314_SGSDSsample1_R01_MSG_T0.index
02/13/2015 11:50 AM 7,673,646 B_D140314_SGSDSsample1_R02_MSG_T0.index
02/13/2015 11:50 AM 7,765,846 B_D140314_SGSDSsample1_R03_MSG_T0.index
02/13/2015 11:50 AM 7,639,746 B_D140314_SGSDSsample2_R01_MSG_T0.index
02/13/2015 11:50 AM 7,711,446 B_D140314_SGSDSsample2_R02_MSG_T0.index
02/13/2015 11:50 AM 7,751,266 B_D140314_SGSDSsample2_R03_MSG_T0.index
02/13/2015 11:51 AM 7,629,046 B_D140314_SGSDSsample3_R01_MSG_T0.index
02/13/2015 11:51 AM 7,712,626 B_D140314_SGSDSsample3_R02_MSG_T0.index
02/13/2015 11:51 AM 7,671,446 B_D140314_SGSDSsample3_R03_MSG_T0.index
02/13/2015 11:51 AM 7,701,666 B_D140314_SGSDSsample4_R01_MSG_T0.index
02/13/2015 11:51 AM 7,734,566 B_D140314_SGSDSsample4_R02_MSG_T0.index
02/13/2015 11:51 AM 7,713,946 B_D140314_SGSDSsample4_R03_MSG_T0.index
02/13/2015 11:52 AM 7,706,006 B_D140314_SGSDSsample5_R01_MSG_T0.index
02/13/2015 11:52 AM 7,745,626 B_D140314_SGSDSsample5_R02_MSG_T0.index
02/13/2015 11:52 AM 7,769,406 B_D140314_SGSDSsample5_R03_MSG_T0.index
02/13/2015 11:52 AM 7,706,146 B_D140314_SGSDSsample6_R01_MSG_T0.index
02/13/2015 11:52 AM 7,768,626 B_D140314_SGSDSsample6_R02_MSG_T0.index
02/13/2015 11:52 AM 7,723,726 B_D140314_SGSDSsample6_R03_MSG_T0.index
02/13/2015 11:52 AM 7,693,806 B_D140314_SGSDSsample7_R01_MSG_T0.index
02/13/2015 11:52 AM 7,765,646 B_D140314_SGSDSsample7_R02_MSG_T0.index
02/13/2015 11:52 AM 7,744,966 B_D140314_SGSDSsample7_R03_MSG_T0.index
02/13/2015 11:52 AM 7,762,226 B_D140314_SGSDSsample8_R01_MSG_T0.index
02/13/2015 11:52 AM 7,748,706 B_D140314_SGSDSsample8_R02_MSG_T0.index
02/13/2015 11:52 AM 7,716,286 B_D140314_SGSDSsample8_R03_MSG_T0.index
02/13/2015 11:50 AM 12,531 mqpar.xml
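
As a rough idea of what the improved message mentioned above might contain, here is a Python sketch that lists every folder that was checked. The function names and the message wording are assumptions for illustration, not the text BiblioSpec actually prints.

```python
from pathlib import PureWindowsPath
from typing import List


def searched_dirs(start_dir: PureWindowsPath, max_levels: int) -> List[PureWindowsPath]:
    """Return start_dir followed by up to max_levels of its parent folders."""
    return [start_dir, *list(start_dir.parents)[:max_levels]]


def not_found_message(file_name: str, start_dir: PureWindowsPath, max_levels: int) -> str:
    """Build a hypothetical error message naming each folder that was searched."""
    lines = [f"Could not find '{file_name}'. Looked in:"]
    lines += [f"  {folder}" for folder in searched_dirs(start_dir, max_levels)]
    return "\n".join(lines)
```

Listing the folders explicitly would make it obvious to the user that parent folders are searched, and how far up the search goes.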

It is strange that this structure also appears to have _T0 as a suffix of all of the names. I wonder why that suffix appears to be stripped off in the Raw File field of the msms.txt. I had to rename all of the raw data files to get the library to build:

11/11/2014 01:00 AM 4,704,988 A_D141110_0uM-APAP_MnsMRM_R01.raw
11/11/2014 01:00 AM 4,705,270 A_D141110_0uM-APAP_MnsMRM_R02.raw
11/11/2014 01:00 AM 4,705,270 A_D141110_0uM-APAP_MnsMRM_R03.raw
11/11/2014 01:00 AM 4,705,270 A_D141110_3333uM-APAP_MnsMRM_R01.raw
11/11/2014 01:00 AM 4,704,988 A_D141110_3333uM-APAP_MnsMRM_R02.raw
11/11/2014 01:00 AM 4,704,988 A_D141110_3333uM-APAP_MnsMRM_R03.raw
10/24/2014 01:00 AM 871,557,166 B_D140314_SGSDSsample1_R01_MSG.raw
10/25/2014 01:00 AM 915,130,482 B_D140314_SGSDSsample1_R02_MSG.raw
10/25/2014 01:00 AM 892,220,522 B_D140314_SGSDSsample1_R03_MSG.raw
10/24/2014 01:00 AM 883,435,019 B_D140314_SGSDSsample2_R01_MSG.raw
10/25/2014 01:00 AM 899,057,884 B_D140314_SGSDSsample2_R02_MSG.raw
10/25/2014 01:00 AM 887,758,054 B_D140314_SGSDSsample2_R03_MSG.raw
10/25/2014 01:00 AM 895,456,567 B_D140314_SGSDSsample3_R01_MSG.raw
10/25/2014 01:00 AM 923,474,482 B_D140314_SGSDSsample3_R02_MSG.raw
10/25/2014 01:00 AM 900,043,464 B_D140314_SGSDSsample3_R03_MSG.raw
10/25/2014 01:00 AM 909,388,496 B_D140314_SGSDSsample4_R01_MSG.raw
10/25/2014 01:00 AM 926,623,154 B_D140314_SGSDSsample4_R02_MSG.raw
10/24/2014 01:00 AM 882,471,620 B_D140314_SGSDSsample4_R03_MSG.raw
10/24/2014 01:00 AM 878,463,077 B_D140314_SGSDSsample5_R01_MSG.raw
10/24/2014 01:00 AM 885,207,390 B_D140314_SGSDSsample5_R02_MSG.raw
10/25/2014 01:00 AM 892,749,324 B_D140314_SGSDSsample5_R03_MSG.raw
10/24/2014 01:00 AM 878,931,703 B_D140314_SGSDSsample6_R01_MSG.raw
10/25/2014 01:00 AM 898,930,076 B_D140314_SGSDSsample6_R02_MSG.raw
10/24/2014 01:00 AM 881,770,654 B_D140314_SGSDSsample6_R03_MSG.raw
10/25/2014 01:00 AM 893,247,106 B_D140314_SGSDSsample7_R01_MSG.raw
10/25/2014 01:00 AM 910,046,130 B_D140314_SGSDSsample7_R02_MSG.raw
10/24/2014 01:00 AM 857,501,892 B_D140314_SGSDSsample7_R03_MSG.raw
10/25/2014 01:00 AM 903,981,613 B_D140314_SGSDSsample8_R01_MSG.raw
10/25/2014 01:00 AM 905,201,293 B_D140314_SGSDSsample8_R02_MSG.raw
10/24/2014 01:00 AM 856,177,459 B_D140314_SGSDSsample8_R03_MSG.raw
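
A quick way to spot this kind of name mismatch before building is to compare the Raw File values against the file names on disk. The Python sketch below assumes the Raw File values have already been read out of msms.txt and that matching is by base name without extension; `unmatched_raw_names` is a hypothetical helper, and the sample names in the usage are illustrative only.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List


def unmatched_raw_names(raw_file_values: Iterable[str], file_names: Iterable[str]) -> List[str]:
    """Return Raw File values that have no same-named file (extension ignored).

    Comparison is case-insensitive, which is an assumption about the matching rule.
    """
    stems = {PureWindowsPath(name).stem.lower() for name in file_names}
    return [value for value in raw_file_values if value.lower() not in stems]


# Illustrative values: one name carries a _T0 suffix that the file on disk lacks.
missing = unmatched_raw_names(
    ["B_D140314_SGSDSsample1_R01_MSG_T0"],
    ["B_D140314_SGSDSsample1_R01_MSG.raw"],
)
print(missing)
```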

2021-05-07 matt.chambers42
Huh. That is a mystery. Google didn't turn up anything related to MaxQuant and "_T0", so I guess it's not a systemic thing. Maybe that person really renamed their raw files before uploading to a repository...