.blib requirement to recognize the presence of ion mobility data jpaezpae  2023-06-29 13:47
 

Hello!

In the context of a a diaPASEF experiment. I am attempting to manually add IMS information to a .blib file (python+sqlite) and I am getting the following error when skyline attempts to use the the .blib to generate an .imsdb

---------------------------
Skyline-daily
---------------------------
The library F:\FILE_LOCATION.blib does not contain ion mobility information.
---------------------------
OK
---------------------------

I have added the following columns:

RefSpectra."ionMobilityType" INTEGER (all 2s in my case)
RefSpectra."collisionalCrossSectionSqA" REAL (filled with 0s)
RefSpectra."ionMobilityHighEnergyOffset" INTEGER (filled with 0s)
RefSpectra."ionMobility" REAL (filled using the real values for the ions)

RetentionTimes."ionMobility" REAL (actual mobility values)
RetentionTimes."collisionalCrossSectionSqA" REAL (filled with 0s)
RetentionTimes."ionMobilityHighEnergyOffset" INTEGER (filled with 0s)
RetentionTimes."ionMobilityType" INTEGER (2s)

IonMobilityTypes ->
id (INT)    ionMobilityType (VARCHAR 128)
  3	compensation(V)
  1	driftTime(msec)
  2	inverseK0(Vsec/cm^2)
  0	none

Is there any other way to provide IMS information to the library (something like a .SSL but for IMS) or to manually generate a .imsdb?
It is worth noting that the .blib works fine, it can be used as a spectral library to generate a document and import data, but does not provide the ion mobility information to skyline (no ims window shown in the MS2 and does not enable the selection for IMS resolution when creating the document for a DIA search).

Please let me know where I can find documentation on the format that could help me troubleshoot this issue or any insights you might have on what I might be missing to make it a valid file!

(I am using encyclopedia to generate the .blib from a .dlib)

Kindest wishes,
Sebastian

 
 
Brian Pratt responded:  2023-06-29 14:14

Hi Sebastian,

Can I see your .blib file? That is probably the fastest way to understand what's going wrong.

Meanwhile, have a look at https://raw.githubusercontent.com/ProteoWizard/pwiz/master/pwiz_tools/BiblioSpec/tests/reference/tables.check for the blib schema - ionMobilityHighEnergyOffset for example is not an integer value.

Also note that .imsdb is also an SQLite file, so producing that directly would be an option if you're already comfortable with that kind of thing.

Best regards
Brian Pratt

 
Brian Pratt responded:  2023-06-29 14:19

It would also be useful to see any other documents in use for your process, for example whatever source you have for the IMS values.

 
jpaezpae responded:  2023-06-29 17:27

Hello Brian!

Thanks for the reply!
I checked the schema and added it, unfortunately it does not seem to solve the issue.
I am attaching sampleblib.blib which is the one I am generating. [file was too large, link here https://github.com/jspaezp/burner_repo/releases/download/v0.0.1/sampleblib.blib]

I am also attaching sky.blib which was generated following the diaPASEF tutorial with the 'small data' and then modified using the script below. Oddly enough, skyline does identify the IMS data in this file correctly after the values have been over-written by my script.

This is the script I am using to add the mobility info (pardon the messy code). Let me know if you need the 'weights files' as well!

import sqlite3
import pandas as pd
import argparse
import lightgbm as lgbm

pd.set_option("display.max_columns", None)


REF_SPECTRA_SCHEMA = """
CREATE TABLE RefSpectra ( -- spectrum metadata - actual mz/intensity pairs in RefSpectraPeaks
    id INTEGER primary key autoincrement not null, -- lookup key for RefSpectraPeaks
    peptideSeq VARCHAR(150), -- unmodified peptide sequence, can be left blank for small molecule use
    precursorMZ REAL, -- mz of the precursor that produced this spectrum
    precursorCharge INTEGER, -- should agree with adduct if provided
    peptideModSeq VARCHAR(200), -- modified peptide sequence, can be left blank for small molecule use
    prevAA CHAR(1), -- position of peptide in its parent protein (can be left blank)
    nextAA CHAR(1),  -- position of peptide in its parent protein (can be left blank)
    copies INTEGER, -- number of copies this spectrum was chosen from if it is in a filtered library
    numPeaks INTEGER, -- number of peaks, should agree with corresponding entry in RefSpectraPeaks
    ionMobility REAL, -- ion mobility value, if known (see ionMobilityType for units)
    collisionalCrossSectionSqA REAL, -- precursor CCS in square Angstroms for ion mobility, if known
    ionMobilityHighEnergyOffset REAL, -- ion mobility value increment for fragments (see ionMobilityType for units)
    ionMobilityType TINYINT, -- ion mobility units (required if ionMobility is used, see IonMobilityTypes table for key)
    retentionTime REAL, -- chromatographic retention time in minutes, if known
    startTime REAL, -- start retention time in minutes, if known
    endTime REAL, -- end retention time in minutes, if known
    totalIonCurrent REAL, -- total ion current of spectrum
    moleculeName VARCHAR(128), -- precursor molecule's name (not needed for peptides)
    chemicalFormula VARCHAR(128), -- precursor molecule's neutral formula (not needed for peptides)
    precursorAdduct VARCHAR(128), -- ionizing adduct e.g. [M+Na], [2M-H2O+2H] etc (not needed for peptides)
    inchiKey VARCHAR(128), -- molecular identifier for structure retrieval (not needed for peptides)
    otherKeys VARCHAR(128), -- alternative molecular identifiers for structure retrieval, tab separated name:value pairs e.g. cas:58-08-2\thmdb:01847 (not needed for peptides)
    fileID INTEGER, -- index into SpectrumSourceFiles table for source file information
    SpecIDinFile VARCHAR(256), -- original spectrum label, id, or description in source file
    score REAL, -- spectrum score, typically a probability score (see scoreType)
    scoreType TINYINT -- spectrum score type, see ScoreTypes table for meaning
);
CREATE INDEX idxPeptide ON RefSpectra (peptideSeq, precursorCharge);
CREATE INDEX idxPeptideMod ON RefSpectra (peptideModSeq, precursorCharge);
CREATE INDEX idxMoleculeName ON RefSpectra (moleculeName, precursorAdduct);
CREATE INDEX idxInChiKey ON RefSpectra (inchiKey, precursorAdduct);
"""

RT_SCHEMA = """
CREATE TABLE IF NOT EXISTS "RetentionTimes" (
"RefSpectraID" INTEGER,
  "RedundantRefSpectraID" INTEGER,
  "SpectrumSourceID" INTEGER,
  "ionMobility" REAL,
  "collisionalCrossSectionSqA" REAL,
  "ionMobilityHighEnergyOffset" REAL,
  "ionMobilityType" INTEGER,
  "retentionTime" REAL,
  "startTime" TEXT,
  "endTime" TEXT,
  "score" REAL,
  "bestSpectrum" INTEGER
);
"""


def main(blib, one_over_k0_model_file, ccs_model_file):
    conn = sqlite3.connect(blib)
    cur = conn.cursor()
    print("Reading from file")
    df = pd.read_sql_query("SELECT * FROM RefSpectra", conn)
    print(df.head())

    pred_df = df[["id", "peptideSeq", "precursorMZ", "precursorCharge"]].copy()

    pred_df["PepLength"] = pred_df["peptideSeq"].str.len()
    pred_df["NumBulky"] = pred_df["peptideSeq"].str.count("[LVIFWY]")
    pred_df["NumPos"] = pred_df["peptideSeq"].str.count("[KRH]")
    pred_df["NumNeg"] = pred_df["peptideSeq"].str.count("[DE]")

    print("Predicting ion mobility")
    ######### Prediction Start
    one_over_k0_model = lgbm.Booster(model_file=one_over_k0_model_file)
    ccs_model = lgbm.Booster(model_file=ccs_model_file)

    ook0_predicted = one_over_k0_model.predict(
        pred_df[
            [
                "precursorMZ",
                "precursorCharge",
                "PepLength",
                "NumBulky",
                "NumPos",
                "NumNeg",
            ]
        ]
    )
    ccs_predicted = ccs_model.predict(
        pred_df[
            [
                "precursorMZ",
                "precursorCharge",
                "PepLength",
                "NumBulky",
                "NumPos",
                "NumNeg",
            ]
        ]
    )

    id_to_imns = dict(zip(pred_df["id"], ook0_predicted))
    id_to_ccs = dict(zip(pred_df["id"], ccs_predicted))
    ####### Prediction End

    try:
        del df["ionMobilityValue"]
    except KeyError:
        pass
    df["collisionalCrossSectionSqA"] = ccs_predicted
    df["ionMobility"] = ook0_predicted
    df["ionMobilityHighEnergyOffset"] = 0
    df["ionMobilityHighEnergyOffset"] = df["ionMobilityHighEnergyOffset"].astype("float64")
    df["ionMobilityType"] = 2

    print("Writing to file")
    print("Updating RefSpectra Table")
    
    cur.execute("DROP TABLE RefSpectra")
    cur.executescript(REF_SPECTRA_SCHEMA)
    df.to_sql("RefSpectra", conn, if_exists="append", index=False, schema=REF_SPECTRA_SCHEMA)

    print("Updating RetentionTimes Table")
    rtdf = pd.read_sql_query("SELECT * FROM RetentionTimes", conn)

    try:
        del rtdf["ionMobilityValue"]
    except KeyError:
        pass

    rtdf["ionMobility"] = [id_to_imns[x] for x in rtdf["RefSpectraID"]]
    rtdf["collisionalCrossSectionSqA"] = [id_to_ccs[x] for x in rtdf["RefSpectraID"]]
    rtdf["ionMobilityType"] = 2
    rtdf["ionMobilityHighEnergyOffset"] = 0
    rtdf["ionMobilityHighEnergyOffset"] = rtdf["ionMobilityHighEnergyOffset"].astype("float64")

    cur.execute("DROP TABLE RetentionTimes")
    cur.executescript(RT_SCHEMA)
    rtdf.to_sql("RetentionTimes", conn, if_exists="replace", index=False, schema=RT_SCHEMA)

    # Just a print for sanity checking
    print("Printing header of the new table")
    df = pd.read_sql_query("SELECT * FROM RefSpectra LIMIT 5", conn)
    print(df.head())

    df = pd.read_sql_query("SELECT * FROM RetentionTimes LIMIT 5", conn)
    print(df.head())

    print("Creating ims info table")
    # Encyclopedia does not write this table, so we add it here
    try:
        cur.executescript(
            """
        CREATE TABLE IonMobilityTypes (
            id INTEGER PRIMARY KEY,
            ionMobilityType VARCHAR(128)
        );
        INSERT INTO IonMobilityTypes(id, ionMobilityType) VALUES(0, 'none');
        INSERT INTO IonMobilityTypes(id, ionMobilityType) VALUES(1, 'driftTime(msec)');
        INSERT INTO IonMobilityTypes(id, ionMobilityType) VALUES(2, 'inverseK0(Vsec/cm^2)');
        INSERT INTO IonMobilityTypes(id, ionMobilityType) VALUES(3, 'compensation(V)');
        """
        )
    except sqlite3.OperationalError as e:
        if "table IonMobilityTypes already exists" in str(e):
            pass
        else:
            raise


    conn.commit()
    conn.close()
    print("Done!")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Add IMS to BLIB")
    parser.add_argument("blib", help="BLIB file")
    parser.add_argument(
        "one_over_k0_model_file",
        help="Weights file with the IMS model to use for the lgbm model that predicts 1/k0",
    )
    parser.add_argument(
        "ccs_model_file",
        help="Weights file with the IMS model to use for the lgbm model that predicts CCS",
    )

    args, unknown = parser.parse_known_args()
    if unknown:
        raise RuntimeError("Unrecognized arguments: ", unknown)
    else:
        main(
            args.blib,
            one_over_k0_model_file=args.one_over_k0_model_file,
            ccs_model_file=args.ccs_model_file,
        )

On the other hand, is there any place where the .imsdb schema is documented? Opening one manually I had questions on several of the fields.

Thank you very much for the help and time!
Sebastian

 
Brian Pratt responded:  2023-06-30 10:01

Hi Sebastian,

Good news, it's a simple fix: you just need to properly declare the LibInfo schema version number in the "minorVersion" field. I set it to "6" and everything works fine with your file.

minorVersion INTEGER -- Schema version number:
-- Version 10 adds TIC as a column
-- Version 9 adds Proteins and RefSpectraProteins tables
-- Version 8 adds startTime and endTime
-- Version 7 adds peak annotations
-- Version 6 generalized ion mobility to value, high energy offset, and type (currently drift time msec, and inverse reduced ion mobility Vsec/cm2)
-- Version 5 added small molecule columns
-- Version 4 added collisional cross section for ion mobility, still supports drift time only
-- Version 3 added product ion mobility offset information for Waters Mse IMS
-- Version 2 added ion mobility information

 
jpaezpae responded:  2023-06-30 13:57

That is great!

I tried it out locally and it works like a charm!
Do you feel like it would be worth having inference of the schema version as a feature?

Thank you so much for the help!
Kindest wishes
-Sebastian

 
Brian Pratt responded:  2023-06-30 14:28

Do you feel like it would be worth having inference of the schema version as a feature?

I feel it would just be more opportunity for error. That declaration is there so we don't have to do a lot of guesswork.