Inconsistent length between m/z and intensity array after extraction from RefSpectraPeaks table from .blib file

chiyy  2026-02-05 13:48
 

Dear Skyline team,

I extracted the RefSpectraPeaks table from a .blib file in Python using the code below and then decoded the peakMZ and peakIntensity columns:

import pandas as pd
import numpy as np
import sqlite3
import zlib

def decode_peaks(binary_data, dtype=np.float64):
    """
    Decode zlib-compressed peak data from Skyline .blib files
    
    Parameters:
    binary_data: bytes object from peakMZ or peakIntensity column
    dtype: numpy dtype (default: np.float64)
    
    Returns:
    numpy array of values
    """
    if binary_data is None or len(binary_data) == 0:
        return np.array([])
    
    # Decompress the zlib-compressed data
    decompressed = zlib.decompress(binary_data)
    
    # Try float64 first, then float32 if that fails
    try:
        values = np.frombuffer(decompressed, dtype=np.float64)
    except ValueError:
        # If float64 doesn't work, try float32
        values = np.frombuffer(decompressed, dtype=np.float32)
    
    return values

with sqlite3.connect('input/LIT_GPF_survey_newAlign_MMCC_boundaries_opttrans_nochick.blib') as conn:
    ref_spectra_df = pd.read_sql_query('SELECT * FROM RefSpectra', conn)
    ref_spectra_peaks_df = pd.read_sql_query('SELECT * FROM RefSpectraPeaks', conn)


# Decode the columns
ref_spectra_peaks_df['peakMZ'] = ref_spectra_peaks_df['peakMZ'].apply(decode_peaks)
ref_spectra_peaks_df['peakIntensity'] = ref_spectra_peaks_df['peakIntensity'].apply(decode_peaks)

# record the length of every m/z and intensity array
len_recorder = []
for _, row in ref_spectra_peaks_df.iterrows():
    len_recorder.append([len(row['peakMZ']), len(row['peakIntensity'])] )

In len_recorder, the lengths look like the following:
[[775, 775],
[1320, 660],
[724, 362],
[814, 407],
[1083, 1083],
[1170, 585],
[954, 477],
[1022, 511],
[1310, 655],
[1285, 1285],
[900, 450],
[820, 410],
[1273, 1273],
[1381, 1381],
[1134, 567],
[595, 595],
[708, 354],
[1352, 676],
[980, 490],
[830, 415],
[1327, 1327],
[1181, 1181],
[1131, 1131],
[1319, 1319],
[1087, 1087],
[821, 821],
[929, 929], ...]

Some of the m/z and intensity arrays have the same length, while in other cases the m/z arrays have exactly twice the length of the corresponding intensity arrays. The .blib file is too big to be attached. It is under this project: https://panoramaweb.org/stellar-biofluid-prm.url. /SkylineFiles/LIT_GPF_survey_newAlign_MMCC_boundaries_opttrans_nochick_2024-06-02_15-26-11/LIT GPF survey newAlign MMCC boundaries opttrans nochick.blib.

Thank you very much for your help!

 
 
Brian Pratt responded:  2026-02-05 14:01

Perhaps you could split your decode_peaks function into one that handles only float64 and one that handles float32?

 
Nick Shulman responded:  2026-02-05 14:14
The intensities will always be 4 byte floats and the mzs will always be 8 byte doubles.

Your fallback logic, which tries float32 only when conversion to float64 fails, will not work correctly. The only reason a byte array cannot be converted to doubles is that its length is not divisible by 8, so any intensity blob whose length happens to be a multiple of 8 is silently decoded as doubles.
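
As a minimal sketch of that failure mode, using synthetic bytes rather than data from the .blib file (`decode_with_fallback` is a hypothetical stand-in for the fallback in the question): an even number of 4-byte floats has a byte length divisible by 8, so it is read as half as many doubles without any error, while an odd number raises ValueError and falls through to float32.

```python
import numpy as np

def decode_with_fallback(raw):
    """Hypothetical stand-in for the float64-then-float32 fallback in the question."""
    try:
        return np.frombuffer(raw, dtype=np.float64)
    except ValueError:
        # Only reached when len(raw) is not a multiple of 8
        return np.frombuffer(raw, dtype=np.float32)

# Four float32 values = 16 bytes: divisible by 8, so no error is raised
even = np.arange(4, dtype=np.float32).tobytes()
# Three float32 values = 12 bytes: not divisible by 8, so the fallback runs
odd = np.arange(3, dtype=np.float32).tobytes()

n_even = len(decode_with_fallback(even))  # misread as doubles: half the true count
n_odd = len(decode_with_fallback(odd))    # read as float32: the true count
print(n_even, n_odd)
```

This is consistent with the lengths listed in the question: the rows where both arrays match all have an odd peak count, and the rows where the m/z array is twice as long all have an even one.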

Here is the code that Skyline uses for reading those values:
https://github.com/ProteoWizard/pwiz/blob/765f898bbdb00d8cfd35520310ac4af57cc803b9/pwiz_tools/Skyline/Model/Lib/BiblioSpecLite.cs#L1160

One tricky thing about reading the mzs and intensities from a .blib file is that you must pay attention to the "numPeaks" value in the RefSpectra table. If the length of the peakMz blob is eight times the numPeaks or the peakIntensity blob is four times numPeaks, then it means that there is no compression, and you should just use the byte array as it is.
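
A rough Python sketch of that check, not a translation of the Skyline code linked above. The helper names `decode_blob` and `decode_spectrum` are hypothetical, and native (little-endian) byte order is an assumption. The demo uses synthetic blobs, one compressed and one stored raw.

```python
import zlib
import numpy as np

def decode_blob(blob, num_peaks, dtype):
    """Decode one peak blob, decompressing only if its size says it is compressed."""
    if blob is None or num_peaks == 0:
        return np.array([], dtype=dtype)
    expected = num_peaks * np.dtype(dtype).itemsize
    # A blob that is exactly num_peaks * itemsize bytes long is stored uncompressed
    raw = bytes(blob) if len(blob) == expected else zlib.decompress(blob)
    values = np.frombuffer(raw, dtype=dtype)
    if len(values) != num_peaks:
        raise ValueError(f"expected {num_peaks} values, got {len(values)}")
    return values

def decode_spectrum(peak_mz, peak_intensity, num_peaks):
    """m/z values are 8-byte doubles, intensities are 4-byte floats."""
    mz = decode_blob(peak_mz, num_peaks, np.float64)
    intensity = decode_blob(peak_intensity, num_peaks, np.float32)
    return mz, intensity

# Synthetic demo: a compressed m/z blob and an uncompressed intensity blob
true_mz = np.repeat([100.5, 200.25], 250)
true_intensity = np.ones(500, dtype=np.float32)
mz_blob = zlib.compress(true_mz.tobytes())
intensity_blob = true_intensity.tobytes()

mz_out, intensity_out = decode_spectrum(mz_blob, intensity_blob, 500)
print(len(mz_out), len(intensity_out))
```

With a real library, numPeaks would have to be read from the RefSpectra row that corresponds to each RefSpectraPeaks row. How the two tables are joined depends on the schema of your file, so check the column names before relying on this.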

By the way, it sounds like you intended to attach a file to this support request, but no file actually got attached.
This often happens if the file you were trying to attach is greater in size than the 50MB limit.
You can always upload larger files here:
https://skyline.ms/files.url
-- Nick
 
chiyy responded:  2026-02-05 14:17

Thank you, Brian. This is exactly the cause! The m/z values are encoded as float64 while the intensities are encoded as float32. It now works perfectly with a separate decode function for each type:

def decode_peaks_float64(binary_data):
    return np.frombuffer(zlib.decompress(binary_data), dtype=np.float64)

def decode_peaks_float32(binary_data):
    return np.frombuffer(zlib.decompress(binary_data), dtype=np.float32)

# Decode the columns
ref_spectra_peaks_df['peakMZ'] = ref_spectra_peaks_df['peakMZ'].apply(decode_peaks_float64)
ref_spectra_peaks_df['peakIntensity'] = ref_spectra_peaks_df['peakIntensity'].apply(decode_peaks_float32)
 
chiyy responded:  2026-02-05 14:30

Thanks, Nick. This is super helpful. In my case, the binaries appear to be compressed, since their lengths are not 4 or 8 times numPeaks.

The file sharing looks amazing. I will use it next time.