Raman spectral data format for MCR-ALS [closed]

Question

I have a chemometrics project where I need to feed Raman spectroscopy data files into a Multivariate Curve Resolution model. I was given .CNF files that came off the instrument, converted to .TXT in the format shown below. It is unlike any standard file format I can find by searching, so the read commands I have tried in spectrochempy have not worked and give a "none type has no attribute plot" error.

# Sample type: QA Calibration A
# User name:   
# Sample description: QA Count                                                        
#
# Start time:    2018-01-05, 07:53:59
# Real time (s): 690.050
# Live time (s): 600.000
#
# Total counts:  49930
#
# Left marker:  4095 (1024.358 keV)
# Right marker: 4097 (1024.858 keV)
# Counts:  6
#
# Energy calibration coefficients ( E = sum(Ai * n**i) )
#     A0: -0.196120
#     A1: 0.250152
#     A2: 0.000000
#     A3: 0.000000
# Energy unit: keV
#
# Channel data
# n energy(keV) counts  rate(1/s)
#-----------------------------------------------------------------------
1   0.054   0   0
2   0.304   0   0
3   0.554   0   0
4   0.804   0   0
5   1.055   0   0
.....
8188    2048.772    0   0
8189    2049.023    0   0
8190    2049.273    0   0
8191    2049.523    1   0.00166667
8192    2049.774    0   0

I have tried adding my data's instrument details to the example file provided with spectrochempy. Reading that file with the example data plots successfully, but when I add my own list of energies it no longer plots (NoneType error).

My aim is to get this data into a format that can be read by a package (I am currently using spectrochempy in JupyterLab) that I can then use for an MCR-ALS analysis, so I believe I need the data in a matrix with count(?) and wavelength. I have been unable to make enough sense of the example data files to put my data into a similar format. I haven't been able to find any example data inputs that are shown rather than just described, and I'm new to this, so step-by-step guidance would be appreciated!

I also have the PDF report file from the instrument. I will need to repeat this process for 200000 samples, so an automated, simple approach is important.

You are probably better off asking in a group that deals with Raman spectral analysis than here. It isn't a programming problem as such: you are trying to read files produced by a specific piece of kit, but in a format that the analysis package doesn't understand. You certainly need to specify the maker and model number of the instrument (and searching GitHub might yield useful results).
– Martin Brown, Commented Nov 25 at 11:13
Please see the tour, How to Ask, and minimal reproducible example. Learn what an AttributeError is and decipher the entire error message. You should have shown the entire traceback and the minimal code to reproduce the issue.
– Christoph Rackwitz, Commented Nov 25 at 14:28

Tino D · Accepted Answer · 2025-11-25 12:57:42Z

I wrote a short but detailed answer about how to extract this information. Read the comments carefully:

import pandas as pd

rows = []  # numeric rows collected from the file
startOfData = False  # init an indicator to signal that the numeric data started
with open("testData.txt", "r") as file:  # open text; the with-block closes the file afterwards
    for line in file:  # read all lines in the file
        row = line.split()  # split on any run of whitespace; empty fields are dropped
        if not row:  # skip blank lines
            continue
        if row[0] == "1":  # channel 1 marks the start of the numeric data
            startOfData = True  # this will stay True for the rest of the file
        if startOfData:  # every line from channel 1 onwards is a data row
            rows.append(row[1:])  # keep energy, counts and rate; drop the channel number
# Build the dataframe once at the end instead of growing it row by row.
# Of course, the headers could also be read from the file directly.
df = pd.DataFrame(rows, columns=["energy (keV)", "counts", "rate (1/s)"])
df = df.astype("float")  # convert the strings to float
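
The comment in the code above notes that the headers could also be read from the file directly. A minimal sketch of that idea follows, assuming the column names always sit on the comment line immediately above the `#----` rule, as in the excerpt from the question. The function name `read_with_header` and the `SAMPLE` text are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical excerpt in the layout shown in the question (not real instrument output).
SAMPLE = """\
# Energy unit: keV
#
# Channel data
# n energy(keV) counts  rate(1/s)
#-----------------------------------------------------------------------
1   0.054   0   0
2   0.304   0   0
3   0.554   0   0
"""


def read_with_header(handle):
    """Return a DataFrame whose column names are taken from the '#' header block."""
    names = None
    previous = ""
    rows = []
    for line in handle:
        if line.startswith("#"):
            # Assumption: the names are on the comment line right above the '#----' rule.
            if line.startswith("#---"):
                names = previous.lstrip("#").split()
            previous = line
            continue
        fields = line.split()  # whitespace-separated numeric fields
        if fields:
            rows.append(fields)
    if names is None:
        raise ValueError("no '#----' rule found, so the column names are unknown")
    df = pd.DataFrame(rows, columns=names).astype(float)
    df = df.astype({names[0]: int})  # the first column is the channel number
    return df.set_index(names[0])


df = read_with_header(io.StringIO(SAMPLE))
print(df.columns.tolist())
```

Passing an open file instead of the `io.StringIO` object works the same way, since the function only iterates over lines.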

There is still the logic of reading many files and concatenating the results; I guess you'll be able to manage this. Just put the parsing code above in a function and call it in your loop.

Below are the results:

      energy (keV)  counts  rate (1/s)
0            0.054     0.0    0.000000
1            0.304     0.0    0.000000
2            0.554     0.0    0.000000
3            0.804     0.0    0.000000
4            1.055     0.0    0.000000
...            ...     ...         ...
8187      2048.772     0.0    0.000000
8188      2049.023     0.0    0.000000
8189      2049.273     0.0    0.000000
8190      2049.523     1.0    0.001667
8191      2049.774     0.0    0.000000
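
For the many-files step, one possible sketch is shown below. It swaps the manual line loop for `pandas.read_csv` with `comment="#"`, which skips the header block, and stacks the counts column of every file into a matrix with one row per file and one column per channel. The names `read_spectrum` and `build_matrix`, the `*.txt` pattern, and the demo values are all assumptions for illustration, and the sketch assumes every file shares the same channel and energy axis.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

COLUMNS = ["n", "energy (keV)", "counts", "rate (1/s)"]


def read_spectrum(path):
    """Parse one converted text file into a DataFrame indexed by channel number."""
    # comment="#" drops the header lines; sep=r"\s+" splits on runs of whitespace.
    return pd.read_csv(path, comment="#", sep=r"\s+", header=None,
                       names=COLUMNS, index_col="n")


def build_matrix(folder, pattern="*.txt"):
    """Stack the counts of every matching file: one row per file, one column per channel."""
    # Note: on case-sensitive file systems "*.txt" will not match ".TXT" files.
    paths = sorted(Path(folder).glob(pattern))
    spectra = [read_spectrum(p) for p in paths]
    # Assumption: all files share the energy axis of the first one.
    energies = spectra[0]["energy (keV)"].to_numpy()
    counts = np.vstack([s["counts"].to_numpy(dtype=float) for s in spectra])
    return energies, counts, [p.name for p in paths]


# Demo with two tiny made-up files (hypothetical values, not real measurements).
with tempfile.TemporaryDirectory() as tmp:
    for name, values in [("a.txt", [0, 3, 1]), ("b.txt", [2, 0, 5])]:
        lines = ["# Channel data", "# n energy(keV) counts  rate(1/s)"]
        lines += [f"{i + 1}   {0.054 + 0.25 * i:.3f}   {v}   0"
                  for i, v in enumerate(values)]
        (Path(tmp) / name).write_text("\n".join(lines) + "\n")
    energies, counts, names = build_matrix(tmp)

print(counts.shape)  # (number of files, number of channels)
```

A samples-by-channels array like `counts`, together with the shared `energies` axis, is the usual starting point for curve-resolution tools. With 200000 files of 8192 channels each, the full float64 matrix would need roughly 13 GB, so reading in batches or using a smaller dtype may be necessary.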
