I have a chemometrics project where I need to feed Raman spectroscopy data files into a Multivariate Curve Resolution model. I have been given .CNF files that came off the instrument, converted to .TXT in the format shown below. This is unlike any standard file format I can find by searching, so the read commands I have tried in spectrochempy have not worked and give a "'NoneType' object has no attribute 'plot'" error.

# Sample type: QA Calibration A
# User name:   
# Sample description: QA Count                                                        
#
# Start time:    2018-01-05, 07:53:59
# Real time (s): 690.050
# Live time (s): 600.000
#
# Total counts:  49930
#
# Left marker:  4095 (1024.358 keV)
# Right marker: 4097 (1024.858 keV)
# Counts:  6
#
# Energy calibration coefficients ( E = sum(Ai * n**i) )
#     A0: -0.196120
#     A1: 0.250152
#     A2: 0.000000
#     A3: 0.000000
# Energy unit: keV
#
# Channel data
# n energy(keV) counts  rate(1/s)
#-----------------------------------------------------------------------
1   0.054   0   0
2   0.304   0   0
3   0.554   0   0
4   0.804   0   0
5   1.055   0   0
.....
8188    2048.772    0   0
8189    2049.023    0   0
8190    2049.273    0   0
8191    2049.523    1   0.00166667
8192    2049.774    0   0

I have tried adding my instrument details to the example file given in spectrochempy and reading that file with the example data still in it, which plots successfully. When I add my own list of energies, it no longer plots (NoneType error).

My aim is to get this data into a format that can be read by a package (I am currently using spectrochempy in JupyterLab) that I can then use for an MCR-ALS analysis, so I believe I need the data in a matrix of counts(?) against wavelength. I cannot make sense of the example data files well enough to put my data into a similar format. I haven't been able to find any example inputs that are actually shown rather than just described, and I'm new to this, so step-by-step guidance would be appreciated!

I also have the PDF report file from the instrument. I will need to repeat this process for 200000 samples, so an automated, simple approach is important.

  • You are probably better off asking in a group that deals with Raman spectral analysis than here. It isn't a programming problem as such: you are trying to read files produced by a specific piece of kit, in a format that the analysis package doesn't understand. You certainly need to specify the maker and model number of the instrument (and searching GitHub might yield useful results). Commented Nov 25 at 11:13
  • why don't you just use pd.read_csv() with skiprows = ... ? Commented Nov 25 at 12:36
  • tour, How to Ask, minimal reproducible example. learn what an AttributeError is and decipher the entire error message. you should have shown the entire traceback, and the minimal code to repro the issue. Commented Nov 25 at 14:28

1 Answer

I wrote a short but detailed answer about how to extract this information. Read the comments carefully:

import pandas as pd

columns = ["energy (keV)", "counts", "rate (1/s)"] # names taken from the "# n energy(keV) counts rate(1/s)" header line
index = [] # zero-based row labels (channel number n minus 1)
rows = [] # one [energy, counts, rate] list per channel
with open("testData.txt", "r") as file: # open text; the with-block closes the file afterwards
    for line in file: # read the file line by line
        line = line.strip() # remove surrounding whitespace and the newline
        if not line or line.startswith("#"): # skip blank lines and the "#" metadata header
            continue
        values = line.split() # split on any run of whitespace, so no empty strings appear
        index.append(int(values[0]) - 1) # channel n becomes row n-1
        rows.append(values[1:]) # keep energy, counts and rate
# build the dataframe once; growing it row by row with df.loc is very slow for 8192 rows
df = pd.DataFrame(rows, index=index, columns=columns).astype("float") # convert to float.

There is still the logic of reading many files and concatenating them, which I guess you'll be able to manage. Just put the code above in a function and call it in your loop.
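One way that loop could look, as a minimal sketch: the helper names `read_spectrum` and `build_matrix` and the idea of keeping all exported files in one folder are assumptions made for illustration, not part of any library. The sketch also assumes every file has the same channels, so that each file becomes one row of a samples-by-channels count matrix, which is the layout MCR-ALS routines usually work on.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def read_spectrum(path):
    """Parse one exported .TXT file into a dataframe (hypothetical helper)."""
    rows = []
    with open(path, "r") as file:
        for line in file:
            line = line.strip()
            # skip blank lines and the '#' metadata header
            if not line or line.startswith("#"):
                continue
            rows.append([float(value) for value in line.split()])
    return pd.DataFrame(rows, columns=["n", "energy (keV)", "counts", "rate (1/s)"])


def build_matrix(folder, pattern="*.txt"):
    """Stack the counts of every matching file in `folder` into one 2-D array.

    Returns (names, energy, counts); counts has shape (files, channels).
    """
    names, spectra, energy = [], [], None
    for path in sorted(Path(folder).glob(pattern)):
        df = read_spectrum(path)
        if energy is None:
            # take the energy axis from the first file
            energy = df["energy (keV)"].to_numpy()
        elif len(df) != len(energy):
            raise ValueError(f"{path.name} has a different number of channels")
        names.append(path.stem)
        spectra.append(df["counts"].to_numpy())
    if not spectra:
        raise FileNotFoundError(f"no files matching {pattern} in {folder}")
    return names, energy, np.vstack(spectra)
```

Calling `names, energy, counts = build_matrix("spectra")` (with `spectra` standing in for your own folder) would give a `counts` array that you can wrap in whatever dataset type your MCR-ALS package expects. Note that 200000 files of 8192 channels is about 1.6 billion values, roughly 13 GB as float64, so you may need to work in batches or use a smaller dtype.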

Below are the results for testData.txt:

      energy (keV)  counts  rate (1/s)
0            0.054     0.0    0.000000
1            0.304     0.0    0.000000
2            0.554     0.0    0.000000
3            0.804     0.0    0.000000
4            1.055     0.0    0.000000
...
8187      2048.772     0.0    0.000000
8188      2049.023     0.0    0.000000
8189      2049.273     0.0    0.000000
8190      2049.523     1.0    0.001667
8191      2049.774     0.0    0.000000
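As one of the comments on the question hints, `pd.read_csv` can probably do the same job in a single call. The sketch below is an illustration under assumptions: it uses a shortened in-memory copy of the file so that it runs on its own, and it assumes every non-data line in the real files starts with `#`. The column names are typed in by hand because the header row is itself a comment line.

```python
import io

import pandas as pd

# A shortened stand-in for the exported file; in practice pass the file path instead.
sample = """# Sample type: QA Calibration A
# Energy unit: keV
#
# Channel data
# n energy(keV) counts  rate(1/s)
#-----------------------------------------------------------------------
1   0.054   0   0
2   0.304   0   0
8191    2049.523    1   0.00166667
8192    2049.774    0   0
"""

df = pd.read_csv(
    io.StringIO(sample),  # or "testData.txt"
    comment="#",          # ignore the metadata header lines
    sep=r"\s+",           # columns are separated by runs of spaces
    header=None,          # the column names sit in a comment line, so set them here
    names=["n", "energy (keV)", "counts", "rate (1/s)"],
    index_col="n",        # keep the instrument's channel number as the index
)
print(df)
```

Unlike the table above, this keeps the channel number `n` as a 1-based index and leaves `counts` as integers; add `.astype("float")` if you want the same types as before.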