Reading shimadzu .lcd files #29
Comments
Hi Rüdiger, Thanks for your feedback. I would definitely be interested in hearing more about your innovations and potentially incorporating them into the package. It would be nice to add support for the fluorescence data!

Re: Issue 1. This was just a workaround because I couldn't figure out where the length of the stream was encoded. It worked with all the data files I had access to, but I'm not surprised that it didn't generalize to every file. I would definitely be interested in fixing this if you can tell me how the length of the stream is encoded. It sounds like this discrepancy may be due to a difference in the way the fluorometer stream is encoded compared to the PDA stream.

Re: Issue 2. Regarding your comment about the M1 processor, I'm not sure what issue you're running into, but I can tell you that the package is definitely functional on M1 Macs, because I am doing most of the development of the package on an M1 Mac. I'm guessing there is some other issue with your miniconda installation that is causing the installation to fail. To be honest, the Python dependencies have been quite a headache, and it really makes me wish that reticulate worked more smoothly in the context of a package. Unfortunately, for the Shimadzu LCD parser, the Python bindings are pretty necessary since as far as I know there is no equivalent to

Best, |
Hi Ethan, the file is read with olefile.OleFileIO!
In the first case, I just take the first 4 bytes of the PDA raw data stream, which are repeated before each data block, and count how often they appear. In my case this was 3564 times. I then screened my data file (mainly by eye) and found the number '3564' in the stream "PDA 3D Raw Data/3D Data Item", which makes perfect sense. This is an XML-type stream similar to the one from which your code extracts the start and end times. Unfortunately, my XML parser cannot read it without an error, although it easily reads many other XML streams in the same data file. As a workaround I use a string operation to get the number, but this will fail if the number has a different length. At least this shows where to find it.

For the fluorometer data: these instruments are connected in an analog way to the main Shimadzu instrument. The first thing to know is which channel it is connected to. Most likely it is an early one; in my case it is Channel 1. When screening the streams of the data file, there are several streams for a large number of channels, but by looking at the size of each stream (many are empty) I found the data in "LSS Raw Data/Chromatogram Ch1"! I could not find any additional information in other Channel 1 streams, e.g. for the length of the data set, which differs from the PDA data set because the instrument works at a different frequency. Luckily, the data format of the stream and its decoding are the same as for the PDA data, so I can use your block decoding scheme. The differences from the PDA are logical: it is only a single data set (the time series of the fluorescence of one excitation/emission channel). While each data set of the PDA data (one per PDA spectrum) consists of two data blocks, the fluorescence data have many more data blocks (in my case 18). But here we do not need a fixed order when reading the data in. I simply loop over all the blocks until the end of the data stream.
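As a hedged illustration of the counting and screening steps described above (a sketch, not the code actually used in this thread): the helper names `read_all_streams`, `count_block_markers` and `find_number_in_streams` are made up for this example. It assumes that the first 4 bytes of the stream really act as a per-block marker and do not also occur by chance inside a payload. The only olefile calls used are `OleFileIO`, `listdir`, `openstream` and `close`.

```python
import re


def read_all_streams(path):
    """Return {stream name: raw bytes} for every stream in an OLE2 file."""
    import olefile  # third-party; imported lazily so the helpers below work without it

    ole = olefile.OleFileIO(path)
    try:
        return {
            "/".join(entry): ole.openstream(entry).read()
            for entry in ole.listdir(streams=True, storages=False)
        }
    finally:
        ole.close()


def count_block_markers(raw, marker_len=4):
    """Count non-overlapping occurrences of the stream's first `marker_len` bytes."""
    if len(raw) < marker_len:
        return 0
    return raw.count(raw[:marker_len])


def find_number_in_streams(streams, number):
    """List streams that contain `number` as a standalone run of ASCII digits."""
    pattern = re.compile(rb"(?<!\d)" + str(number).encode("ascii") + rb"(?!\d)")
    hits = []
    for name, data in streams.items():
        # Assumption: dropping NUL bytes lets the same pattern match UTF-16-LE digits.
        if pattern.search(data.replace(b"\x00", b"")):
            hits.append(name)
    return hits
```

With a real file, one would read all streams once, count the markers in the PDA raw data stream, and then pass that count to `find_number_in_streams` to see which streams mention it. Because the search matches a whole run of digits, it does not depend on how many digits the number has.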
For the time axis I am assuming that the start and end times are the same as for the PDA. I have not found out how the scale of the values needs to be adjusted. I am getting very large values of up to 10^6 in the peaks, so I divided by 10^6. I can directly compare the results with the data in the Shimadzu software: I am in the same range but about a factor of 4 too low, and the instrument is set to Gain 4, although it is not exactly a factor of 4. Here are my Python functions for this:
For the cutting of the binary string at position 5 when reading the data: Hope this is helpful. Rüdiger |
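These are not the functions referred to above. As a minimal sketch of the time-axis and scaling assumptions only: `time_axis` and `scale_signal` are hypothetical names, the points are assumed to be evenly spaced with both the start and end time included, and the divisor 10^6 is the empirical value mentioned above, not a documented calibration factor.

```python
def time_axis(start, end, n_points):
    """Evenly spaced times from start to end (both included) for n_points samples."""
    if n_points < 2:
        return [start] * n_points
    step = (end - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def scale_signal(values, divisor=1e6):
    """Divide raw detector values by an empirical divisor (assumption: 1e6)."""
    return [v / divisor for v in values]
```

The remaining discrepancy of roughly a factor of 4 would then be handled by changing `divisor` once the role of the gain setting is understood.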
Thanks Rüdiger -- this looks great! I wonder if you'd be willing to share one or two test files from your instrument? I'm not sure I have an analog stream in any of the files I currently have access to. I'd definitely be interested to hear about what you find if you make any headway with the spectral libraries. |
Hi Ethan, Rüdiger |
Thanks! |
We are also working on parsing the .lcd file from Shimadzu LC-40. Using the above python code, we have extracted LSS Raw Data - Chromatogram ch1 data. Thanks for it! -Charu |
Personally I haven't really looked into these streams too much -- I was mostly interested in being able to extract the data from the DAD detector -- but I would be curious to hear what you figure out. |
Here’s the file with all the streams that I got from the olefile module in Python. If anyone has any idea how to decode it (the peak table stream - ['LSS Data Processing', 'PT-LC.1.1.AD.2.CH#1']), I’d appreciate the help. |
Dear Ethan,
I tried to use your code for reading the raw data of our Shimadzu HPLC, thanks for that code!
I am not a programmer, and I work mainly in Python rather than R. Here are some results from the last few days that my colleague (who uses R) and I spent working on this. I wonder whether you would like to address the issues we found in your R code.
I needed to change mainly two things:
your line 147 in read_shimadzu_lcd.R, mat <- matrix(NA, nrow = fsize/(n_lambdas*1.5), ncol = n_lambdas)
This is about the size of the data stream, which depends on the number of wavelengths from the PDA and the total time of the HPLC run. A simple factor of 1.5 does not work for my data. Instead, I first scan the PDA raw data stream for the start bytes of each data set header and count them. Second, I have now found the entry in a stream that contains the number of data sets, which can simply be read out.
your line 249 in function decode_shimadzu_block: buffer[[2]] <- twos_complement(substr(bin, 5, nchar(bin))),
This line cuts off the first 4 bits of the bit string that contains the difference from the previous value. It worked this way for my PDA data, but for the fluorometer it could not reproduce the results at some positions and distorted the signal. It took me some time to understand this, but in the end the function simply failed when the difference is a large number and more bytes are needed to decode it. In the end I simply reduced the cut and now use the bits from position 3. This seemed to work!
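To make the effect of the cut position concrete, here is a toy Python illustration (not the package's R code). The 16-bit width and the idea that the skipped leading bits carry no value information are assumptions made only for this example; the real block layout may use those bits differently. `decode_delta` is a hypothetical helper whose 1-based `start` mirrors R's `substr(bin, start, nchar(bin))`.

```python
def twos_complement(bits):
    """Interpret a string of '0'/'1' characters as a signed two's-complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


def decode_delta(bits, start):
    """Decode the bits from 1-based position `start` to the end of the string."""
    return twos_complement(bits[start - 1:])


large = format(3000, "016b")   # a large positive difference stored in 16 bits
small = format(5, "016b")      # a small difference

# Cutting at position 5 leaves 12 bits, so 3000 no longer fits and its top bit
# is misread as a sign bit; cutting at position 3 leaves 14 bits, which is enough.
large_cut5 = decode_delta(large, 5)
large_cut3 = decode_delta(large, 3)
small_cut5 = decode_delta(small, 5)
small_cut3 = decode_delta(small, 3)
```

In this toy case the small difference decodes identically with either cut, while the large one flips sign when cut at position 5. That would be consistent with a PDA signal looking fine and a signal with larger jumps being distorted at a few positions.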
My question here is: did you find the number '5' simply by trial and error, or was there a reason?
If there is interest from your side, I can spend some time describing more details, e.g. where to find the fluorescence data and how to read them, or where the data size is stored in the .lcd file.
Best
Rüdiger