Save the raw data in a different format

For some applications, it is useful to save the raw data in a different file format. GT3X is optimized for file size at the expense of loading time: the files are very small, but loading them takes a considerable amount of time. Pandas DataFrames can be exported to a variety of file formats. The following is a small comparison of the most common file types (.csv, .h5, .pkl) that illustrates their respective advantages and disadvantages.
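As a minimal sketch of such an export, the snippet below writes a DataFrame to CSV and Pickle and attempts an HDF5 export, then reads the files back. The DataFrame is a synthetic stand-in for raw acceleration data (the 100 Hz sampling rate, the column names `X`, `Y`, `Z`, the file names and the HDF5 key `acceleration` are assumptions made for illustration). Note that `DataFrame.to_hdf` needs the optional PyTables package, so that step is guarded.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Synthetic stand-in for raw acceleration data (assumption: 100 Hz, columns X/Y/Z).
rng = np.random.default_rng(0)
n_samples = 1_000
index = pd.date_range("2024-01-01", periods=n_samples, freq="10ms")
data = pd.DataFrame(
    rng.normal(size=(n_samples, 3)), index=index, columns=["X", "Y", "Z"]
)

out_dir = Path(tempfile.mkdtemp())
csv_path = out_dir / "recording.csv"
pkl_path = out_dir / "recording.pkl"
h5_path = out_dir / "recording.h5"

# CSV: human-readable text, usually the largest of the three.
data.to_csv(csv_path)

# Pickle: Python's own binary serialization format.
data.to_pickle(pkl_path)

# HDF5: requires the optional PyTables dependency, so guard the call.
try:
    data.to_hdf(str(h5_path), key="acceleration")
    hdf_written = True
except ImportError:
    hdf_written = False

# Read the files back into DataFrames.
from_csv = pd.read_csv(csv_path, index_col=0, parse_dates=True)
from_pkl = pd.read_pickle(pkl_path)
if hdf_written:
    from_h5 = pd.read_hdf(str(h5_path), key="acceleration")
```

The binary formats restore the DataFrame including its index and dtypes, whereas the CSV round trip has to re-parse the timestamps (`parse_dates=True`) and the floating-point values from text.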

10min recording

The first comparison uses the 10-minute recording in data/10min_recording.gt3x. CSV files are human-readable and can easily be imported into other programs such as Excel, but they take up a lot of space and are still fairly slow to read. The Hierarchical Data Format (HDF5, .h5) and Python’s own binary format Pickle (.pkl), on the other hand, load much faster, but still take considerably more space than the GT3X file.

| File type | Size     | Loading time |
|-----------|----------|--------------|
| .gt3x     | ~277 kB  | ~56.28 ms    |
| .csv      | ~3296 kB | ~27.50 ms    |
| .h5       | ~1927 kB | ~4.01 ms     |
| .pkl      | ~1920 kB | ~0.88 ms     |
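Numbers like these depend on the machine and the library versions, so it can be worth measuring on your own data. The following is a rough sketch of how such a comparison could be made; it is not the script used for the figures in this document. The helper `time_loader`, the synthetic data and the file names are assumptions made for illustration, and only CSV and Pickle are covered to avoid the optional PyTables dependency.

```python
import tempfile
import time
from pathlib import Path

import numpy as np
import pandas as pd


def time_loader(loader, path, repeats=3):
    """Return the best wall-clock time in seconds over several loads."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        loader(path)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Synthetic stand-in for a short recording (assumption: three float columns).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(10_000, 3)), columns=["X", "Y", "Z"])

out_dir = Path(tempfile.mkdtemp())
csv_path = out_dir / "recording.csv"
pkl_path = out_dir / "recording.pkl"
data.to_csv(csv_path)
data.to_pickle(pkl_path)

# Collect file size and best loading time per format.
results = {}
for name, path, loader in [
    (".csv", csv_path, lambda p: pd.read_csv(p, index_col=0)),
    (".pkl", pkl_path, pd.read_pickle),
]:
    results[name] = {
        "size_kb": path.stat().st_size / 1000,
        "load_s": time_loader(loader, path),
    }

for name, row in results.items():
    print(f"{name}: ~{row['size_kb']:.0f} kB, ~{row['load_s'] * 1000:.2f} ms")
```

Taking the minimum over a few repetitions reduces the influence of disk caching and background load, but the first read of a file from a cold disk may still be noticeably slower than what this reports.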

7 day recording

When dealing with small GT3X files and short recording times, these differences are often not that important. However, when you have to analyze multi-day recordings, they can have a bigger impact. The following benchmarks come from a seven-day recording stored in a GT3X file of about 277 MB. Loading this file with PAAT takes a noticeable amount of time (approx. 50 seconds). The same data loads in less than a second from an HDF5 file and in around a second from a Pickle. The CSV again loads faster than the GT3X file, but takes more than 3 GB of storage.

| File type | Size     | Loading time |
|-----------|----------|--------------|
| .gt3x     | ~277 MB  | ~49.59 s     |
| .csv      | ~3367 MB | ~26.32 s     |
| .h5       | ~1935 MB | ~0.69 s      |
| .pkl      | ~1935 MB | ~1.12 s      |

Summary and Conclusion

Working only with GT3X files is not always the best choice. Especially with longer recordings, loading a GT3X file can take a noticeable amount of time. This is fine if you only need to process the files once. However, if you know that you will touch the same files multiple times, exporting them as HDF5 or Pickle files can be advantageous. CSV export, on the other hand, is only beneficial if you want to use the data in a different application that requires CSV (e.g. Excel). Many other applications, such as R or Matlab, can also import data from HDF5 files.
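One way to apply this in practice is to convert each file once and reuse the converted copy afterwards. The sketch below illustrates that idea with a small caching helper; `load_cached` and `slow_loader` are hypothetical names, and `slow_loader` merely stands in for whatever function reads the GT3X file into a DataFrame.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def load_cached(source, loader):
    """Load `source` via `loader`, caching the result as a Pickle next to it."""
    source = Path(source)
    cache = source.with_suffix(".pkl")
    # Reuse the cache only if it is at least as new as the source file.
    if cache.exists() and cache.stat().st_mtime >= source.stat().st_mtime:
        return pd.read_pickle(cache)
    data = loader(source)
    data.to_pickle(cache)
    return data


calls = []


def slow_loader(path):
    """Hypothetical stand-in for an expensive GT3X reader."""
    calls.append(path)
    rng = np.random.default_rng(0)
    return pd.DataFrame(rng.normal(size=(100, 3)), columns=["X", "Y", "Z"])


# Placeholder file so the example runs without a real recording.
source = Path(tempfile.mkdtemp()) / "recording.gt3x"
source.write_bytes(b"placeholder")

first = load_cached(source, slow_loader)   # runs the slow loader, writes the cache
second = load_cached(source, slow_loader)  # served from the Pickle cache
```

The same pattern works with HDF5 by swapping `to_pickle`/`read_pickle` for `to_hdf`/`read_hdf`, at the cost of the additional disk space shown in the tables above.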