Room 257 - Langholt
Master's student: Sigurður Páll Behrend
Title: Design, implementation, and optimization of an advanced I/O Framework for Parallel Support Vector Machines.
Faculty: Faculty of Industrial Engineering, Mechanical Engineering and Computer Science.
Advisors: Morris Riedel, adjunct associate professor at the Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, and Dr. Helmut Neukirchen, professor at the Faculty of Industrial Engineering, Mechanical Engineering and Computer Science.
Examiner: Dr. Lars Hoffmann, scientist at the Climate Science Simulation Laboratory, Juelich Supercomputing Centre, Germany.
The goal of the thesis is to improve the I/O performance of the PiSVM suite of parallel and scalable tools used for machine learning on HPC platforms. This is achieved by analyzing the current state of I/O and then designing an I/O framework that enables PiSVM programs to read and write data in parallel, using the HDF5 library and its associated file format. HDF5 is a highly scalable file format that is not yet widely used in HPC or HTC. The thesis implements the design in the PiSVM toolset as a proof of concept. A parser is added to the PiSVM suite that converts data from the currently used SVMLight format into the HDF5 format. A 3.45% overall reduction in execution time was achieved in PiSVM-Train, and a 4.88% overall reduction in execution time was achieved in PiSVM-Predict. Read and write times improved by a larger percentage, with reductions of up to 98%. This can be attributed to the design of the I/O framework and the use of the advanced data storage features that HDF5 offers. A further significant result is a reduction of data file size by 72% and a reduction of model file size by 24%. In practice, any work with PiSVM will benefit significantly from the work done in this thesis. Whole research groups tend to keep multiple copies of the data while working with different feature engineering techniques. As PiSVM is used in different supercomputing centers and by multiple research groups, the gains are significant.
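
As a rough illustration of the conversion step mentioned in the abstract, the following is a minimal sketch of how SVMLight-style text lines might be parsed into flat arrays that could then be stored as HDF5 datasets. It is not the parser from the thesis: the function name parse_svmlight, the compressed sparse row layout, and the sample data are assumptions made for this example, and the sketch ignores optional SVMLight fields such as qid. Writing the resulting arrays to a file would additionally require an HDF5 library and is left out here.

```python
from typing import Iterable, List, Tuple


def parse_svmlight(
    lines: Iterable[str],
) -> Tuple[List[float], List[int], List[int], List[float]]:
    """Parse SVMLight-style lines into labels plus sparse row arrays.

    Hypothetical helper for illustration only. Returns
    (labels, indptr, indices, values), where row i owns the entries
    indices[indptr[i]:indptr[i + 1]] and the matching values.
    """
    labels: List[float] = []
    indptr: List[int] = [0]
    indices: List[int] = []
    values: List[float] = []
    for raw in lines:
        # Drop a trailing "# comment" and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank or comment-only lines
        parts = line.split()
        labels.append(float(parts[0]))
        for token in parts[1:]:
            # Each feature is written as "index:value".
            idx, val = token.split(":", 1)
            indices.append(int(idx))
            values.append(float(val))
        indptr.append(len(indices))
    return labels, indptr, indices, values


# Made-up sample input: two labeled rows and one blank line.
sample = ["+1 1:0.5 3:1.25", "-1 2:2.0 # a comment", ""]
labels, indptr, indices, values = parse_svmlight(sample)
```

Flat numeric arrays of this kind map naturally onto fixed-type datasets in a binary container, which is one plausible reason a binary format can be smaller and faster to read than the text representation. The actual data layout used in the thesis is not described in this announcement.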