curriculum vitæ
Basics
Name | Lucas de Sousa Rosa |
Label | Computer Scientist |
Email | roses.lucas@usp.br |
Url | https://firilisinof.github.io/ |
Summary | Direct Ph.D. student in the Computer Science Graduate Program at the Institute of Mathematics and Statistics, University of São Paulo (IME-USP), with over four years of research experience consolidated through FAPESP fellowships in technical training, scientific initiation, research internships abroad, and doctoral research. Interests focus on mathematical-computational problems, with an emphasis on high-performance computing, machine learning, and scheduling. |
Education
-
2023.08 - Present São Paulo, Brazil
-
2019.07 - 2023.08 São Paulo, Brazil
-
2019.02 - 2019.07 São Carlos, Brazil
Awards
- 2023
One of the Best Scientific Initiation Works
Undergraduate Research Contest (CTIC), Brazilian Computer Society Congress (CSBC)
Work selected as one of the 10 best scientific initiation works of 2023 according to the Brazilian Computer Society (SBC).
- 2022
Honorable Mention
XXIII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD 2022)
Honorable Mention for the article "In search of efficient scheduling heuristics from simulations and Machine Learning", accepted at the Workshop de Iniciação Científica em Arquitetura de Computadores e Computação de Alto Desempenho (WSCAD-WIC).
Publications
-
2024.05.16 Energy-Aware Scheduling for Serverless Scientific Workflows: A Machine Learning Approach
Proceedings of the 15th Regional School of High-Performance Computing of São Paulo
This article proposes to address the challenges of energy efficiency and workflow scheduling in serverless computing environments. By integrating machine learning techniques and simulation, the research aims to bridge gaps between energy efficiency and serverless scheduling. The methodology involves historical data collection, energy consumption prediction through machine learning, and the development of scheduling policies with deep neural networks. The project also includes adaptation of workflow management systems and validation in real-world environments, aiming to provide viable solutions to current challenges in HPC.
-
2023.09.15 An Experimental Analysis of Regression-Obtained HPC Scheduling Heuristics
26th Workshop on Job Scheduling Strategies for Parallel Processing
Scheduling jobs in High-Performance Computing (HPC) platforms typically involves heuristics consisting of job sorting functions such as First-Come-First-Served or custom (hand-engineered). Linear regression methods are promising for exploiting scheduling data to create simple and transparent heuristics with lesser computational overhead than state-of-the-art learning methods. The drawback is lesser scheduling performance. We experimentally investigated the hypothesis that we could increase the scheduling performance of regression-obtained heuristics by increasing the complexity of the sorting functions and exploiting derivative job features. We used multiple linear regression to develop a factory of scheduling heuristics based on scheduling data. This factory uses general polynomials of the jobs’ characteristics as templates for the scheduling heuristics. We defined a set of polynomials with increasing complexity between them, and we used our factory to create scheduling heuristics based on these polynomials. We evaluated the performance of the obtained heuristics with wide-range simulation experiments using real-world traces from 1997 to 2016. Our results show that large-sized polynomials led to unstable scheduling heuristics due to multicollinearity effects in the regression, with small-sized polynomials leading to a stable and efficient scheduling performance. These results conclude that (i) multicollinearity imposes a constraint when one wants to derive new features (i.e., feature engineering) for creating scheduling heuristics with regression, and (ii) regression-obtained scheduling heuristics can be resilient to the long-term evolution of HPC platforms and workloads.
-
2023.08.05 On limits of Machine Learning techniques in the learning of scheduling policies
Proceedings of the 42nd Undergraduate Research Contest (CTIC)
This scientific initiation work explores the emerging relationship between managing resources on high-performance computing (HPC) platforms and the use of regression-derived scheduling heuristics to optimize performance. Recent research has shown that machine learning (ML) techniques can be used to generate scheduling heuristics that are simple and efficient. This work proposes an alternative approach using polynomial functions to generate scheduling heuristics. The simplest polynomial was found to be one of the most efficient heuristics. We also evaluated the resilience of the regression-derived heuristics over time. We published two papers in peer-reviewed national and international workshops (Qualis-B3/B4).
-
2023.07.17 Exploring simplicity and efficiency: regression-based scheduling heuristics in HPC
Proceedings of the 14th Regional School of High-Performance Computing of São Paulo
This research examines the interplay between resource management in high-performance computing systems and the application of machine learning techniques in developing scheduling heuristics. It explores the potential for improved performance through scheduling heuristics based on linear regression and polynomials of job characteristics. Larger polynomials caused instability due to multicollinearity effects, but the simplest polynomial delivered stable and efficient scheduling performance. The study also evaluates the long-term resilience of these regression-based heuristics.
-
2022.10.19 In search of efficient scheduling heuristics from simulations and Machine Learning
Companion Proceedings of the 23rd Symposium on High Performance Computing Systems
High-Performance Computing (HPC) systems are used to solve a number of complex issues in different fields of knowledge. However, these platforms have been rapidly evolving in size and complexity, and ensuring efficiency in managing applications (jobs) has become a challenge. Typically, this management involves scheduling heuristics that consist of functions to order the jobs. In this work, we evaluate the limits of regression methods for creating scheduling heuristics. Our results show that the simplest heuristic led to the most efficient scheduling, while the more complex heuristics showed instabilities due to multicollinearity.
Projects
- 2024.09 - Present
Doctorate (Direct PhD): Sustainable Supercomputing: Energy Efficiency and Resource Management through Statistical Modeling and Machine Learning
This PhD research project addresses challenges in resource management and workload modeling in the field of supercomputing. The proposed strategy includes the use of automated techniques, such as machine learning and optimization based on design of experiments, taking into account energy aspects of tasks and platforms. We will carry out scheduling experiments via simulations, aiming to generate data on scheduling behavior in different scenarios, evaluate performance and energy efficiency metrics, and compare various methodologies. The project also aims to investigate an underexplored aspect of workload modeling: the representation of tasks' energy consumption. Finally, we intend to test these more sophisticated models by evaluating their representativeness.
- Principal Investigator is Alfredo Goldman vel Lejbman
- Financed by São Paulo Research Foundation (FAPESP)
- Grant Number is 23/09048-8
- 2023.02 - 2023.03
Research Internship Abroad Scholarship (BEPE-IC): Evaluating machine learning techniques and simulations to create efficient scheduling heuristics
Collaboration with the DATAMOVE research group to investigate regression methods to create simple and efficient scheduling heuristics
- Principal Investigator was Alfredo Goldman vel Lejbman
- Supervisor was Denis Trystram
- Co-supervisor was Danilo Carastan-Santos
- Financed by São Paulo Research Foundation (FAPESP)
- Grant Number is 22/14673-6
- 2022.07 - 2023.08
Scientific Initiation (IC): On limits of Machine Learning techniques in the learning of scheduling policies
Explore the emerging relationship between managing resources on high-performance computing (HPC) platforms and the use of regression-derived scheduling heuristics to optimize performance
- Principal Investigator was Alfredo Goldman vel Lejbman
- Co-supervisor was Danilo Carastan-Santos
- Financed by São Paulo Research Foundation (FAPESP)
- Grant Number is 22/06906-0
- 2020.08 - 2021.12
Technical Training (TT-1): Technical training in numerical methods and administration of computational resources
Training in the management of high-performance computing systems and introduction to numerical methods relevant to scientific computing, particularly in the area of molecular simulation
- Principal Investigator was Guilherme Menegon Arantes
- Financed by São Paulo Research Foundation (FAPESP)
- Grant Number is 20/09918-4
- Interactive teaching material for biophysics
Skills
Computer Science | High-Performance Computing, Job Scheduling, Parallel Computing, Machine Learning, Software Development, Human-Computer Interaction, Game Development |
Programming Languages | Python, C/C++, Julia, R |
Languages
Brazilian Portuguese | Native speaker |
English | Fluent |
French | Learning |