curriculum vitæ | Lucas de Sousa Rosa

Basics

Name	Lucas de Sousa Rosa
Label	Computer Scientist
Email	roses.lucas@usp.br
Url	https://firilisinof.github.io/
Summary	Direct Ph.D. student in the Computer Science Graduate Program at the Institute of Mathematics and Statistics, University of São Paulo (IME-USP), with over four years of research experience gained through FAPESP fellowships in technical training, scientific initiation, research internships abroad, and the doctorate. Research interests center on mathematical-computational problems, with an emphasis on high-performance computing, machine learning, and scheduling.

Education

2023.08 - Present

São Paulo, Brazil
PhD

University of São Paulo, Institute of Mathematics and Statistics

Computer Science
2019.07 - 2023.08

São Paulo, Brazil
BSc

University of São Paulo

Molecular Sciences
2019.02 - 2019.07

São Carlos, Brazil
BSc

University of São Paulo, São Carlos Institute of Physics

Computational Physics

Awards

2023

One of the Best Scientific Initiation Works

Undergraduate Research Contest (CTIC), Brazilian Computer Society Congress (CSBC)

Work selected as one of the 10 best scientific initiation works of 2023 according to the Brazilian Computer Society (SBC).
2022

Honorable Mention

XXIII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD 2022)

Honorable Mention for the article "In search of efficient scheduling heuristics from simulations and Machine Learning", accepted at the Workshop de Iniciação Científica em Arquitetura de Computadores e Computação de Alto Desempenho (WSCAD-WIC).

Publications

2024.05.16

Energy-Aware Scheduling for Serverless Scientific Workflows: A Machine Learning Approach

Proceedings of the 15th Regional School of High-Performance Computing of São Paulo

This article proposes to address the challenges of energy efficiency and workflow scheduling in serverless computing environments. By integrating machine learning techniques and simulation, the research aims to bridge gaps between energy efficiency and serverless scheduling. The methodology involves historical data collection, energy consumption prediction through machine learning, and the development of scheduling policies with deep neural networks. The project also includes adaptation of workflow management systems and validation in real-world environments, aiming to provide viable solutions to current challenges in HPC.
2023.09.15

An Experimental Analysis of Regression-Obtained HPC Scheduling Heuristics

26th Workshop on Job Scheduling Strategies for Parallel Processing

Scheduling jobs in High-Performance Computing (HPC) platforms typically involves heuristics consisting of job sorting functions such as First-Come-First-Served or custom (hand-engineered). Linear regression methods are promising for exploiting scheduling data to create simple and transparent heuristics with lesser computational overhead than state-of-the-art learning methods. The drawback is lesser scheduling performance. We experimentally investigated the hypothesis that we could increase the scheduling performance of regression-obtained heuristics by increasing the complexity of the sorting functions and exploiting derivative job features. We used multiple linear regression to develop a factory of scheduling heuristics based on scheduling data. This factory uses general polynomials of the jobs’ characteristics as templates for the scheduling heuristics. We defined a set of polynomials with increasing complexity between them, and we used our factory to create scheduling heuristics based on these polynomials. We evaluated the performance of the obtained heuristics with wide-range simulation experiments using real-world traces from 1997 to 2016. Our results show that large-sized polynomials led to unstable scheduling heuristics due to multicollinearity effects in the regression, with small-sized polynomials leading to a stable and efficient scheduling performance. These results conclude that (i) multicollinearity imposes a constraint when one wants to derive new features (i.e., feature engineering) for creating scheduling heuristics with regression, and (ii) regression-obtained scheduling heuristics can be resilient to the long-term evolution of HPC platforms and workloads.
2023.08.05

On limits of Machine Learning techniques in the learning of scheduling policies

Proceedings of the 42nd Undergraduate Research Contest (CTIC)

This scientific initiation work explores the emerging relationship between managing resources on high-performance computing (HPC) platforms and the use of regression-derived scheduling heuristics to optimize performance. Recent research has shown that machine learning (ML) techniques can be used to generate scheduling heuristics that are simple and efficient. This work proposes an alternative approach using polynomial functions to generate scheduling heuristics. The simplest polynomial was found to be one of the most efficient heuristics. We also evaluated the resilience of the regression-derived heuristics over time. We published two papers in peer-reviewed national and international workshops (Qualis-B3/B4).
2023.07.17

Exploring simplicity and efficiency: regression-based scheduling heuristics in HPC

Proceedings of the 14th Regional School of High-Performance Computing of São Paulo

This research examines the interplay between resource management in high-performance computing systems and the application of machine learning techniques in developing scheduling heuristics. The potential for improved performance through scheduling heuristics based on linear regression and polynomial job characteristics was explored. Larger polynomials caused instability due to multicollinearity effects, but the simplest polynomial delivered stable and efficient scheduling performance. The study also evaluates the long-term resilience of these regression-based heuristics.
2022.10.19

In search of efficient scheduling heuristics from simulations and Machine Learning

Companion Proceedings of the 23rd Symposium on High Performance Computing Systems

High Performance Computing (HPC) systems are used to solve a number of complex issues in different fields of knowledge. However, these platforms have been rapidly evolving in size and complexity, and ensuring efficiency in managing applications (jobs) has become a challenge. Typically, this management involves scheduling heuristics that consist of functions to order the jobs. In this work, we evaluate the limits of regression methods for creating scheduling heuristics. Our results show that the simplest heuristic led to the most efficient scheduling, while the more complex heuristics showed instabilities due to multicollinearity.

Projects

2024.09 - Present
Doctorate (Direct PhD): Sustainable Supercomputing: Energy Efficiency and Resource Management through Statistical Modeling and Machine Learning

This PhD research project addresses challenges in resource management and workload modeling in the field of supercomputing. The proposed strategy includes the use of automated techniques, such as machine learning and optimization based on design of experiments, taking into account energy aspects of tasks and platforms. We will carry out scheduling experiments via simulations, aiming to generate data on scheduling behavior in different scenarios, evaluate performance and energy efficiency metrics, and compare various methodologies. The project also aims to investigate an underexplored aspect of workload modeling: the representation of tasks' energy consumption. Finally, we intend to test these more sophisticated models by evaluating their representativeness.
- Principal Investigator was Alfredo Goldman vel Lejbman
- Financed by São Paulo Research Foundation (FAPESP)
- Grant Number is 23/09048-8
2023.02 - 2023.03
Research Internship Abroad Scholarship (BEPE-IC): Evaluating machine learning techniques and simulations to create efficient scheduling heuristics

Collaboration with the DATAMOVE research group to investigate regression methods to create simple and efficient scheduling heuristics
- Principal Investigator was Alfredo Goldman vel Lejbman
- Supervisor was Denis Trystram
- Co-supervisor was Danilo Carastan-Santos
- Financed by São Paulo Research Foundation (FAPESP)
- Grant Number is 22/14673-6
2022.07 - 2023.08
Scientific Initiation (IC): On limits of Machine Learning techniques in the learning of scheduling policies

Explore the emerging relationship between managing resources on high-performance computing (HPC) platforms and the use of regression-derived scheduling heuristics to optimize performance
- Principal Investigator was Alfredo Goldman vel Lejbman
- Co-supervisor was Danilo Carastan-Santos
- Financed by São Paulo Research Foundation (FAPESP)
- Grant Number is 22/06906-0
2020.08 - 2021.12
Technical Training (TT-1): Technical training in numerical methods and administration of computational resources

Training in the management of high-performance computing systems and introduction to numerical methods relevant to scientific computing, particularly in the area of molecular simulation
- Principal Investigator was Guilherme Menegon Arantes
- Financed by São Paulo Research Foundation (FAPESP)
- Grant Number is 20/09918-4
- Interactive teaching material for biophysics

Skills

	Computer Science
	High-Performance Computing
	Job Scheduling
	Parallel Computing
	Machine Learning
	Software Development
	Human Computer Interaction
	Game Development

	Programming Languages
	Python
	C/C++
	Julia
	R

Languages

	Brazilian Portuguese
	Native speaker

	English
	Fluent

	French
	Learning
