Hi! I'm David Lu

I'm a Data Scientist

Hi there! I'm David, a 4th-year Computer Science and Statistics student at the University of California, Irvine. Alongside my studies, I've worked as a Data Scientist at EDF Innovation Lab for over a year, building predictive models for electricity demand from climate data. I'm also part of Deep Data Lab, where I conduct research on Normalized Metered Energy Consumption (NMEC), and Baldi Lab, where I research Antenna Array Optimization using Deep Learning. My past experience includes working as a Software and Data Engineer Intern at Lyrid (1 year) and as a Data Scientist Intern at Yes! Star (1 year). Please feel free to browse my website to learn more about my experience, research, and projects.

What I'm looking for

I'm currently looking for a full-time role in Energy Trading, Data Science and Engineering, or Software Engineering! If my experience aligns with the needs of your company, or if you'd like to chat, please reach out to me at david.linyi.lu@gmail.com or connect with me on LinkedIn!

ABOUT

My Education & Experience

Education & ML-Stack

B.S. Computer Science, Statistics

University of California, Irvine (2020-2024)

ML-Stack

• Python

• PyTorch

• TensorFlow

• Pandas

• NumPy

• Xarray

• Dask

• Apache Airflow

• Apache Spark

Experience & Research

ML Researcher

Baldi Lab (Feb 2023 - Present)

Researcher

Deep Data Lab (Sept 2023 - Present)

Data Scientist

EDF Innovation Lab (Sept 2022 - Jan 2024)

Software and Data Engineer Intern

Lyrid (Dec 2021 - Sept 2022)

Data Scientist Intern

Yes! Star Corporation (Jun 2019 - Jul 2020)

RESEARCH

My Recent Research

Normalized Metered Energy Consumption

Deep Data Lab

Imagine if we could peek into a crystal ball and see the hidden stories of energy in our homes. That's where my research on Normalized Metered Energy Consumption (NMEC) comes into play: it's like a time machine for electricity usage.

NMEC isn't just a fancy acronym; it's the heartbeat of understanding how much energy we could save if we lived a little differently. Think of it as a behind-the-scenes look at the 'what-ifs' of our daily power habits.

The project with Prof. Matthew C. Harding is still ongoing, but you can preview some of the progress on my GitHub!

Code

Antenna Array Optimization using Deep Learning

Baldi Lab

Think of antenna arrays as a team of antennas working in unison to send and receive signals with crystal-clear precision. However, when you're dealing with hundreds of them, it's like trying to align a choir of voices without a conductor; they can easily start to interfere with each other instead of working together.

Using PyTorch, I've built deep learning models designed to tackle this complexity. These models act as conductors for the modern digital orchestra, tuning arrays to improve their performance and keeping the antennas working in harmony!
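As a rough illustration of the interference described above (a textbook sketch, not the research code), the array factor of a uniform linear array sums one complex phase term per antenna. The function name `array_factor`, the four-element array, and the half-wavelength spacing are assumptions chosen only for this example.

```python
import cmath
import math

def array_factor(weights, spacing_wavelengths, theta_deg):
    """Complex array factor of a uniform linear array.

    theta_deg is measured from broadside (perpendicular to the array axis);
    spacing_wavelengths is the element spacing in units of the wavelength.
    """
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return sum(w * cmath.exp(1j * n * phase_step) for n, w in enumerate(weights))

# Hypothetical setup: four equally weighted antennas, half a wavelength apart.
weights = [1.0] * 4

peak = abs(array_factor(weights, 0.5, 0.0))   # signals add in phase at broadside
null = abs(array_factor(weights, 0.5, 30.0))  # signals cancel at 30 degrees

print(f"broadside magnitude: {peak:.3f}")
print(f"30-degree magnitude: {null:.3e}")
```

With equal weights the four signals reinforce each other at broadside and cancel at 30 degrees. Choosing weights that shape this pattern for hundreds of elements is the kind of tuning problem an optimization model would address.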

This research is ongoing, but you can preview a draft of the paper and the code below!

Code Draft

PROJECTS

My Recent & Upcoming Projects

Predicting Air Quality using Graphical Models

The battle with air pollution is a complex puzzle of ever-changing patterns. To forecast the ebb and flow of this urban haze, I turn to the analytical precision of graphical models. Bayesian Networks (BN) and Markov Random Fields (MRF) are my chosen tools, adept at revealing the hidden dependencies and interactions within pollution data.

With BNs, we begin by decoding the correlations in pollution levels, identifying how different areas influence one another. We then sharpen our focus with MRFs, examining the intricate web of interactions between these regions.

Feel free to take a look at the code and analysis, where I used pgmpy and NumPy, as well as a draft of the paper!

Code Draft

Low-Density Parity-Check Code

In the digital symphony of data transmission, each bit is a note. Low-Density Parity-Check (LDPC) codes ensure that every note reaches its destination without distortion, preserving the integrity of the entire melody. When corruption is bound to happen during data transmission, LDPC codes help correct the mistakes!

Imagine receiving a photo from a friend, only to find it marred by corrupted pixels. LDPC codes come to the rescue, restoring the image to its original state. In my project, using pgmpy, I built factor graphs and applied the sum-product message passing algorithm, which lets these codes identify and fix errors so that the photo you see is as intended.
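To make the idea of parity checks concrete, here is a minimal sketch in plain Python. It is not the project's code and it does not implement sum-product message passing. Instead it uses hard-decision bit flipping, a simpler relative of that algorithm, on the tiny (7,4) Hamming parity-check matrix, which is an assumption chosen for readability. A real LDPC matrix is far larger and much sparser. The names `syndrome` and `bit_flip_decode` are illustrative.

```python
# Parity-check matrix of the (7,4) Hamming code: each row is one parity check,
# each column is one transmitted bit. Used here only because it is small.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(H, word):
    """Return one bit per check; all zeros means every parity check passes."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]

def bit_flip_decode(H, received, max_iters=10):
    """Repeatedly flip the bit that takes part in the most failed checks."""
    word = list(received)
    for _ in range(max_iters):
        s = syndrome(H, word)
        if not any(s):
            break  # every check is satisfied, so stop
        # For each bit, count the failed checks it participates in.
        votes = [sum(s[i] * H[i][j] for i in range(len(H)))
                 for j in range(len(word))]
        word[votes.index(max(votes))] ^= 1
    return word

sent = [1, 1, 1, 0, 0, 0, 0]      # a valid codeword: all checks pass
received = [1, 1, 1, 0, 0, 0, 1]  # the last bit was corrupted in transit
decoded = bit_flip_decode(H, received)

print(syndrome(H, received))  # non-zero entries mark the failed checks
print(decoded == sent)
```

Sum-product decoding follows the same check-and-update loop but passes probabilities between bits and checks on the factor graph rather than flipping hard 0/1 decisions.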

The backend of this project is complete, but I am planning to add a blog and a UI component for you to try it out!

Coming Soon!

Weather-Driven Electricity Pricing Tool

This project presents a pragmatic approach to decoding the interplay between weather conditions and energy economics. By integrating real-time weather and grid data, we aim to forecast electricity prices with enhanced accuracy. The backbone of this system is a robust workflow managed by Airflow, complemented by the parallel processing prowess of Dask. PyTorch's deep learning capabilities are at the core of our predictive model, refining our forecasts to adapt to the volatile nature of energy markets.

This project is still in its early stages, but I plan to release the code and a demo soon!

Coming Soon!

CONTACT

Interested in my experience? Let's connect!