Data Scientist @ Palo Alto Networks
CTO @ giveduet.org
USC Computer Science | Class of 2020
I am a recent graduate of the University of Southern California, where I earned a BS in Computer Science and an MS in Electrical Engineering. While at USC, I worked at USC's Center for AI in Society (CAIS) and founded its student branch, CAIS++. I currently work full-time as a Data Scientist at Palo Alto Networks and lead the engineering efforts at Duet. My interests include machine learning, health and nutrition science, and the use of AI for social good.
Sept. 2020 - Present: PAN-DB Data Science Team — Machine learning for phishing URL detection.
Jan. 2019 - Present: Duet is a tech nonprofit that enables donors to provide meaningful aid to vulnerable populations in a more dignified, efficient, and personal way.
More info at giveduet.org.
Fall 2019 - Spring 2020: Trained deep learning models to predict high-resolution land-cover labels from low-resolution satellite imagery (arXiv). Won 1st place in the 2020 IEEE GRSS Land-Cover Competition.
Summer 2019: Implemented a production-ready, deep learning-based NLP model that detects and categorizes sensitive and/or confidential files (e.g. source code, financial records). Achieved over 99% accuracy on a test set of 5,000 documents.
Summer 2018: Built a semi-supervised convolutional autoencoder for image feature extraction, allowing images to be incorporated into Falkonry’s event-detection pipeline. Developed ready-to-use Jupyter notebooks for performing exploratory data analysis on time-series data.
Spring 2017 - Spring 2019: Empowering USC's top undergraduate ML talent to work on AI-for-social-good projects that truly matter.
More info at caisplusplus.usc.edu.
Spring 2018 - Spring 2019: Developed an ML-based diagnostic tool for Kawasaki Disease, a rare (and often undiagnosed) heart disease that affects children worldwide.
GitHub repository link: here.
Fall 2017 - Fall 2018: Published a codebase for evaluating representation learning models for link prediction within online social networks. Over 200 stars on GitHub.
GitHub repository link: here.
Spring 2018: Trained Mask-RCNN and U-Net models to detect and segment cell nuclei from microscopic cell images. Placed in Top 10% worldwide in Kaggle's 2018 Data Science Bowl.
Competition page here.
Summer 2017: Analyzed web-crawler data to build a recommender system that suggests content delivery network (CDN) services to websites, achieving an AUROC of 0.89.
Research report here.
Looking to get in touch? Feel free to email me directly at hello@lucashu.me.