Puneet S. Ludu

pludu@buffalo.edu | New York, NY | +1-(716) 867-4344

https://github.com/puneetsl | https://www.linkedin.com/in/puneetsl | https://www.kaggle.com/puneetsl


Education

Master of Science in Computer Science, State University of New York, Buffalo, NY 2014

B. Tech. in Computer Science, Jaypee Institute of Information Technology, Noida, India 2010


Skills

Languages

    Adept: Python

    Working Knowledge: Java, Perl, C/C++, C#, MATLAB, Bash, JavaScript

Frameworks

    Adept: PySpark, Flask

    Working Knowledge: Apache (Hadoop, Pig), Couchbase, AWS, Keras


Experience

~9 years

Machine Learning Engineer, Sep 2021 - Present, Zillow, Remote

~7 Months

    Automated Home Valuation (Zestimate etc.) (PySpark, Docker)

      Migrating ML model pipelines from a legacy monolith architecture to scalable, container-based pipelines

Sr. Machine Learning Engineer, May 2020 - Sep 2021, OkCupid, New York City

~1.5 Years

    Price Optimization (Python, Keras, TensorFlow, Weights and Biases) [Wide&Deep]

      Led the effort to optimize subscription pricing to maximize revenue for OkCupid. Implemented end-to-end ML pipelines, including feature engineering, modeling, and alerting.

      Impact: Increased overall revenue by 6% over randomly assigned prices

Sr. Machine Learning Engineer, Apr 2015 - May 2020, FactSet Research Systems, New York City

~5 Years

    Speaker Identification (Python, Keras) [Spectrograms, CNN]

      Created a POC for an end-to-end speaker identification system that identifies speakers from live audio during companies' quarterly earnings calls.

      Impact: Early testing estimated savings of around 20% in human-hours

    Private Company Fact Extraction (Python, Keras, SageMaker, Databricks) [ELMo, BiLSTM, BlazingText]

      Led the effort to extract the ‘full company name’ along with key people, their titles, and biographies from 1.6 million crawled and cached websites of private companies.

    Duplicate Document Identification Service (Java, Couchbase) [Shingling, Vector Space Models]

      Developed a full-stack solution to identify duplicate documents in real time from a stream of thousands of documents per day

      Impact: 66% reduction in compute time for document processing. Also used by StreetAccount to find trending news.

    Type-Ahead and Query Expansion (Apache Spark, Java, Python) [Distributed Trie, Logistic Regression]

      Lead developer for features such as query autocomplete (Type-Ahead) and suggestion of similar concepts to expand the formulated query in a ‘Financial Document Search Engine’

    Formula Ranking (Apache Spark, Python) [N-gram Language models]

      Developed the pipeline to cluster users and rank formulas for a feature of the FactSet terminal

      Impact: Average rank brought down from 5.6 (Elasticsearch-based) to 2.3 (language-model-based)

Research Engineer (Data Analytics), Dec 2010 - July 2013, TCS Research India

~2.5 Years

    Event Detection in Time Series (Java, Python, RapidMiner) [SVM - RBF]

      Wrote an algorithm based on Shape Context for finding frequently occurring patterns and events; results were on par with SAX, DTW, etc., and 7% better in the specific domain of car sensors.

    Data Harmonization Framework (DHF) (Java, Apache Pig)

      Implemented an ETL framework that uses MapReduce and big databases to fuse incongruous enterprise data from disparate sources in near real time.

    Learning Management System (PHP, MySQL)

      Part of the team that developed the ‘Trainee Evaluation and Learning Management’ system.


Publications

Google Scholar profile

“Inferring Latent Attributes of an Indian Twitter user using Celebrities and Class Influencers”, ACM Hypertext 2015 (ppt)

“Inferring gender of a Twitter user using celebrities it follows”, CoRR 2014

“Architecture for Automated Tagging and Clustering of Song Files According to Mood”, IJCSI, 2010


Academic Projects

Quena - A Question Answering System (Java, Apache Solr, Stanford NER, Stanford POS tagger, Apache Pig)
Indexed 1.6 million Wikipedia documents; designed a question parser and a popularity-based ranking algorithm.

English to Hindi Translation and Transliteration (C, Gambas)

PSLFS – Virtual File System (C) [ICCC 2008]


Personal Projects

Memodiction
An Android app that lets you learn and revise words to build your vocabulary intelligently

BinaryDecimal
A Python (Flask) based website that converts numbers from one base to another

jTextBrew
A Java library for fuzzy string matching, based on the TextBrew algorithm by Chris Brew