Shivam Rajput

Data Engineer

LinkedIn | GitHub

About

Data Engineer with a Bachelor of Technology from IIT (BHU) Varanasi, specializing in building and optimizing scalable data pipelines and distributed data lake architectures. Experienced with AWS services, Apache data technologies, and Python, with measurable efficiency gains including a 30% reduction in data migration time. Strong problem-solving record: top 0.5% rank in JEE Advanced and 2nd place in the Accordion Gen AI Hackathon for an AI-powered staffing recommendation system.

Work Experience

Data Engineer

Accordion

Jun 2024 - May 2025

Bengaluru, Karnataka, IN

Built and maintained ETL pipelines on AWS covering data ingestion, transformation, and loading.

  • Developed a Python script leveraging boto3 and concurrent.futures to optimize data migration between Amazon S3 buckets, achieving a 30% reduction in transfer time.
  • Created and optimized stored procedures in Amazon Redshift for complex data operations, improving performance and scalability.
  • Maintained ETL pipelines built on AWS Glue, Amazon Redshift, and Amazon S3.

Data Engineer

Physics Wallah

May 2025 - Present

Bengaluru, Karnataka, IN

Leading the development and optimization of data pipelines and data lake architecture for scalable analytics.

  • Built and scheduled data pipelines using Apache Airflow to ingest data from Google Sheets, REST APIs, and MongoDB into Trino tables, ensuring reliable data availability.
  • Implemented Debezium and Kafka for real-time change data capture from MongoDB collections, centralizing data into the core platform.
  • Contributed to an in-house data architecture built on Apache Iceberg, Amazon S3, and Trino, optimizing query performance for large-scale analytics.
  • Supported data transformation workflows using Apache Spark for efficient batch processing within a distributed data lake environment.

Education

Bachelor of Technology

Indian Institute of Technology (BHU), Varanasi

Nov 2020 - May 2024

Varanasi, Uttar Pradesh, IN

Projects

Staffing Assistant (Gen AI Hackathon Project)

Nov 2024

Developed an AI-powered staffing recommendation system leveraging NLP and embedding-based similarity search, securing 2nd place in the Gen AI Hackathon.

Product Management System

Aug 2024 - Sep 2024

Developed a microservices-based system for product management and user interactions, focusing on efficient, decoupled services.

Awards

2nd Rank, Accordion Gen AI Hackathon 2024

Accordion

Nov 2024

Awarded for developing an innovative AI-powered staffing recommendation system that leveraged NLP and embedding-based similarity search.

Codeforces Expert (Max rating 1727, Global Rank 675)

Codeforces

Mar 2024

Achieved Codeforces Expert status with a max rating of 1727 and a Global Rank of 675 in Codeforces Round 927, having solved over 350 Data Structures & Algorithms problems.

Top 0.5% Rank, JEE Advanced 2020

JEE Advanced

Jan 2020

Ranked in the top 0.5% among over 1 million candidates in the highly competitive JEE Advanced 2020 examination.

Languages

English

Skills

Programming Languages

  • Python
  • SQL
  • C++

Big Data & Data Engineering

  • Apache Airflow
  • Apache Spark
  • Kafka
  • Apache Iceberg
  • Debezium
  • Trino
  • ETL
  • Data Pipelines
  • Data Lake
  • Distributed Systems
  • Data Warehousing

Cloud Platforms & Databases

  • AWS Glue
  • Amazon Redshift
  • Amazon S3
  • boto3
  • MySQL
  • FAISS

Web Frameworks & DevOps

  • Django
  • Flask
  • Streamlit
  • Docker
  • RabbitMQ

Artificial Intelligence & Machine Learning

  • NLP
  • Embedding-based Similarity Search
  • AI-powered Recommendation Systems