About me
I hold an MSc in Big Data Analytics from Sheffield Hallam University and have extensive hands-on experience in data engineering and analysis, where I have automated and scheduled pipelines and built dependencies between them. I am determined to build production pipelines that follow best practices for security and governance. I am actively seeking a Data Engineering role, and I have the right to work in the UK.
MY PROJECTS
What I’ve Done
Azure Data Factory and Databricks: Covid-19 Dataset
Leveraged Azure Data Factory for data ingestion from Blob and HTTP sources, utilizing features like copy activity, control flow activities (including If condition, Get Metadata, Web Activities), pipeline parameters, variables, and schedule triggers.
Implemented complex data flows using Azure Data Factory and Azure Databricks transformations, and designed and automated pipelines (full/event/dependency within a parent pipeline), ensuring smooth data movement and transformation.
Utilized Azure SQL Database, Azure Blob Storage, and ADLS Gen2 for data storage and big data solutions. Incorporated Power BI for effective reporting and monitoring, including setting up alerts and log analytics to maintain the project's data integrity and accuracy.
Azure Synapse Analytics: Analysing and Reporting on NYC Taxi Trips Data
Utilised compute engines including Serverless SQL Pool, Spark Pool, Dedicated SQL Pool, and Synapse Data Flows. Queried data via Serverless SQL Pool and Spark Pool for data discovery, virtualization (with external tables/views), and transformation (with schema/partitions stored in Parquet format). Created pipelines and triggers to enable data ingestion and transformation.
Visualized the data in Power BI, integrated with Synapse Studio for enhanced analytics. Utilized Synapse Link for Cosmos DB to create an analytical store, querying data with Spark Pool and Serverless SQL Pool. Established a Dedicated SQL Pool to copy data from the data lake.
Azure Databricks, Delta Lake, and Azure Data Factory: Formula1 Racing Data Analysis
Built a data lake in Azure using ADLS Gen2, ingesting the Formula1 dataset from an external API. Utilized Azure Databricks as the primary engine, implementing PySpark transformations (filter/join) and aggregations (Group By, Window Functions, SparkSQL), facilitating effective data organization and transformation.
Implemented Azure Data Factory for scheduling and monitoring data pipeline activities. Additionally, applied incremental loads and Delta Lake for efficient management of batch data, enhancing data update frequency and accuracy.
Employed Azure Key Vault for advanced security measures, safeguarding sensitive information and ensuring the project's data integrity.
Retail Sales Analysis Using ADF and Databricks
Leveraged Azure Data Factory for seamless ingestion of diverse retail sales data from CSV files, databases, and REST APIs into Azure Blob Storage. Implemented Data Flows for comprehensive transformations, covering data cleansing and aggregation. Established parameterized pipelines and triggers for automated ETL processes, ensuring timely data updates.
Utilized Databricks and PySpark for advanced data processing, executing intricate transformations for in-depth retail sales analysis. Calculated crucial metrics such as total revenue, average order value, and customer lifetime value. Efficiently managed data storage with Delta Lake.
Focused on query optimization in PySpark, employing strategies like caching, broadcast joins, and partitioning. Demonstrated significant improvements in query execution time, enhancing overall data processing efficiency.
E-commerce Product Recommendations
Implemented Azure Data Factory to ingest product and user interaction data from various sources into Azure Data Lake Storage Gen2. Designed and executed data pipelines for thorough data cleaning, preprocessing, and aggregation to build a robust recommendation system.
Utilized Databricks and PySpark for implementing collaborative filtering and content-based recommendation algorithms. Leveraged Delta Lake for efficient storage of recommendation models and intermediate data.
Addressed query optimization, ensuring real-time recommendations are generated efficiently. Evaluated the impact of different recommendation algorithms on query performance.
My Experience
Roles & Responsibilities
January 2022 - Present
Data Engineer
Gained hands-on experience with Azure cloud services and Databricks. Worked on multiple projects and completed relevant certifications.
July 2022 - Present
Recreational Assistant, EDU LETTINGS
Primarily worked in Sheffield Park and Springs Academy School. Facilitated children's engagement and supervised their activities within the school facilities, including the football ground and indoor sports hall. Collaborated effectively with teachers, ensuring the smooth flow of classroom activities and assisting as needed.
Maintained a well-organized learning environment, ensuring that classrooms and activity areas were prepared and orderly. Engaged with school staff, including teachers and the school estate team, to contribute to the overall functioning of the educational setting.
May 2018 - April 2019
Client Partner – Accounts Receivable, Access Healthcare
Performed modelling and analysis of accounts receivable data in Power BI, and recorded the appropriate medical codes, reasons for underpayment, and reasons for payment denial in the Athena client software.
Resolved client support inquiries, requests, and complaints by phone to ensure resolution at the first point of contact.
March 2014 - Feb 2017
Researcher – Data Collection, Tablytics Market Ltd
Worked for International Data Corporation (IDC), a global research and advisory firm. Collected data from various IT companies by surveying their managers by phone. Ensured that the data collection process complied with all relevant data governance regulations, including HIPAA, UK GDPR, and CCPA.
Developed and implemented data security procedures to protect the confidentiality and integrity of the data. Successfully completed all data collection projects on time.
Academics
Learning and Living
January 2022 - March 2023
MSc Big Data Analytics, Sheffield Hallam University
Gained knowledge of SAS programming, big data processing, and visualisation tools such as Tableau.
Gained knowledge of project management methodologies such as PRINCE2 and Agile.
Dissertation on how sentiment analysis affects the global ‘Top 50’ NFT artists, utilising R programming and Power BI.
September 2009 - September 2013
B.E. Computer Science, SRR Engineering College, India
Received a solid foundation in data structures and algorithms.
Studied programming languages such as Python and SQL.
Skills
Building and maintaining data pipelines
Data quality checks and data modelling
Performance Optimization
Visualising data in Power BI/Tableau
Python/PySpark and SQL
"If you want something you've never had, you must be willing to do something you've never done"