
Hi! I’m Arulnidhi Karunanidhi


About me

I hold an MSc in Big Data Analytics from Sheffield Hallam University and have extensive hands-on experience in data engineering and analysis, where I have automated, scheduled, and created dependencies in pipelines. I am determined to build production pipelines using best practices around security and governance. I am actively seeking a Data Engineering role, and I have the right to work in the UK.


MY PROJECTS

What I’ve Done

Azure Data Factory and Databricks: Covid-19 Dataset

Leveraged Azure Data Factory for data ingestion from Blob and HTTP sources, using copy activities, control-flow activities (including If Condition, Get Metadata, and Web activities), pipeline parameters, variables, and schedule triggers.
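For illustration, a trimmed sketch of what one such pipeline definition could look like in ADF's JSON format — the pipeline and dataset names (`pl_ingest_covid_data`, `ds_source_blob`) are hypothetical, and most properties are omitted:

```json
{
  "name": "pl_ingest_covid_data",
  "properties": {
    "parameters": {
      "sourceRelativeURL": { "type": "String" }
    },
    "activities": [
      {
        "name": "Check Source File",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": { "referenceName": "ds_source_blob", "type": "DatasetReference" },
          "fieldList": [ "exists" ]
        }
      },
      {
        "name": "If File Exists",
        "type": "IfCondition",
        "dependsOn": [
          { "activity": "Check Source File", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "expression": {
            "value": "@activity('Check Source File').output.exists",
            "type": "Expression"
          },
          "ifTrueActivities": [
            {
              "name": "Copy To Blob",
              "type": "Copy",
              "typeProperties": {
                "source": { "type": "DelimitedTextSource" },
                "sink": { "type": "DelimitedTextSink" }
              }
            }
          ]
        }
      }
    ]
  }
}
```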

Implemented complex data flows with Azure Data Factory and Azure Databricks transformations, and designed and automated pipelines (full, event-driven, and dependency-triggered within a parent pipeline), ensuring smooth data movement and transformation.

Utilized Azure SQL Database, Azure Blob Storage, and ADLS Gen2 for data storage and big data solutions. Incorporated Power BI for effective reporting and monitoring, including setting up alerts and Log Analytics to maintain the project's data integrity and accuracy.

Azure Synapse Analytics: Analysing and Reporting on NYC Taxi Trips Data

Utilised Synapse compute engines including the Serverless SQL Pool, Spark Pool, Dedicated SQL Pool, and Synapse Data Flows. Leveraged the Serverless SQL Pool and Spark Pool for data discovery, virtualisation (with external tables and views), and transformation (with schemas and partitions stored in Parquet format). Created pipelines and triggers, enabling data ingestion and transformation.

Visualized the data in Power BI, integrated with Synapse Studio for enhanced analytics. Utilized Synapse Link for Cosmos DB to create an analytical store, querying the data with the Spark Pool and Serverless SQL Pool. Established a Dedicated SQL Pool to copy data from the data lake.

Azure Databricks, Delta Lake, and Azure Data Factory: Formula1 Racing Data Analysis

Built a data lake in Azure using ADLS Gen2, ingesting the Formula1 dataset from an external API. Utilized Azure Databricks as the primary engine, implementing PySpark transformations (filters and joins) and aggregations (group by, window functions, Spark SQL), facilitating effective data organization and transformation.

Implemented Azure Data Factory for scheduling and monitoring data pipeline activities. Additionally, applied incremental loads and Delta Lake for efficient management of batch data, enhancing data update frequency and accuracy.
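The upsert semantics that an incremental load with Delta Lake relies on (in Databricks this would be `DeltaTable.merge`) can be sketched with a plain-Python equivalent — the table contents below are invented for illustration:

```python
def merge_incremental(target, updates, key="race_id"):
    """Upsert `updates` into `target`, mimicking a Delta Lake MERGE:
    rows whose key already exists are updated, new keys are inserted."""
    merged = {row[key]: row for row in target}
    for row in updates:
        merged[row[key]] = row  # update if the key is present, insert if not
    return sorted(merged.values(), key=lambda r: r[key])

# Example: a nightly batch carrying one corrected row and one new row.
target = [{"race_id": 1, "winner": "Hamilton"}, {"race_id": 2, "winner": "Verstappen"}]
updates = [{"race_id": 2, "winner": "Leclerc"}, {"race_id": 3, "winner": "Norris"}]
result = merge_incremental(target, updates)
```

Only the changed and new rows are shipped each run, which is what keeps the incremental pattern cheaper than a full reload.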

Employed Azure Key Vault for advanced security measures, safeguarding sensitive information and ensuring the project's data integrity.

Retail Sales Analysis Using ADF and Databricks

Leveraged Azure Data Factory for seamless ingestion of diverse retail sales data from CSV files, databases, and REST APIs into Azure Blob Storage. Implemented Data Flows for comprehensive transformations covering data cleansing and aggregation. Established parameterized pipelines and triggers for automated ETL processes, ensuring timely data updates.

Utilized Databricks and PySpark for advanced data processing, executing intricate transformations for in-depth retail sales analysis. Calculated crucial metrics such as total revenue, average order value, and customer lifetime value. Efficiently managed data storage with Delta Lake.
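The aggregations behind those metrics can be illustrated with a plain-Python equivalent of the PySpark group-bys (the order data here is made up, and customer lifetime value is simplified to total spend per customer):

```python
from collections import defaultdict

def sales_metrics(orders):
    """Compute total revenue, average order value, and a simple
    customer lifetime value (total spend per customer)."""
    total_revenue = sum(o["amount"] for o in orders)
    avg_order_value = total_revenue / len(orders)
    clv = defaultdict(float)
    for o in orders:
        clv[o["customer"]] += o["amount"]
    return total_revenue, avg_order_value, dict(clv)

orders = [
    {"customer": "a", "amount": 40.0},
    {"customer": "a", "amount": 60.0},
    {"customer": "b", "amount": 20.0},
]
revenue, aov, clv = sales_metrics(orders)
# revenue = 120.0, aov = 40.0, clv = {"a": 100.0, "b": 20.0}
```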

Focused on query optimization in PySpark, employing strategies like caching, broadcast joins, and partitioning. Demonstrated significant improvements in query execution time, enhancing overall data processing efficiency.
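The idea behind a broadcast join — copy the small dimension table to every worker and join via an in-memory lookup instead of shuffling the large fact table — can be sketched in plain Python (in PySpark it is `broadcast()` from `pyspark.sql.functions`; the tables below are invented):

```python
def broadcast_join(fact_rows, dim_rows, key="product_id"):
    """Hash-join a large fact table against a small dimension table.
    Building the lookup once mirrors what Spark's broadcast join does:
    the small side is replicated to every executor, so no shuffle is needed."""
    lookup = {d[key]: d for d in dim_rows}  # the "broadcast" side
    return [{**f, **lookup[f[key]]} for f in fact_rows if f[key] in lookup]

sales = [{"product_id": 1, "qty": 2}, {"product_id": 2, "qty": 5}]
products = [{"product_id": 1, "name": "mug"}, {"product_id": 2, "name": "pen"}]
joined = broadcast_join(sales, products)
```

The payoff is largest when the dimension side is small enough to fit in each executor's memory; otherwise Spark falls back to a shuffle join.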

E-commerce Product Recommendations

Implemented Azure Data Factory to ingest product and user interaction data from various sources into Azure Data Lake Storage Gen2. Designed and executed data pipelines for thorough data cleaning, preprocessing, and aggregation to build a robust recommendation system.

Utilized Databricks and PySpark for implementing collaborative filtering and content-based recommendation algorithms. Leveraged Delta Lake for efficient storage of recommendation models and intermediate data.
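The collaborative-filtering idea can be sketched with a tiny item-based recommender using cosine similarity — a plain-Python stand-in for what would run at scale in Databricks; the ratings matrix and names below are invented:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts user -> rating)."""
    common = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in common)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend(ratings, user, top_n=1):
    """Item-based CF: score each unseen item by its similarity to the
    items the user has already rated, weighted by those ratings."""
    seen = {item for item, users in ratings.items() if user in users}
    scores = {}
    for item, users in ratings.items():
        if item in seen:
            continue
        scores[item] = sum(cosine(users, ratings[s]) * ratings[s][user] for s in seen)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy item -> {user: rating} matrix.
ratings = {
    "laptop": {"ann": 5, "bob": 4},
    "mouse":  {"ann": 4, "bob": 5, "cat": 4},
    "desk":   {"bob": 2, "cat": 5},
}
recs = recommend(ratings, "ann")
```

In production the same "users who liked X also liked Y" signal is usually learned with matrix factorisation (e.g. Spark MLlib's ALS) rather than pairwise similarity, which scales poorly.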

Addressed query optimization, ensuring real-time recommendations are generated efficiently. Evaluated the impact of different recommendation algorithms on query performance.


My Experience

Roles & Responsibilities

January 2022 - Present

Data Engineer

  • Gained hands-on experience with Azure cloud services and Databricks. Worked on multiple projects and completed relevant certifications.

July 2022 - Present

Recreational Assistant, EDU LETTINGS

  • Primarily worked at Sheffield Park and Springs Academy School. Facilitated children's engagement and supervised their activities within the school facilities, including the football ground and indoor sports hall. Collaborated effectively with teachers, ensuring the smooth flow of classroom activities and aiding as needed.

  • Maintained a well-organized learning environment, ensuring that classrooms and activity areas were prepared and orderly. Engaged with school staff, including teachers and the school estates team, to contribute to the overall functioning of the educational setting.

May 2018 - April 2019

Client Partner – Accounts Receivable, Access Healthcare

  • Performed modelling and analysis in Power BI on accounts receivable data, and documented findings in the Athena client software with appropriate medical codes, reasons for underpayment, and payment denial reasons.

  • Assisted with and resolved client support inquiries, requests, and complaints over the phone, ensuring resolution at the first point of contact.

March 2014 - February 2017

Researcher - Data Collection, Tablytics Market Ltd

  • Worked for International Data Corporation (IDC), a global research and advisory firm. Collected data from various IT companies by conducting phone surveys with managers. Ensured that the data collection process complied with all relevant data governance regulations, including HIPAA, UK GDPR, and CCPA.

  • Developed and implemented data security procedures to protect the confidentiality and integrity of the data. Successfully completed all data collection projects on time.


Academics

Learning and Living

January 2022 - March 2023

MSc Big Data Analytics, Sheffield Hallam University, UK

  • Gained knowledge of SAS programming, big data processing, and visualisation tools such as Tableau.

  • Gained knowledge of project management methodologies such as PRINCE2 and Agile.

  • Dissertation on how sentiment analysis affects the global 'Top 50' NFT artists, utilising R programming and Power BI.

September 2009 - September 2013

B.E. Computer Science, SRR Engineering College, India

  • Received a solid foundation in data structures and algorithms.

  • Learned programming languages such as Python and SQL.

Skills

Building and maintaining data pipelines

Data Quality checks, and Data Modeling

Performance Optimization

Visualising data in Power BI/Tableau

Python/PySpark, and SQL


"If you want something you've never had, you must be willing to do something you've never done"

Thomas Jefferson


Let’s Connect

+447442327815

  • LinkedIn
