My Journey from Physics into Data Science
Updated: Jan 14, 2020
I still learn something new every day as my passion for the Data Science field keeps growing. For a graduating physics student pursuing a different career track, there are always ‘Why’ and ‘How’ questions to be answered.
Many people have asked me about my transition from academia, from Physics to Data Science. I hope my story answers the questions of why I decided to become a Data Scientist and how I pursued that goal, and ultimately encourages and inspires more people to pursue their passion. Let’s get started!
It all began with a summer studentship at CERN
The CERN Summer Student Programme offers a once-in-a-lifetime opportunity for undergraduate students of physics, computing and engineering to join research projects alongside top scientists in multicultural teams at CERN in Geneva, Switzerland.
In June 2017, I was very fortunate to be accepted into the programme. I literally burst with joy, as particle physics has always been my research interest and being able to conduct research at CERN was simply a dream-come-true experience for me! During the 2-month internship, I performed analysis and simulation of event reconstruction on terabytes of data via the Worldwide LHC Computing Grid and cloud computing for the Compact Muon Solenoid (CMS) experiment.
Summer students also attended a series of lectures, workshops and visits to CERN facilities covering a wide range of topics in theoretical and experimental particle physics and computing.
During this period, I was introduced to Machine Learning and big data analytics through the lectures, the workshops, and even my project itself. I was particularly blown away by how Machine Learning techniques could classify and detect various particles with extraordinary precision from such a huge amount of data. Fascinated, I took a deep dive into Machine Learning and cloud computing without hesitation, simply because I loved it!
Who on Earth would have known that this exposure would become a tipping point in my life? And yes, I found my marriage with DATA.
However, despite my hunger to learn these topics, I still had only a vague idea of what Data Science was. Having discovered my true passion, I knew I had to find out more.
In-depth Research on the Data Science Field
Once I was back in Singapore from my internship, I did some research to understand more about Data Science and, to my surprise, found that there was no well-defined definition of the field. In general, though, Data Science can be summarized as the combination of programming skills, mathematics and statistics knowledge, and domain knowledge. This explanation is by no means exhaustive, but it sheds some light on the definition in general (any comments on this are welcome! 😄).
Still, I was amazed by how data could be used to generate insights and drive business value for companies. From understanding a business problem, to collecting and visualizing data, all the way to prototyping, fine-tuning and deploying models in real-world applications, I found fulfilment in using data to tackle complex problems. Gradually, my passion began to take form…
“Without data, you’re just another person with an opinion.” – W. Edwards Deming
My Starting Point — Data Visualization
In August 2017, as my first step towards Data Science, I joined the NIC Face-Off Data competition co-organized by Tableau and the Infocomm Media Development Authority (IMDA), which was my first exposure to data visualization.
This experience gave me the opportunity to use Tableau Public to visualize various open data sources investigating the origins of haze in Southeast Asia and to deliver actionable insights. I am very excited to share the simple Tableau dashboard with you (feel free to leave your comments below!).
My first part-time Data Analytics Internship with SMRT
During the same month, I stumbled upon an opportunity to work as a part-time data analytics intern at mobilityX, an SMRT seed-funded start-up. I mainly used Python for coding because it is a high-level language with readable syntax and wide community support.
To be honest, I had seriously thought of giving up on coding when I first started learning programming in my first year of college. Struggling to run a simple for-loop could leave me stuck for days (even weeks!). Even worse, the negative thought that “I simply have no talent” struck me a heavy blow…
My interest in programming only took off when I embarked on a research project with a professor in my faculty during my third year, which required developing data analysis tools. As you may have guessed, I began to pick up Python to build the tools, and I just fell in love with it!
Gone are the days when I told myself “I simply have no talent”. That thought has been replaced by the following steps to learn programming (at least for me):
1. Understand the fundamental logic of programming
2. Choose a programming language and learn how to use it (syntax etc.)
3. Practice, practice, practice
4. Repeat steps 1–3
Apologies for the sidetrack; I got too excited sharing my learning path with you while writing this…
Well, the part-time internship lasted until March 2018, and the learning journey was fruitful. I learned and performed data cleaning and manipulation, web scraping, and data extraction from PostgreSQL using Python.
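For readers curious what this kind of work looks like, here is a minimal sketch of the clean-then-query pattern in Python. It is an illustration, not code from the internship: it uses the standard-library sqlite3 module with an in-memory database as a stand-in for PostgreSQL so it runs anywhere, and the table name `trips`, its columns, and the sample rows are all hypothetical. With a PostgreSQL driver such as psycopg2, the connection setup differs and query placeholders are written as `%s` rather than `?`.

```python
import sqlite3

# Hypothetical raw records: inconsistent station names, a missing value,
# and fares stored as text (sqlite3 stands in for PostgreSQL here).
RAW_ROWS = [
    (" Jurong East ", "12.5"),
    ("jurong east", "7.5"),
    ("Bishan", ""),
    ("BISHAN ", "4.0"),
    ("Tampines", "n/a"),
]


def clean_row(station, fare):
    """Normalize whitespace and casing in the name and parse the fare.

    Returns None when the fare cannot be parsed, so unusable rows are dropped.
    """
    name = " ".join(station.split()).title()
    try:
        value = float(fare)
    except ValueError:
        return None  # empty strings and placeholders such as "n/a"
    return name, value


def load_and_summarize(rows):
    """Load cleaned rows into an in-memory table and aggregate them with SQL."""
    cleaned = [r for r in (clean_row(s, f) for s, f in rows) if r is not None]
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE trips (station TEXT NOT NULL, fare REAL NOT NULL)")
        conn.executemany("INSERT INTO trips (station, fare) VALUES (?, ?)", cleaned)
        query = (
            "SELECT station, COUNT(*) AS n_trips, SUM(fare) AS total_fare "
            "FROM trips GROUP BY station ORDER BY station"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in load_and_summarize(RAW_ROWS):
        print(row)
```

Cleaning before loading keeps the SQL simple: once names are normalized, "Jurong East" and "jurong east" fall into the same group, and rows with unparseable fares never reach the table.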
I graduated one semester early to do a Data Science internship
All these experiences further reinforced my passion and laid a foundation for Data Science. Determined, I planned my study timetable and managed to graduate early to pursue my current full-time Data Science internship at Quantum Inventions in December 2017.
At this stage, you may ask: why did I go for an internship instead of a full-time Data Science position? The short answer: to gain more technical exposure and to experience the full Data Science workflow from scratch on real-world data before applying for a full-time job.
Here comes the meat of the whole story, where my real Data Science journey began. The list below briefly summarizes my learning path, shaped with the help of many great people and various online resources.
1. Books
The very first textbook I read was An Introduction to Statistical Learning: with Applications in R. I highly recommend this textbook for beginners, as it focuses on the fundamental concepts of statistical modelling and machine learning with detailed and intuitive explanations. If you are a mathematically hardcore person, you will like The Elements of Statistical Learning.
2. Online Courses
Coursera. Machine Learning taught by Andrew Ng, the co-founder of Coursera. I have always been fascinated by his ability to break down complicated concepts into simpler pieces of information. The 11-week course covers supervised learning, unsupervised learning, and best practices in machine learning, with practical real-world applications. I still sometimes refer to the lecture notes to deal with underfitting or overfitting when building machine learning models.
Udemy. Python for Data Science and Machine Learning Bootcamp taught by Jose Portilla. This course starts with the fundamentals of Python and then guides you step by step through implementing various machine learning and deep learning models using scikit-learn and TensorFlow. It gave me a great overview of the libraries available in Python for building machine learning models. In addition, I highly recommend my personal favourite course: Deep Learning A-Z™: Hands-On Artificial Neural Networks taught by Kirill Eremenko and Hadelin de Ponteves. This was my first exposure to deep learning and, trust me, their course is truly one of a kind, with great emphasis on building an intuitive understanding and hands-on coding tutorials on supervised and unsupervised deep learning.
Lynda. Python for Data Science Essential Training taught by Lillian Pierson. The course teaches the fundamentals of data munging and data visualization, along with some statistical analysis.
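The diagnostic habit that course teaches is to compare training error with validation error: when both are high and close together, the model is probably underfitting (high bias); when training error is low but validation error is much higher, it is probably overfitting (high variance). The toy sketch below illustrates this in plain Python so it needs no libraries. The synthetic data come from an assumed relationship y = 2x + 1 plus Gaussian noise, and the three models (a constant mean, a least-squares line, and a 1-nearest-neighbour lookup) are illustrative choices rather than material from the course.

```python
import random


def make_data(n, rng):
    """Draw n noisy samples from an assumed relationship y = 2x + 1 + N(0, 1)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Too simple: predicts the training mean everywhere, so it tends to underfit."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares with one feature: a reasonable fit for linear data."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_nearest_neighbour(xs, ys):
    """Too flexible: memorizes every training point, noise included, so it tends to overfit."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


def compare_models(n=200, seed=0):
    """Return {model name: (training MSE, validation MSE)} on fresh synthetic data."""
    rng = random.Random(seed)
    x_train, y_train = make_data(n, rng)
    x_val, y_val = make_data(n, rng)
    results = {}
    for name, fit in [("mean", fit_mean), ("line", fit_line), ("1-nn", fit_nearest_neighbour)]:
        model = fit(x_train, y_train)
        results[name] = (mse(model, x_train, y_train), mse(model, x_val, y_val))
    return results


if __name__ == "__main__":
    for name, (train_err, val_err) in compare_models().items():
        print(f"{name:>5}: train MSE = {train_err:.2f}, validation MSE = {val_err:.2f}")
```

Running `compare_models()` should show the mean model with high, similar errors on both sets (underfitting), the 1-nearest-neighbour model with zero training error but a clearly higher validation error (overfitting), and the line with the lowest validation error of the three.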
3. LinkedIn
Okay. So you are interested in the Data Science/Analytics field? Then create a LinkedIn account if you do not have one.
LinkedIn is a powerful platform with a close-knit Data Science community. The culture of sharing and learning is simply amazing: people are willing to share their experiences, thoughts and knowledge to help others. In fact, LinkedIn is where I learn the most, be it technical knowledge or career advice. Inspired, I have started giving back to the community by sharing my thoughts and experiences on LinkedIn. 😃
Some data scientists even come together to run a weekly webinar, Data Science Office Hours, to discuss and give insights on the fundamentals of Data Science (data preparation, feature extraction, data visualization, etc.). Be sure to check it out!
4. Other Resources
Most beginners in the Data Science field (myself included) get overwhelmed by oceans of resources and can easily be confused about which ones to choose. One of my friends on LinkedIn, Randy Lao, has shared a very comprehensive list of Data Science resources that is updated periodically.
Building a Portfolio
Have a portfolio to showcase your experience and capabilities, especially if you want to become a Data Scientist without a PhD.
Since I have a Bachelor of Science in Physics rather than a Computer Science degree, and had no relevant exposure during my first three years in college, building a portfolio in addition to learning a breadth of topics from MOOCs was necessary. This matters because, at the end of the day, companies want to know what you have learned and how you can contribute and add value to their business.
This is also one of the reasons I decided to pursue my current internship while juggling my part-time internship and MOOCs. I also volunteer with DataKind, a data organization that maximizes social impact by helping other NGOs solve their problems.
I have always wanted to participate in Kaggle competitions, and not long ago I got the chance to join a machine learning challenge on Kaggle with my friends, organized by Shopee and the Institution of Engineering and Technology (IET). I am really grateful to have been part of the team, and I definitely learned a lot from them. Be sure to check out their profiles: Low Wei Hong, Chong Ke Xin, and Ling Wei Onn!
This was my first Kaggle competition, and I learned how to use Convolutional Neural Networks (CNNs) and transfer learning for image recognition. The learning curve was steep, but the journey was definitely rewarding! I look forward to sharing more about our competition project in my next post!
If time allows, I also hope to share some of my internship projects in future posts, with the code uploaded to GitHub.
Choose a job you love, and you will never have to work a day in your life
That’s all for now. I hope I have shed some light on the Data Science industry and made learning Data Science less scary, more fun and more approachable! I never truly felt that “the more I learn, the more I need to learn” until I bumped into Data Science, which gives me both challenges and fulfilment.
I hope that by documenting my learning journey, this post can in some way inspire you to pursue your passion despite challenges and difficult circumstances.
If you have any questions, feel free to leave your comments below!