A Tour of Data Science Educational Programs

In my quest to become a data scientist, I have embarked on a series of educational journeys, including both formal classroom programs and self-paced MOOCs (massive open online courses). Let me start at the beginning of my path to an education in data science.

Business Intelligence Analytics Advanced Diploma Program

I previously worked as a process engineer at an oil refinery, then switched careers and became a physician, specifically a psychiatrist. After many years of seeing patients, teaching medical students, and performing clinical research, I decided it was time to switch to yet another career.

While helping my son pick out courses for his upcoming enrollment at a local community college, I stumbled upon a Business Intelligence Analytics program at that same college. The program description and course list looked very interesting to me, so I decided to enroll and learn about business intelligence analytics.

During the first term, it occurred to me that I was more interested in machine learning and predictive analytics than in looking at historical data and presenting descriptive statistics about what happened last quarter. Although the program gave me a good overview of databases, I found it lacking in other ways, chiefly because the school did not put in the resources to connect students with co-op jobs in data analytics at local companies. So when I was unable to find a co-op job for the summer after the first term, I decided to augment my education myself rather than rely on the school's program and let the summer go to waste.

MOOCs, Machine Learning, R, and Python

Over the summer break, since I could not find a co-op job in data analytics, I decided to look into online learning in the form of MOOCs (massive open online courses). I started with a machine learning MOOC, which was informative and hands-on, and I became skilled at using R to apply regression, classification, and clustering models to different datasets. After that first success with machine learning and R, I decided to take more courses in computer science and programming.

I particularly liked the MOOC that introduced me to Python programming. I was hooked right away, as I saw the versatility of Python for web scraping, data parsing, database work, analysis, and visualization. Within just 4 weeks of daily Python immersion in MOOCs and solving problems on hackerrank.com, I became good enough to call myself a Python programmer.

It was around that same time that it dawned on me that I had the computer science skills needed to apply to data science master’s programs. Applying was now a realistic option, since I had accumulated sufficient skills in databases, machine learning, and coding, which were required by many of the master’s programs in data science. Also, when I looked at job listings for data scientists, almost all of the companies required either a master’s degree or 2 to 3 years of experience as a data scientist.

That cemented my decision to apply to grad school. Fortunately, I did quite well in the database courses I took at the community college and was able to obtain excellent faculty references, which are one of the main requirements for applying to master’s programs.

MS Analytics Program

As I mentioned in a previous post, I interviewed at an MS Analytics program, but it did not have computer science and coding as part of its curriculum and was too focused on statistics. Given its total disregard for coding and computer science, I decided against that program (and the decision was mutual, as they did not offer me a position, citing stiff competition). In my view, the statistics professor who interviewed me was way off base and confused data scientists with statisticians. Data scientists are not only experts at statistics…they are also experts at computer science and coding, in addition to having domain expertise.

This being my first exposure to graduate programs in data science, I began to question their ROI (return on investment), especially since I already have two university degrees, one of which is a technical degree in chemical engineering. I wondered whether certificate and diploma programs in data science might be the better option for me, given my industry experience and engineering degree. And of course, I could take more MOOCs, as that is how I learned machine learning and how to code! But fortunately, my next interview with a graduate program really impressed me, and I impressed them (how do I know this…read on!).

MSc in Computing and Data Analytics

Good news! I was accepted into a Master of Science program in Computing and Data Analytics. This program is a well-balanced mix of computer science, statistics, and business intelligence. I had to pass a programming test to get in, as they only accept data science grad students who can actually code…imagine that! I’m very glad to be in this program, and I am looking forward to starting grad school this Fall. For me, this program was the missing piece in my data science training. I need a master’s degree in data science because my other degrees and my industry experience are not in IT.

EMC Data Science Certifications

Even though I am slated to start grad school in a few weeks, I still decided to take the EMC Data Science Associate (EMCDSA) course, as many practicing data scientists have the EMCDSA certification. Once I obtain my EMCDSA this summer, I plan to continue to the next level and work on the EMC Data Science Specialist (EMCDSS) certification. The great thing about these courses is that they are also in MOOC format. Fortunately, I have a study group for these courses, and we meet in person weekly to go over the material we learned during the week.

Summary

So that is my tour of data science educational programs. For me, getting a master’s degree in data science from a well-balanced program is key to my education and training as a data scientist. I don’t believe everyone needs to take the same path, but it is an example of how one person is getting trained in a field that currently has no official standards for training. The master’s degree will serve as my foundation, while the MOOCs and various data science certifications will augment and enhance my training and experience.

As a word of caution, if you are looking into a master’s program in data science, please pick programs that are well balanced in all the core areas of data science, including computer science (algorithms, coding), statistics, and business intelligence. Skip the ones that ignore computer science, and skip the ones that ignore statistics.

Good luck on your journey to becoming a data scientist, and please contact me should you have any questions.

photo credit: velkr0 classroom via photopin (license)

Why Coding is Important for Data Scientists

As a Data Scientist in training, much of my orientation to the field has been about what skills are needed to become one. From my research and exposure to the field, the knowledge, experience, and skill sets that data scientists need fall into three areas: domain expertise, computer science, and statistics. It appears that the most successful data scientists have expertise in all three, in addition to deep knowledge of one or more specific areas. So in my pursuit of training as a data scientist, I keep these three areas in mind when filling in the gaps in my knowledge and experience.

Master’s Program

I recently had an interview for a master’s program in data science, and I asked about the focus of the program. The statistics professor answered by detailing their focus on statistics and machine learning, on how to apply appropriate models to specific problems, and on how to optimize and test such models. I was impressed with this answer, as it is important for data scientists to understand the algorithms that they are applying to datasets. However, there seem to be some in the field who treat the algorithms and analysis as a black box, where the only things that matter are the selection of the model and the output of the analysis…they don’t care about how the analysis was performed, or why the chosen algorithm works better than the others. Fortunately, this master’s program was all about understanding the models and optimizing them, which are important skills for a data scientist.

To Code, Or Not To Code

However, when I asked about their approach to computer science and coding, the statistics professor’s reply was:

‘Coding is cheap…we just outsource that, so it is not that important.’

What the heck?

How can you say that understanding the analysis and algorithms is important, and that they should not be treated like a black box, and then come out and say that coding is cheap?

I have a different opinion. Coding is a basic necessity for all data scientists…if you don’t speak the language of your profession, then how are you supposed to communicate your solution as the data scientist who is the liaison between the business analysts and the backend developers? If you can’t code, then you are not able to harness the power of computers, and thus not able to take advantage of that computing power via elegant and sophisticated algorithms. If you can’t code, then you can’t be innovative, and you can’t create new models that make use of the CPUs and data lakes that keep growing rapidly in power and storage capacity.

Manual Versus Automated Analysis

If you can’t code, then you will be forced to work with your data manually, and you will spend enormous chunks of time just extracting, cleansing, transforming, and migrating it (the kind of work usually grouped under ETL: extract, transform, load), because you can’t write code to automate those processes. If you can’t code, then you waste too much time prepping your data. You have no time left to perform a business requirements analysis, no time left to choose an appropriate model, and no time left to adequately train, optimize, and fine-tune that model with different features and dimensions.

How do I know all this? Well, I have tried to do data prepping and analysis manually, due to my previous lack of coding expertise. I spent too much time prepping, manipulating, parsing, and migrating data by hand, and it took time away from my other roles as a data scientist, which include model selection and fine-tuning.
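To make the contrast concrete, here is a minimal sketch of what automating that prep work can look like in Python. It is an illustration only, not the workflow from any particular project: the sample data, the column names, and the function names are all made up, and it relies only on the standard library (csv and sqlite3).

```python
import csv
import io
import sqlite3

# Hypothetical raw export with stray whitespace, mixed casing, and a missing value.
RAW_CSV = """name,region,sales
 Alice ,east,100
Bob,WEST,
carol,East,250
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleanse: trim whitespace and drop rows that have no sales figure."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row["sales"]:
            cleaned.append(row)
    return cleaned


def transform(rows):
    """Transform: normalize casing and convert sales to integers."""
    return [
        (row["name"].title(), row["region"].lower(), int(row["sales"]))
        for row in rows
    ]


def load(records, connection):
    """Load: write the prepared records into a SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (name TEXT, region TEXT, amount INTEGER)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    connection.commit()


def run_pipeline(text, connection):
    """Run every step in order and return the records that were loaded."""
    records = transform(cleanse(extract(text)))
    load(records, connection)
    return records


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for record in run_pipeline(RAW_CSV, conn):
        print(record)
```

Once the steps are written as functions, the same pipeline can be rerun on next month's file with a single call, which is exactly the time that manual prepping eats up.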

Data Science and Coding

Now I know better…I have since become proficient at Python, and consider myself a Python programmer. Life is much easier now that I can code and use computing power to do all the things I used to do manually, like scraping websites, extracting and migrating data, ETL, and analysis. I can now spend most of my time on solution documents, requirements analysis, model selection, model fine-tuning and optimization, implementation planning, project management, and communication of the solution.

Basically, if you can’t code, then you will not be an effective Data Scientist. I’m not saying you have to be an expert coder, as you should leave the complex coding to the software engineers who can code in their sleep. What I’m saying is that you do have to speak the language of your profession, as you can’t be effective as a problem solver, designer, analyst, and communicator if you can’t code.

Summary

In summary, real Data Scientists are coders. Some choose not to code, but they still know how to code. Those who choose not to code are most likely thought leaders in the field, whose high-level expertise is more valuable than their coding skills. Don’t hire a Data Scientist without coding skills. And if you can’t code, then you are not a Data Scientist.

 

photo credit: markus spiske html php java source code via photopin (license)