Propulsion’s Data Science Batch 7 projects - Some highlights
This is the first of many project recaps we’re writing for future students and corporate partners. Each one gives insight into the kinds of projects Propulsion’s Data Science students work on during their Capstone Project.
Propulsion’s batch #7 of Data Science students (May 13 to July 31, 2019) worked on five projects, most of them provided by industry partners such as Swiss International Airlines, Qard and PriceHubble. All projects involved Machine Learning, and two also included Deep Learning. Together they covered a broad range of Data Science applications. Here is a summary of each.
Project 1 - Predicting Bise wind at Zurich airport - Swiss International Airlines
Zurich airport sees delays of around 30% when the Bise wind (a cold, dry wind blowing from the north-east towards the south-west) hits. This project focused on predicting Bise events as well as their duration. The two students who worked on this task reached a precision of around 80%. Although this is a strong result, precision would need to reach the 95% level before the model could be used in a real-life warning system. We’re happy with these results, and even happier that the project will be continued in the upcoming Data Science batch.
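For readers less familiar with the metric: precision is the share of predicted Bise events that actually occurred, TP / (TP + FP). The short sketch below shows how it is computed for a binary event predictor and checked against the 95% level mentioned above. The labels are made up purely for illustration and are not the project’s data.

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP) for binary labels (1 = Bise event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        return 0.0  # no positive predictions at all
    return tp / (tp + fp)

# Hypothetical observed vs. predicted Bise events, for illustration only
observed  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 1, 1, 1, 0, 0, 1, 0, 1, 0]

p = precision(observed, predicted)       # 5 true positives, 1 false positive
meets_warning_threshold = p >= 0.95      # the level required for a warning system
print(f"precision={p:.2f}, usable for warnings: {meets_warning_threshold}")
```

Here one false alarm out of six alerts is already enough to keep precision well below the 95% level. This is why the gap between roughly 80% and 95% matters in practice.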
Project 2 - Default Prediction on E-Commerce based on Public Data - qardfinance.com
As a FinTech startup, Qard analyzes e-commerce businesses applying for loans and uses a data-driven approach to identify those at high risk of defaulting on their loan payments. Qard would like to extend this system to non-financial data. For this project, Propulsion’s students extracted e-commerce-specific non-financial data from around 400 GB of structured and unstructured data collected by Qard over the years. The students reached an accuracy of around 70% in identifying default cases using non-financial data. Such a system could help loan providers in general, because they would no longer need to ask borrowers for specific details about their finances.
Project 3 - PhenoCAT: (Un)supervised classification of microscopy images with Deep Learning - personal student project
This was an independent project proposed by one of the students with a PhD in a related field. Image-based genetic perturbation screens are regularly used in research labs to identify markers of cancer-causing genes. Such screens generate petabytes of data (millions of images), so the images must be analyzed automatically. The two students wanted to test whether Deep Learning, primarily Convolutional Neural Networks and Variational Auto-Encoders, could automatically classify images into their categories of interest. Since no labeled data was available, the students used active learning to build their training and test sets incrementally. The supervised approach achieved an accuracy of over 90%. A second, unsupervised approach based on auto-encoders needs further exploration, but it was already able to generate realistic-looking synthetic images.
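The recap doesn’t say which query strategy the students used for active learning. The sketch below assumes uncertainty sampling, one common strategy. It shows the step that decides which unlabeled images an expert should annotate next: the images whose predicted class distribution has the highest entropy, meaning the model is least sure about them. The probabilities, the three classes and the function names are hypothetical. In a real loop, a CNN would produce the probabilities, and after each labeling round the model would be retrained and the step repeated.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool_probs, batch_size):
    """Uncertainty sampling: return indices of the unlabeled items whose
    predicted class distribution has the highest entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:batch_size]

# Hypothetical classifier outputs for five unlabeled images over 3 phenotype classes
pool_probs = [
    [0.98, 0.01, 0.01],  # confident: low value to annotate
    [0.34, 0.33, 0.33],  # nearly uniform: most informative to annotate
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],
    [0.90, 0.05, 0.05],
]

to_label = select_for_labeling(pool_probs, batch_size=2)
print(to_label)  # indices to send to a human annotator this round
```

Labeling the most uncertain images first usually improves a model faster than labeling at random, which matters when every label costs an expert’s time.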
Project 4 - Real estate image classification (quality of houses) - PriceHubble
This project applied Active Learning with Convolutional Neural Networks to automatically classify property images into different price categories. Several pre-trained networks (e.g. ResNet and VGG16) were used as starting points and then further trained on the project’s data. Starting from pre-trained networks is standard practice when applying Deep Learning to image analysis. The student who worked on this challenge achieved an accuracy of around 93% on this data.
Project 5 - Skill-gap analysis and course recommendations for your best-fit job - Propulsion
As an EdTech startup, Propulsion often looks for ways to use data to help its students meet their learning needs. The central aim of this project was to identify the skill set required for technology-related jobs in Switzerland, match it against a job seeker’s own skills and background (as extracted from LinkedIn profiles), and finally recommend suitable positions or training programs. To this end, the students used NLP techniques to measure the semantic similarity between job ads and candidate skills, something most job recommendation services lack. Propulsion is now developing this project into an online tool to help not only our students but also the general Swiss public.
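The matching and skill-gap idea can be sketched in a few lines. The sketch below is a deliberate simplification. It uses a plain bag-of-words cosine similarity, which only captures lexical overlap, as a stand-in for the semantic-similarity NLP techniques described above (those would typically compare learned text embeddings instead of raw word counts). The job titles, skill strings and helper names are hypothetical. The steps are: rank job ads by similarity to the candidate’s skills, then list the skills the best match asks for that the candidate lacks.

```python
import math
from collections import Counter

def tokens(text):
    """Lower-cased bag of words (a crude stand-in for real NLP preprocessing)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical candidate profile and job ads, for illustration only
candidate = "python sql pandas statistics"
jobs = {
    "Data Scientist": "python sql machine-learning statistics",
    "Frontend Developer": "javascript react css html",
}

cand_vec = tokens(candidate)
ranking = sorted(jobs,
                 key=lambda j: cosine_similarity(cand_vec, tokens(jobs[j])),
                 reverse=True)
best = ranking[0]
# Skill gap: skills the best-matching job asks for that the candidate lacks
skill_gap = set(tokens(jobs[best])) - set(cand_vec)
print(best, skill_gap)  # the gap could then be mapped to suitable courses
```

An embedding-based similarity would additionally recognize that, for example, "pandas" and "data wrangling" are related. That is the kind of semantic matching the project aimed for and that simple keyword overlap misses.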