The Summer cohort deadline is March 23, 2025

AI, Machine Learning, and Data Science

This course offers a comprehensive introduction to Machine Learning, the heart of AI. Students will explore both the foundational and the practical aspects of Machine Learning. The curriculum emphasizes the essential mathematical principles, the algorithms that make the software work, and the reasoning behind the computations. Students will learn Python programming and the software tools used in Machine Learning. Some models will be built together in class, and students will also have opportunities to develop and implement their own.

This course is designed for both novices and advanced students; no prior knowledge of computer science, programming, or Machine Learning is required. Students will receive supplemental materials, including videos, notes, and code, as well as resources for more advanced topics. Material on Natural Language Processing and Sentiment Analysis (using Recurrent Neural Networks) and on Image Classification (using Convolutional Neural Networks) will be available for those who wish to choose project topics in these areas.

Pre-approved Topic List

  1. Breast Cancer Wisconsin Diagnostic Dataset. Develop a model capable of diagnosing breast cancer based on information derived from imaging of cell nuclei within tumors.
  2. Student Performance Dataset. Create a predictive model to estimate students’ grades based on various factors, including study time, television viewing habits, and the number of siblings.
  3. Car Quality Evaluation Dataset. Construct a classification model that assesses a vehicle’s quality, categorizing it as unacceptable, acceptable, good, or very good, thereby informing purchasing decisions.
  4. Wine Quality Dataset. Develop a model that evaluates wine quality based on its chemical properties, including various acidity levels.
  5. Heart Disease Dataset. Create a model that assesses a patient’s risk of heart disease based on demographic and health-related factors such as gender, blood pressure, height, and weight.
  6. Telco Customer Churn Dataset. Develop a predictive model that identifies customers likely to leave a service provider, enabling the formulation of retention strategies.
  7. Pima Indians Diabetes Database. Construct a model that predicts the onset of diabetes using diagnostic measurements such as body mass index and blood pressure.
  8. TMDB Box Office Prediction Dataset. Predict a movie’s worldwide box office revenue based on various attributes, including cast, crew, plot keywords, budget, release dates, and production companies.
  9. Bank Note Authentication Dataset. Create a model that detects counterfeit currency.
  10. Go to College Dataset. Develop a model that predicts a student’s likelihood of attending college, considering factors such as parental education, income, and the student’s GPA.
  11. Credit Fraud Detection Dataset. Construct a model that identifies fraudulent credit card transactions.
  12. Song Genre Classification from Audio Data. Create a model that accurately identifies the genre of songs based on audio features.
  13. Credit Card Approval Decision Dataset. Develop a model that decides whether a credit card application should be approved.
  14. Dog Breed Identification Dataset. Create a model that predicts the breed of a dog based on images.
  15. CIFAR-10 – Object Recognition in Images. Develop a model that recognizes and classifies objects present in images.
  16. Dogs vs. Cats Dataset. Create a model that determines whether an image contains a dog or a cat.
  17. Dandelion Images Dataset. Construct a model that identifies the presence of dandelions in images.
  18. COVID-19 with or without Pneumonia Dataset. Develop a model that distinguishes COVID-19 cases with pneumonia from those without, based on chest X-ray images.
  19. Amazon Reviews for Sentiment Analysis. Create a model that rates customer reviews on a scale from 1 to 5.
  20. BBC News Classification Dataset. Develop a model that categorizes news articles into the domains of business, entertainment, politics, sports, or technology.
  21. IMDB Dataset of 50,000 Movie Reviews. Construct a model that classifies movie reviews as positive or negative.
  22. 20 Newsgroups Dataset. Create a model that accurately classifies newsgroup articles into their respective categories.
  23. Sentiment Analysis on Tweets. Develop a model that classifies tweets as positive or negative in sentiment.
  24. SMS Spam Collection Dataset. Create a model that detects spam messages.
  25. 200,000+ Jeopardy Questions Dataset. Construct a model that classifies questions according to their respective topics.
  26. Fake News Detection Dataset. Develop a model that identifies whether a news article is genuine or fabricated.
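
Many of the topics above are classification problems and share the same basic workflow: hold out part of the data, fit a model on the rest, and measure accuracy on the held-out part. The following is a minimal sketch of that workflow, not the course's own code. It uses only the Python standard library, a simple nearest-centroid classifier chosen here for brevity, and synthetic two-feature points generated on the spot; none of the numbers come from any dataset in the list, and all function names are hypothetical.

```python
import random
from statistics import mean


def make_toy_data(n_per_class=100, seed=0):
    """Generate synthetic (made-up) two-feature points for classes 0 and 1."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            point = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


def train_test_split(data, test_fraction=0.25):
    """Return (train, test); the first test_fraction of rows is held out."""
    n_test = int(len(data) * test_fraction)
    return data[n_test:], data[:n_test]


def fit_centroids(train):
    """Compute the mean feature vector (centroid) of each class."""
    centroids = {}
    for label in {lbl for _, lbl in train}:
        points = [p for p, lbl in train if lbl == label]
        centroids[label] = tuple(mean(coord) for coord in zip(*points))
    return centroids


def predict(centroids, point):
    """Assign the class whose centroid is closest to the point."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(point, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, test):
    """Fraction of held-out examples that are classified correctly."""
    correct = sum(predict(centroids, p) == lbl for p, lbl in test)
    return correct / len(test)


data = make_toy_data()
train, test = train_test_split(data)
centroids = fit_centroids(train)
print(f"held-out accuracy: {accuracy(centroids, test):.2f}")
```

A real project would replace `make_toy_data` with code that loads one of the datasets above and would typically use a library model instead of the hand-written classifier, but the split, fit, and evaluate steps stay the same.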