CSCA 5502: Data Mining Pipeline

Get a head start on program admission

Preview this course in the non-credit experience today!
Start working toward program admission and requirements right away. Work you complete in the non-credit experience will transfer to the for-credit experience when you upgrade and pay tuition. See How It Works for details.

Cross-listed with DTSA 5504

Course Type: Computer Science Elective

Specialization: Data Mining Foundations and Practice

Instructor: Dr. Qin (Christine) Lv, Associate Professor of Computer Science

Prior knowledge needed:

  • Programming languages: Basic to intermediate experience with Python, Jupyter Notebook
  • Math: Basic experience with Probability and Statistics, Linear Algebra
  • Technical requirements: Windows, Mac, or Linux; Jupyter Notebook

Learning Outcomes

  • Identify the key components of the data mining pipeline and describe how they are related.
  • Apply techniques to address challenges in each component of the data mining pipeline.
  • Identify particular challenges presented by each component of the data mining pipeline.

Course Grading Policy

Assignment                                       Percentage of Grade
Peer Review: Data Mining Example                 10%
Peer Review: Data Mining Issues                  10%
Programming Assignment: Data Understanding       20%
Programming Assignment: Data Preprocessing       20%
Programming Assignment: Data Warehousing         20%
CSCA 5502 Data Mining Pipeline Final Exam        20%

Course Content

Duration: 7 hours

This week introduces the Data Mining Specialization and this course, Data Mining Pipeline. As you begin, you will be introduced to the four views of data mining and the key components of the data mining pipeline.

Duration: 5.5 hours

This week covers data understanding by identifying key data properties and applying techniques to characterize different datasets.

Duration: 5.25 hours

This week explains why data preprocessing is needed and what techniques can be used to preprocess data.

Duration: 5 hours

This week covers the key characteristics of data warehousing and the techniques to support data warehousing.

Duration: 1.75 hours

Final Exam Format: Proctored exam administered through ProctorU

This module contains materials for the final exam, which is proctored and administered through ProctorU.

  • You will need to arrange for a time to take the proctored exam.
  • It is a one-hour exam.
  • You may submit your answers only once.
  • The exam contains only multiple-choice questions.
  • There are no programming questions in the exam.
  • You are not allowed to use any notes or access other websites when you take your exam.
  • The exam tests conceptual understanding of the course materials. There is no need to memorize formulas.

Notes

  • Cross-listed Courses: Courses that are offered under two or more programs. Considered equivalent when evaluating progress toward degree requirements. You may not earn credit for more than one version of a cross-listed course.
  • Page Updates: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.