DSCI_V 100 (3) Introduction to Data Science
Use of data science tools to summarize, visualize, and analyze data. Sensible workflows and clear interpretations are emphasized. [3-0-1] Prerequisite: MATH 12
Data science methods to automate the running and testing of code and analytic reports, manage data analysis software dependencies, package and deploy software for data analysis, and collaborate with others using version control. [3-0-1] Prerequisite: DSCI 100 and either (a) one of CPSC 203, CPSC 210, CPEN 221 or (b) one of MATH 210, ECON 323 and one of CPSC 107, CPSC 110.
Analysis, design, and implementation of static and interactive visual representations; visualization literacy; data communication; exploratory data analysis; application of theoretical principles to visualization development. [3-1.5-0] Prerequisite: STAT 201 and one of CPSC 203, CPSC 210, CPEN 221.
Ethical application of data science and machine learning algorithms. Application of ethical theories in real-world case studies. Data ownership, collection, and validity. Algorithm auditing, fairness and transparency. Reducing unfairness in algorithms. Deployment of predictive models and dissemination of results. [3-0-0] Prerequisite: One of CPSC 330, CPSC 340, STAT 301, STAT 406.
Pseudo-code. Program design and structure. Flow control. Iteration. Lists (arrays). Functions. File I/O. Classes, objects, methods, and libraries. This course is not eligible for Credit/D/Fail grading.
Basic algorithms. Recursion. Data structures including linked lists, queues, stacks, trees, graphs, and hash tables. Searching and sorting. Introduction to complexity including Big-O notation, efficiency, and scalability. Prerequisite: All of DSCI 511, DSCI 521. This course is not eligible for Credit/D/Fail grading.
Relational schemas. SQL queries. Database programming using embedded SQL. XML and XQuery. This course is not eligible for Credit/D/Fail grading.
Introduction to software, shells, tools, and file systems for use in the Data Science program. Installation, configuration, and use of statistical and programming software including Integrated Development Environments (IDEs). Problem resolution skills. This course is not eligible for Credit/D/Fail grading.
Interactive and non-interactive data analysis. Scripting. Dynamic reporting. Reproducibility. Project and file management. Version control. Automated workflows. Prerequisite: All of DSCI 511, DSCI 521. This course is not eligible for Credit/D/Fail grading.
Program design and data manipulation using industry-standard software tools designed for statistical work. Organizing, filtering, sorting, grouping, reformatting, converting, and cleaning data to prepare it for further analysis. Corequisite: All of DSCI 511, DSCI 521. This course is not eligible for Credit/D/Fail grading.
Software life cycle. Unit testing. Continuous integration. Submission to a relevant repository for distribution. Packaging for installation and use by others. Software licenses. Classes and abstraction. Prerequisite: DSCI 522. This course is not eligible for Credit/D/Fail grading.
Networks and the Internet, scraping data, APIs, cloud computing, Web services for scalable computing, Web hosting, Web publication platforms, introduction to parallel computing. Prerequisite: All of DSCI 522, DSCI 523. This course is not eligible for Credit/D/Fail grading.
Descriptive plots using statistical and programming software. Basics, mechanics, and principles of data visualization. Prerequisite: All of DSCI 511, DSCI 521. This course is not eligible for Credit/D/Fail grading.
Interactive visualization, design choices, dynamic change over time, multiple views, data reduction, dealing with complexity. Prerequisite: DSCI 531. This course is not eligible for Credit/D/Fail grading.
Privacy and data. Ethics boards, legal issues, licensing. Physical and logical data security, social engineering. Encryption, data anonymization, privacy-preserving techniques. Case studies. This course is not eligible for Credit/D/Fail grading.
Claims, reasons, and evidence. Strengths and weaknesses of models. Effective oral and written presentation of scientific results, including interpretation of data and recognition of assumptions, bias, validity, and reliability. Citations, references, and peer-review. This course is not eligible for Credit/D/Fail grading.
Descriptive statistics including measures of location and spread. Random variables, distributions, and parameters. Categorical variables. Uncertainty. Missing data. This course is not eligible for Credit/D/Fail grading.
Random variables, parameters, observed data, statistics (distinctions and connections). Estimation: point and interval. Two-group comparisons, frequentist version. Simulation-based approaches. Prerequisite: DSCI 551. This course is not eligible for Credit/D/Fail grading.
Multiple hypothesis testing, false discovery rate. Two-group comparisons, Bayesian paradigm. Prerequisite: DSCI 552. This course is not eligible for Credit/D/Fail grading.
Randomization. A/B testing. Blocked designs. Orthogonality. Batch effects, confounding. Causality. Contemporary examples. Simulations. Prerequisite: All of DSCI 553, DSCI 561. This course is not eligible for Credit/D/Fail grading.
Linear models: continuous response; one or more categorical covariates and/or one or more continuous covariates. Prerequisite: DSCI 552. This course is not eligible for Credit/D/Fail grading.
Non-parametric regression and smoothing. Data-driven parameter selection. Robust regression. Mixed effects. Prerequisite: DSCI 561. This course is not eligible for Credit/D/Fail grading.
Unsupervised learning. K-means/medoids. Model-based clustering. Expectation-maximization algorithm. Hierarchical clustering. Dimension reduction. Matrix decomposition. Heatmaps, contour plots, dendrograms. Prerequisite: All of DSCI 511, DSCI 521. This course is not eligible for Credit/D/Fail grading.
Decision trees. k-nearest neighbour classifiers. Naive Bayes classifiers. Logistic regression. Prerequisite: All of DSCI 511, DSCI 521. This course is not eligible for Credit/D/Fail grading.
Support Vector Machines. Random Forests. Ensemble Classifiers. Graphical models. Prerequisite: DSCI 571. This course is not eligible for Credit/D/Fail grading.
Performance of a classification model. Generalization error, overfitting of training data. Shrinkage, feature selection, Akaike Information Criterion, Bayesian Information Criterion. k-fold cross-validation. Bootstrapping. Receiver Operating Characteristic curve. Elastic nets, regularization. Prerequisite: DSCI 571. This course is not eligible for Credit/D/Fail grading.
Time series. State space and change point detection. Hidden Markov Models. Gaussian processes. Prerequisite: DSCI 572. This course is not eligible for Credit/D/Fail grading.
Neural networks trained with backpropagation. Deep learning. Overfitting and underfitting. Active data acquisition. Hyperparameter optimization. Prerequisite: DSCI 572. This course is not eligible for Credit/D/Fail grading.
A capstone design project that gives students experience in leading complex multidisciplinary projects relevant to data science. Prerequisite: All of DSCI 513, DSCI 524, DSCI 525, DSCI 532, DSCI 541, DSCI 542, DSCI 554, DSCI 563, DSCI 573, DSCI 574, DSCI 575. This course is not eligible for Credit/D/Fail grading.