# machine_learning

**Repository Path**: Nlola/machine_learning

## Basic Information

- **Project Name**: machine_learning
- **Description**: Shanghai University Machine Learning (ML01) course
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 5
- **Created**: 2023-04-27
- **Last Updated**: 2023-04-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ML01 Machine Learning, UTSEUS, Shanghai University

## Language

English. For everything.

## Where and When

### Tencent Meeting

For each session, please always join Tencent Meeting (VooV Meeting):

- Room ID: 958 9491 5777

### Laptop

For each session, please bring your own laptop!

For Thursday practice sessions, please bring your headphones as well, because you will watch videos.

### Monday (Lectures and Continuous assessments)

Mostly theory.

- 20:00 - 21:40
- B313

### Wednesday (Exercise sessions)

Mostly code.

- 10:00 - 11:40
- B313

### Thursday (Practice sessions)

Practice sessions can help you become more industry-ready.
- 13:00 - 16:40
- B315

## Lectures (Monday)

### Week 1

- Machine Learning overview

### Week 2

- Linear Regression

### Week 3

- Logistic Regression (for classification)

### Week 4

- Neural networks

During the class, we will play a little with TensorFlow Playground:

- https://playground.tensorflow.org

AFTER the class, please watch these videos very carefully:

- https://www.youtube.com/watch?v=aircAruvnKk
- https://www.youtube.com/watch?v=IHZwWFHWa-w
- https://www.youtube.com/watch?v=Ilg3gGewQ5U
- https://www.youtube.com/watch?v=tIeHLnjs5U8

### Week 5

- Building a Machine Learning web app
  - https://microsoft.github.io/ML-For-Beginners/#/3-Web-App/1-Web-App/README
  - https://github.com/microsoft/ML-For-Beginners/tree/main/3-Web-App/1-Web-App

### Week 6

- Model selection

### Week 7

- CNN
  - for image classification
  - for image segmentation

### Week 8

- GAN

### Week 9

- AutoEncoder

### Week 10

- DQN

## Continuous assessment (Monday)

Tests will take place on Mondays (Week 2, Week 4, Week 6, Week 8). Each test covers the topics of the previous week, with some extensions (e.g. some more math). You are advised to read the materials provided by the prof ahead of time, to maximize your chance of success.

In total, 4 tests will be conducted.

Tests are on paper and closed-book: no Internet, no electronic devices, no discussion with classmates, no questions to the prof.

After each test, feel free to forget everything you memorized for test preparation, because your intuition has already been developed and will stay with you.

After you have experienced all this, you will gain more confidence in yourself and be more open to new challenges. And that's the most important thing.
### Week 2 (Test 1/4)

Materials to read before the test:

- all Jupyter notebooks for lectures and exercises
- https://www.t-ott.dev/2021/11/24/animating-normal-distributions
- https://demonstrations.wolfram.com/TheBivariateNormalDistribution/
- https://online.stat.psu.edu/stat505/lesson/4/4.2
- https://github.com/features/actions
- https://docs.github.com/en/actions/quickstart
- https://github.blog/2022-02-02-build-ci-cd-pipeline-github-actions-four-steps/
- https://resources.github.com/ci-cd/
- https://github.com/readme/guides/sothebys-github-actions
- https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
- https://www.ssh.com/academy/ssh-keys

Test (30 min):

- **Code**: Python lists
- **Code**: NumPy slicing, NumPy broadcasting
- **Math**: Gaussian distribution
- **Math**: bivariate Gaussian distribution
- **Math**: linear algebra: matrix multiplication
- **Math**: linear algebra: eigenvalues, eigenvectors
- **Misc**: SSH keys, CI/CD, GitHub Actions

### Week 4 (Test 2/4)

Materials to read before the test:

- all Jupyter notebooks for lectures and exercises
- https://www.bilibili.com/video/BV1SY4y1G7o9/

Test (30 min):

- **Code**: Linear Regression implementation from scratch
- **Code**: Logistic Regression implementation from scratch
- **Math**: gradient descent for linear regression and logistic regression
- **Math**: MSE for linear regression
- **Math**: cross-entropy loss function for logistic regression
- **Misc**: GitHub Pull Request (GitHub workflow), resolving git conflicts, git merge vs.
git rebase

### Week 6 (Test 3/4)

Materials to read before the test:

- all Jupyter notebooks for lectures and exercises
- The four 3Blue1Brown videos [[1]](https://www.youtube.com/watch?v=aircAruvnKk) [[2]](https://www.youtube.com/watch?v=IHZwWFHWa-w) [[3]](https://www.youtube.com/watch?v=Ilg3gGewQ5U) [[4]](https://www.youtube.com/watch?v=tIeHLnjs5U8)
- https://www.analyticsvidhya.com/blog/2021/04/activation-functions-and-their-derivatives-a-quick-complete-guide/
- https://towardsdatascience.com/7-popular-activation-functions-you-should-know-in-deep-learning-and-how-to-use-them-with-keras-and-27b4d838dfe6
- https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
- https://www.v7labs.com/blog/neural-networks-activation-functions
- https://medium.com/analytics-vidhya/brief-history-of-neural-networks-44c2bf72eec
- https://www.dataversity.net/a-brief-history-of-neural-networks/
- https://pub.towardsai.net/a-brief-history-of-neural-nets-472107bc2c9c
- https://machinelearningmastery.com/the-chain-rule-of-calculus-for-univariate-and-multivariate-functions/
- https://dougenterprises.com/the-neural-network-chain-rule/
- https://theorydish.blog/2021/12/16/backpropagation-%E2%89%A0-chain-rule

Test (30 min):

- **Code**: Neural network implementation from scratch (1/2)
  - Some basic starter code (>= 90%) will be provided
- **Math**: activation functions
- **Math**: universal approximation theorem
- **Math**: the chain rule of calculus for univariate and multivariate functions
- **Math**: backpropagation algorithm
- **History/Culture**: history of neural networks/deep learning (pre-2015)

### Week 8 (Test 4/4)

Materials to read before the test:

- all Jupyter notebooks for lectures and exercises
- https://gitee.com/lundechen/machine_learning_web_app
- https://zahidhasan.github.io/2020/10/13/bias-variance-trade-off-and-learning-curve.html
- https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b

Test (30 min):

- **Code**: Neural network implementation from scratch (2/2)
  - Some basic starter code (<= 50%) will be provided
- **Math**: L1 and L2 regularization
- **Math**: bias-variance trade-off
- **Math**: learning curves for determining overfitting and underfitting
- **History/Culture**: major research and application breakthroughs of deep learning in recent years (after 2015)
- **Misc**: [Machine Learning Web Application](https://gitee.com/lundechen/machine_learning_web_app): Docker, streamlit, fastapi, swagger, PyCharm debugging, etc.

## Exercise sessions (Wednesday)

Most exercises will correspond to lecture topics, with some extensions.

### Week 1

Starting from this session, we will use Jupyter Notebook.

Please install Python and VS Code. Ideally, you should also be able to use Google Colab and GitHub. Make sure you have a reliable Internet connection to those websites.

Exercise:

- Python
- NumPy
- Pandas

### Week 2

Make sure that you can run our Jupyter Notebooks in VS Code. Also, make sure you have access to GitHub, Google and YouTube.

Exercise:

- Linear Regression from scratch

### Week 3

Exercise:

- Logistic Regression from scratch

### Week 4

Exercise:

- Neural Network from scratch

### Week 5

Exercise:

- Play with these ML apps and get some ideas for your project:
  - https://www.tensorflow.org/js/demos
  - https://streamlit.io/gallery
  - https://shiny.rstudio.com/gallery/
- PITCH!
  - Present your amazing idea (even if it can, and should, be refined later)
  - Get people to join your team
  - Kick off your projects

### Week 6

### Week 7

### Week 8

### Week 9

### Week 10

## Practice sessions (Thursday)

### Week 1 - Week 2

Tutorial (with videos):

- https://gitee.com/lundechen/static_website_with_go_hugo

The main goal of this tutorial is **NOT** to teach you web technology, but to walk you through the main steps of building a static website and, along the way, to learn to use:

- Git (add, commit, push, pull, checkout, rebase, merge, conflict resolving)
- GitHub (GitHub pull request, GitHub Actions)

Week 1:

- Finish deploying the website on GitHub Pages (Video 1 - 5).
- Send the URL of your GitHub Pages site to the WeChat group when you finish.

Week 2:

- GitHub workflow (Video 6 - 8).
- Groups of two students.
- Please install GitLens (VS Code extension).
- Send the URLs of your GitHub repositories to the WeChat group when you finish (three URLs for each group of two).

#### Pro tips

If you are on Windows, you can install posh-git by following this tutorial:

- https://gitee.com/lundechen/hello#9-optional-git-branchstatus-indication-on-terminal

Corresponding tutorial video:

- https://www.bilibili.com/video/BV1cq4y1S7Be/ (starting from the 6th minute of the video)

On Windows, we also recommend using Windows Terminal, mainly for this:

![](img/posh-git.png)

If you are on macOS/Linux, you can install *oh my zsh* instead.
### Week 3

#### Task 1: reveal.js

- Follow the video
- Create a repo named `cv`
- Deploy the website as `https:.github.io/cv`
- Go to the reveal.js official website and try out different features
  - [demo](https://revealjs.com/demo), with its corresponding [source code](https://github.com/hakimel/reveal.js/blob/master/demo.html)
  - code highlight
  - image background
  - animation
  - transition
- For each of those features
  - create a separate GitHub repository
  - therefore, you end up with multiple `remote`s on your local repository
- Send the URLs of your reveal.js websites to the WeChat group when you finish

#### Task 2: Deploy the reveal.js/GoHugo websites on Tencent Static Website Hosting Service or AWS S3

- Send the URLs of your Tencent/AWS S3 websites to the WeChat group when you finish

#### Pro tips

Emoji HTML codes:

- https://www.quackit.com/character_sets/emoji/emoji_v3.0/unicode_emoji_v3.0_characters_all.cfm

Make sure you have a Tencent Cloud account, and an AWS account with a credit card linked to it. For the credit card, you might need help from a friend currently living abroad; I am not sure whether a Visa card issued in China will work. For AWS, it will cost $1, and then everything is basically free for one year.

### Week 4

- Test-Driven Programming
  - Tutorial: https://open-academy.github.io/machine-learning/assignments/get-started.html
  - Video (Chinese version): https://www.bilibili.com/video/BV1uW4y1s7Ci
  - Video (English version): https://www.bilibili.com/video/BV1nM41167j9
- GitHub Classroom

### Week 5 - Week 7

Machine learning web application.

#### Week 5 (Video 1 - Video 7)

Tutorial (with videos):

- https://gitee.com/lundechen/machine_learning_web_app

Your task:

- Local deployment.
- Data augmentation for better performance:
  - https://open-academy.github.io/machine-learning/assignments/ml-fundamentals/ml-overview-mnist-digits.html
- [optional] Deploy a [tensorflow.js demo](https://www.tensorflow.org/js/demos) locally
- [optional] Deploy [how-old-are-you-according-to-a-cnn-app](https://www.kaggle.com/code/ubiratanfilho/how-old-are-you-according-to-a-cnn) locally

#### Pro tips

If you really struggle with TensorFlow etc. on your local machine, you could consider:

- Google Colab
- GitHub Codespaces
- AutoDL for renting a GPU machine
  - https://www.autodl.com/

#### Pro tips

If you encounter this issue:

- `ModuleNotFoundError: No module named 'streamlit.cli'`

here is the solution:

- [solution](https://stackoverflow.com/questions/68162180/modulenotfounderror-no-module-named-streamlit-cli)

#### Week 6 (Video 8 - Video 10)

Tutorial (with videos):

- https://gitee.com/lundechen/machine_learning_web_app

Your task:

- Cloud VM deployment
  - for the ML web app (digit recognition)
  - for the UFO prediction web app as well
- Streamlit Cloud deployment
  - https://docs.streamlit.io/streamlit-community-cloud/
- [optional] GitHub WebHook (no tutorial from the prof; students are expected to work it out on their own).
- [optional] Deploy a [tensorflow.js demo](https://www.tensorflow.org/js/demos) on a cloud VM

#### Week 7 (Video 11 - Video 12)

Tutorial (with videos):

- https://gitee.com/lundechen/machine_learning_web_app

Your task:

- Docker deployment, for the ML web app
- Docker deployment, for the GoHugo and reveal.js websites
  - https://www.bilibili.com/video/BV1PY4y1t746/
- Send the URL of your ML web app to the WeChat group when you finish.
- [optional] Install MySQL and phpMyAdmin with Docker (Video 13)
- [optional] Streamlit Cookie (Video 14)
- [optional] Deploy a [tensorflow.js demo](https://www.tensorflow.org/js/demos) in a Docker container on the cloud

![](img/azure.png)

### Week 8

AWS SageMaker

- [optional] Deploy a [tensorflow.js demo](https://www.tensorflow.org/js/demos) on AWS SageMaker

### Week 9 - 10

Machine learning app with next.js/React and AWS Amplify: teamwork, IAM, etc.

The web app will be connected to a cloud database (DynamoDB). One student has access to the database and the other does not, enforced via IAM. Lambda will be used as well. S3 should also be part of the project, with IAM for S3.

Videos to be made with Zhu Xinning and Huanglongfei.

Basically, it will be the AWS Amplify/next.js version of:

- https://gitee.com/lundechen/machine_learning_web_app

Students will also learn how to manage resource access with IAM, because every two students will pair up and work together.

## Project

### Kick off

Projects kick off in Week 5, during the exercise session.

### Forming groups

Each group has 3 students. Form groups here:

- https://docs.qq.com/doc/DT2xqVHphanhGUWpR

At most ONE group may have 2 or 4 students, and only if `N_Student % 3 != 0`.

### Get inspired

- https://www.tensorflow.org/js/demos
- https://streamlit.io/gallery
- https://shiny.rstudio.com/gallery/
- https://github.com/MarcSkovMadsen/awesome-streamlit
- Alternatives to streamlit:
  - https://anvil.works/articles/4-alternatives-streamlit
  - https://huggingface.co/

### Implementation

Your machine learning web application can be based on streamlit, flask, next.js, tensorflow.js or any other framework. It should be deployed on the cloud (Tencent/Alibaba/Google/Microsoft Cloud).

You can use ChatGPT or GPT-4 if you have an API key.

You can use AI APIs from Baidu/Tencent/Alibaba/Amazon/Google etc., for example:

- https://cloud.tencent.com/product/ai-class

Apply for a domain name if necessary, e.g.
[http://an-interesting-ml-app.com](http://an-interesting-ml-app.com).

As an alternative, a WeChat Mini Program is OK as well.

Use emojis or Font Awesome/Bootstrap icons where appropriate.

Your code should be open-sourced and hosted on GitHub.

### What's expected of your video

- Length of video: at least 20 min
- Your video should include the following:
  - General presentation
  - Where you got the inspiration for your project idea
  - How to use your ML web app
  - How you implemented your app
  - How you deployed your app
  - How CI/CD/GitHub Actions/WebHook plays a role in your deployment
- If possible, make it fun (because life is good).
- If possible, make it fancy (because you are young).
- If applicable, include an ethics analysis of your project in the video.
- If applicable, include a social impact analysis of your project in the video.
- If applicable, include a market analysis of your project in the video.
- If applicable, include an ecology analysis of your project in the video.
- And yes, your video should be presented in English.

### Submission of your work

1. Create a folder, in which you put:
   - the video
   - the source code
   - a txt/markdown file indicating
     - the task of each team member
     - the estimated workload/contribution percentage of each team member
   - a txt/markdown file indicating
     - the URL of the GitHub repository hosting your code
     - the prof will check the commit history of your GitHub repo to see how each team member is contributing
1. Zip the folder
1. Upload the zip file to Google Drive
1. Send the sharing link to the prof, by PRIVATE WeChat message or by email
   - Therefore, the WeChat/email message contains no attached files, just a Google Drive URL.

For each team, only one submission is necessary, made by one member of your team.

Deadline for submission:

- The second Friday of SHU's 14-day exam period, 23:59.
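The first two steps of the submission list above (put everything into one folder, then zip it) can be scripted with Python's standard library. Here is a minimal sketch, not a required tool; the function name `package_submission` and the folder name `team_submission` are hypothetical placeholders.

```python
import shutil
from pathlib import Path


def package_submission(folder: str, archive_base: str) -> str:
    """Zip `folder` (including the folder itself) into `<archive_base>.zip`.

    Sketch only: assumes `folder` already holds the video, the source code
    and the two txt/markdown files described in step 1.
    """
    src = Path(folder).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    # root_dir is the parent and base_dir is the folder name, so every entry
    # in the archive starts with "<folder name>/". make_archive adds ".zip".
    return shutil.make_archive(
        archive_base, "zip", root_dir=src.parent, base_dir=src.name
    )
```

For example, calling `package_submission("team_submission", "team_submission")` from the folder's parent directory should produce `team_submission.zip` there, with `team_submission/` as the top-level directory inside the archive, so everything stays together when the prof unzips it.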
### For best projects

- Best projects might be hosted on [http://lunde.top](http://lunde.top) to inspire future projects.
- Best projects' videos will be featured on Lunde Chen's bilibili channel.
- Lunde Chen might invite you to participate in innovation competitions with your projects.

### Last but not least

Your app should be legal and ethical.

## Score

Denoting your continuous assessment score as `T` and your project score as `P`, your final score will be:

```python
# T: continuous assessment score, P: project score
final_score = max(P, 0.4 * T + 0.6 * P)
```

In other words, your final score is never lower than `P`; the tests raise it only when `T > P`.

Across all students, the average of `T` will be equal to the average of `P`.

### Distribution of grades

- 10% A (90-100)
- 20% A- (85-89)
- 30% B (80-84)
- 20% C (75-79)
- 20% D/E/F

Historical failing rate:

- 2021: 10%

## Gallery

### GoHugo/reveal.js website

- https://alexisz12.github.io/
- https://moonoxy.github.io/
- https://huangusr.github.io/
- https://lifelongcoding.github.io
- https://alexandreqiu.github.io/web/
- https://leo-fang-qaq.github.io/
- https://jialing78.github.io/
- https://hong-yue111.github.io/
- https://morganelu.github.io/

### ML web app

### Final project

## Misc

#### Zen of Python

https://peps.python.org/pep-0020/

```text
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
```

## Asking questions :question:

### Leveraging **[Gitee Issue](https://gitee.com/lundechen/cpp/issues)** for asking questions

By default, you should ask questions via **[Gitee Issue](https://gitee.com/lundechen/cpp/issues)**. Here is how:

- https://www.bilibili.com/video/BV1364y1h7sb/

### Principle

Here is the principle for asking questions:

> **Google First, Peers Second, Profs Last.**

You are expected to ask questions via **[Gitee Issue](https://gitee.com/lundechen/cpp/issues)**. However, as a **secondary** (and hence less desirable, less encouraged) choice, you can also ask questions in the WeChat group.

> Why Gitee Issue? Because it's simply more **professional**, and better in every sense.

In Gitee Issue and the WeChat group, questions will be answered selectively. Questions won't be answered if:

- they could be solved with a simple Google search
- they are out of the scope of the course
- they are well ahead of the course's progress
- the professors think they are not interesting for discussion

### Regarding personal WeChat chats

- **Questions asked in personal WeChat chats will NOT be answered.**

Learning how to use Google, Baidu, Bing and ChatGPT to solve computer science problems is an important skill you should develop during this course.

For private questions, please send them by email to:

- lundechen@shu.edu.cn (Lunde Chen)

### Office visit

Office visits are NOT welcome unless you make an appointment at least one day in advance.
## Student Name List

| Student/Staff ID | Name |
| -------- | --- |
| 19124641 | 陆开昕 |
| 19124663 | 万远亮 |
| 19124715 | 张行行 |
| 20120127 | 李兆琪 |
| 20124695 | 邱奕博 |
| 20124711 | 袁嘉祾 |
| 20124725 | 张世博 |
| 20124727 | 翁留辰 |
| 20124738 | 宋鹏宇 |
| 20124757 | 王宇星 |
| 20124767 | 黄河 |
| 20124793 | 王雨杰 |
| 18124686 | 赵宇豪 |
| 18124689 | 闫炳坤 |
| 19124519 | 冯玥瑄 |
| 20124694 | 洪越 |
| 20124696 | 方鑫喆 |
| 20124726 | 马哲 |
| 20124733 | 杜若衡 |
| 20124769 | 李鑫宇 |
| 20124770 | 王泓杰 |
| 20124771 | 王楚涵 |
| 20124772 | 娄宇鑫 |
| 21124683 | 戴志成 |

## Online resources

1. Andrew Ng's Machine Learning series:
   - https://www.bilibili.com/video/BV164411b7dx
1. Andrew Ng's Deep Learning series:
   - https://www.bilibili.com/video/BV164411m79z

## F.A.Q

#### What characterizes this ML01 machine learning course?

- Stressful, fun and rewarding.

#### Do we have work outside of class?

- Yes. A lot.
- At least 10 hours of work outside of class each week is expected from you:
  - 4 hours for course content and test preparation
  - 6 hours for your project (6 is the bare minimum; you might want to ramp up to 20 or 30 towards the end of the trimester)

#### What can I add to my CV after taking this course?

- Quite a lot. For example: AWS Amplify, AWS SageMaker, GitHub Pull Request, GitHub workflow, GoHugo, reveal.js, Cloud Computing, streamlit, fastapi, swagger, Docker, nginx, GitHub WebHook, machine learning, deep learning, next.js, MySQL, GitHub Actions, numpy, pandas, matplotlib, seaborn, plotly, sklearn, tensorflow, DQN, JavaScript, etc.

#### Why does this course seem a bit different?

- Well, the prof draws inspiration from courses at Stanford, Berkeley and MIT:
  - http://cs231n.stanford.edu (Stanford)
  - https://c.d2l.ai/berkeley-stat-157 (Berkeley)
  - http://introtodeeplearning.com (MIT)
  - http://cs229.stanford.edu (Stanford)

#### Do we have a slogan?

- Yes. See the picture below 👇

![](img/justdoit.png)

## License

This repository is licensed under [MIT](LICENSE).