DSI TECHNICAL GUIDE
Note: This is a Markdown-formatted readme version of the Google Document that production teams send to students when they sign up for the course.
Before the course starts, you may want to familiarize yourself with the following technologies:
- Anaconda - We will be using Anaconda as our primary development environment
- Python 2.7 - We will be using Python & its packages as our primary language
- GitHub - We’ll be using GitHub on a daily basis to store and share our code
- Git - You will also need to install and configure command line tools for Git
- Postgres - We’ll be using Postgres for local SQL-based data storage
- Tableau - A popular dashboard creation system for visualizing data
- Slack - We’ll be using Slack on a daily basis to communicate with each other
Anaconda bundles many of the Python packages we’ll be using, including:
- Python 2.7: One language to rule them all...
- IPython / Jupyter & Pandas: Core tools for notebooks & data analysis.
- Matplotlib: The king of all Python plotting packages.
- Gensim: Framework for vector modeling.
- NLTK & spaCy: Used for natural language processing.
- NumPy: Array processing tool.
- Scikit-learn: Modules for machine learning & data modeling.
- SciPy: Scientific library for Python.
- Seaborn: Statistical data visualizer.
- Pip & Setuptools: Package installer & version manager (Mac only).
- PyMC: Common stats tool for simulation and optimization.
- SQLite: Standalone, lightweight SQL database engine.
- Statsmodels: Simple statistical computation (used with SciPy).
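Once Anaconda is installed, a quick way to see which of the packages above are available is to try importing each one. The snippet below is a minimal sketch, not an official part of the course setup. The import names in `PACKAGES` are assumptions (for example, Scikit-learn is imported as `sklearn` and SQLite as `sqlite3`), and `check_packages` is a hypothetical helper name.

```python
from __future__ import print_function
import importlib

# Assumed import names for the packages listed above; some differ from
# the display names (Scikit-learn -> sklearn, SQLite -> sqlite3).
PACKAGES = [
    "IPython", "pandas", "matplotlib", "gensim", "nltk", "spacy",
    "numpy", "sklearn", "scipy", "seaborn", "pip", "setuptools",
    "pymc", "sqlite3", "statsmodels",
]


def check_packages(names):
    """Return a dict mapping each import name to True if it imports cleanly."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # Covers a missing package as well as one that fails while loading.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in sorted(check_packages(PACKAGES).items()):
        print("{:<12} {}".format(name, "ok" if ok else "MISSING"))
```

A package reported as MISSING is not necessarily a problem: it may simply need to be installed separately, or it may use a different import name in your version.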
These tools aren't specifically required, but are highly recommended.
- Atom or Sublime are popular text editors for writing scripts to process data, perform analysis, and create visualizations.
- Chrome is Google's popular web browser, and comes with a complete set of developer tools built-in.
- Import.io: a useful web scraping tool with a graphic interface.
- Plot.ly: a user-friendly tool for plotting graphs.
A NOTE ABOUT TECHNOLOGY
Read the notes below to make sure your machine will work for DSI:
Make sure your machine is running with administrator permissions and has at least 10-20 GB of free disk space. We also recommend that you use a laptop with a 13-inch screen or larger in order to do your best work. In our experience, students with an 11-inch screen have a harder time in class.
General Assembly is a Mac-friendly organization. Our instructors will be teaching the course using Macs, so we strongly recommend students use a Mac with OS X 10.8 Mountain Lion or later in order to run all of the programs necessary for the course. This rules out some older MacBook models (circa 2006-2007).
Check the following specs to make sure your machine can provide you with the performance you’ll need in this course:
- 1.6GHz dual-core Intel Core i5 processor
- Turbo Boost up to 2.7GHz
- Intel HD Graphics 6000
- At least 8GB RAM
- 128GB flash storage
- 10-20 GB of free disk space
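The free disk space requirement can be checked from Python. The snippet below is a minimal sketch that assumes a Mac or Linux machine, since `os.statvfs` is only available on POSIX systems. The helper names and the 10 GB threshold (the lower end of the range above) are illustrative choices, not an official course tool.

```python
from __future__ import print_function
import os

# Assumption: use the lower bound of the 10-20 GB recommendation.
REQUIRED_FREE_GB = 10


def free_disk_gb(path):
    """Return the space available to the current user on the filesystem
    that contains `path`, in decimal gigabytes (1 GB = 10**9 bytes)."""
    stats = os.statvfs(path)  # POSIX only (Mac / Linux)
    return stats.f_bavail * stats.f_frsize / 1e9


def has_enough_space(free_gb, required_gb=REQUIRED_FREE_GB):
    """Return True if the free space meets the required minimum."""
    return free_gb >= required_gb


if __name__ == "__main__":
    free = free_disk_gb(os.path.expanduser("~"))
    status = "OK" if has_enough_space(free) else "below the recommended minimum"
    print("Free disk space: {:.1f} GB ({})".format(free, status))
```

The check is run against the home directory because that is where course materials are most likely to be stored. Pass a different path if your data lives on another drive.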
You can be a data scientist on any machine. Unfortunately, there are a number of compatibility issues between our Python libraries and most versions of Windows.
That’s why we require our Windows users to install a virtual machine in order to run a Linux environment. This allows all of our students and instructors to use the same UNIX-based commands when working on class materials. Prior to the start of the course, we’ll send you instructions on how to install Ubuntu Linux as part of our custom DSI “Installfest” walkthrough.
Please note that our instructors will be conducting the course using Macs, and may not be able to help you troubleshoot any issues you might encounter with a Linux environment.
If you choose to use a PC + Linux, you will need to provide your own IT support.