Last updated: 2021-06-24 15:20:20
Cover Page
Title Page
Copyright and Credits
Python Data Mining Quick Start Guide
Dedication
About Packt
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Data Mining and Getting Started with Python Tools
Descriptive, predictive, and prescriptive analytics
What will and will not be covered in this book
Recommended readings for further explanation
Setting up Python environments for data mining
Installing the Anaconda distribution and Conda package manager
Installing on Linux
Installing on Windows
Installing on macOS
Launching the Spyder IDE
Launching a Jupyter Notebook
Installing a high-performance Python distribution
Recommended libraries and how to install them
Recommended libraries
Summary
Basic Terminology and Our End-to-End Example
Basic data terminology
Sample spaces
Variable types
Data types
Basic summary statistics
An end-to-end example of data mining in Python
Loading data into memory – viewing and managing with ease using pandas
Plotting and exploring data – harnessing the power of Seaborn
Transforming data – PCA and LDA with scikit-learn
Quantifying separations – k-means clustering and the silhouette score
Making decisions or predictions
Collecting, Exploring, and Visualizing Data
Types of data sources and loading into pandas
Databases
Basic Structured Query Language (SQL) queries
Disks
Web sources
From URLs
From scikit-learn and Seaborn-included sets
Access, search, and sanity checks with pandas
Basic plotting in Seaborn
Popular types of plots for visualizing data
Scatter plots
Histograms
Jointplots
Violin plots
Pairplots
Cleaning and Readying Data for Analysis
The scikit-learn transformer API
Cleaning input data
Missing values
Finding and removing missing values
Imputing to replace the missing values
Feature scaling
Normalization
Standardization
Handling categorical data
Ordinal encoding
One-hot encoding
Label encoding
High-dimensional data
Dimension reduction
Feature selection
Feature filtering
The variance threshold
The correlation coefficient
Wrapper methods
Sequential feature selection
Transformation
PCA
LDA
Grouping and Clustering Data
Introducing clustering concepts
Location of the group
Euclidean space (centroids)
Non-Euclidean space (medoids)
Similarity
Euclidean space
The Euclidean distance
The Manhattan distance