Installing Scrapy in an Anaconda Python virtual environment on Ubuntu 16.04

With Scrapy you can crawl websites and extract their content, mainly text and images. Scrapy isn't available as an up-to-date package through sudo apt-get install, so the recommended way is to install it inside a Python virtual environment. This guide uses Anaconda, a Python distribution whose conda tool manages both packages and virtual environments.
With these commands you can get Scrapy up and running on Ubuntu 16.04:
Download the latest Anaconda installer to your /tmp directory and run it. You can always check for a newer release here (replace 5.3.1 in the commands below if there is one): https://repo.continuum.io/archive/
cd /tmp
curl -O https://repo.continuum.io/archive/Anaconda3-5.3.1-Linux-x86_64.sh
bash Anaconda3-5.3.1-Linux-x86_64.sh
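Before running the installer, it's worth checking that the download isn't corrupted. The archive page lists a hash for each installer; assuming it is a SHA-256 hash, a small helper like this one compares it with the downloaded file. verify_sha256 is a hypothetical name, not an Anaconda tool, and you need to paste the real hash from the archive page yourself:

```bash
# verify_sha256 FILE EXPECTED_HASH  (hypothetical helper, not part of Anaconda)
# Prints OK and returns 0 if FILE's SHA-256 digest matches, otherwise returns 1.
verify_sha256() {
    local file="$1" expected actual
    # sha256sum prints lowercase hex, so normalise the expected hash to lowercase
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    # sha256sum prints "<hash>  <file>"; keep only the hash field
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
        return 0
    fi
    echo "MISMATCH: $file" >&2
    return 1
}

# Usage (EXPECTED_HASH is a placeholder for the hash shown on the archive page):
# verify_sha256 Anaconda3-5.3.1-Linux-x86_64.sh EXPECTED_HASH
```

If it prints MISMATCH, download the installer again before running it.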
Press Enter and type "yes" when prompted. When asked which directory to install Anaconda in, I personally choose a directory whose name starts with a dot (.), which keeps it hidden, like this:
/home/YOUR_USER_NAME/.anaconda3
Make sure these lines were added to ~/.bashrc (the version number in the comment depends on the installer you ran):
# added by Anaconda3 4.4.0 installer
export PATH="/home/YOUR_USER_NAME/.anaconda3/bin:$PATH"
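If the installer didn't add the PATH line, or you're not sure, you can append it yourself. This is a minimal sketch assuming the ~/.anaconda3 install directory used above; add_conda_path_line is a hypothetical helper name. It only appends the line when an identical line isn't already in the file, so running it twice is harmless:

```bash
# add_conda_path_line RC_FILE CONDA_DIR  (hypothetical helper)
# Appends export PATH="CONDA_DIR/bin:$PATH" to RC_FILE unless that exact line exists.
add_conda_path_line() {
    local rc="$1" conda_dir="$2" line
    # \$PATH stays literal in the file, so it expands each time .bashrc runs
    line="export PATH=\"$conda_dir/bin:\$PATH\""
    # -q quiet, -x match whole line, -F fixed string (no regex)
    if grep -qxF "$line" "$rc" 2>/dev/null; then
        echo "already present in $rc"
    else
        printf '%s\n' "$line" >> "$rc"
        echo "added to $rc"
    fi
}

# Usage:
# add_conda_path_line ~/.bashrc "$HOME/.anaconda3"
```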
Reload ~/.bashrc so the change takes effect in the current terminal
source ~/.bashrc
Check that Anaconda works (this lists the installed packages)
conda list
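If conda list fails with "command not found", the Anaconda bin directory probably isn't on PATH in the current shell. Here's a quick sketch to check that, assuming the PATH-style setup above and the ~/.anaconda3 location; path_contains and check_conda_setup are hypothetical names:

```bash
# path_contains DIR: succeed if DIR is one of the colon-separated entries in PATH
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_conda_setup [CONDA_DIR]  (hypothetical helper)
# Reports whether CONDA_DIR/bin is on PATH and which conda the shell will run.
check_conda_setup() {
    local conda_dir="${1:-$HOME/.anaconda3}"
    if ! path_contains "$conda_dir/bin"; then
        echo "$conda_dir/bin is not on PATH: check ~/.bashrc, then run: source ~/.bashrc" >&2
        return 1
    fi
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    echo "conda: $(command -v conda)"
}

# Usage:
# check_conda_setup "$HOME/.anaconda3"
```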
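Install Scrapy from the conda-forge channel. To keep it in its own environment, create and activate one first, as described under Virtual environments below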
conda install -c conda-forge scrapy
Check that scrapy is installed
scrapy --version
RESULT:
Scrapy 1.4.0 - no active project
See where scrapy is installed (here it is inside the scrapy_june_2017 environment)
whereis scrapy
RESULT:
scrapy: /home/YOUR_USER_NAME/.anaconda3/envs/scrapy_june_2017/bin/scrapy
When I first test-ran Scrapy, I got this error:
# Error: PIL missing
File "/home/YOUR_USER_NAME/.anaconda3/envs/scrapy_june_2017/lib/python3.6/site-packages/scrapy/pipelines/images.py", line 15, in
from PIL import Image
ModuleNotFoundError: No module named 'PIL'
Fix the error by installing Pillow, which provides the PIL module that Scrapy's images pipeline imports. Install it with conda, still inside the virtual environment:
conda install pillow
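To catch missing modules like this before a crawl, you can try importing them directly. This sketch uses whatever python is first on PATH (the environment's interpreter while it's activated), and the PYTHON variable can point it elsewhere; check_py_modules is a hypothetical helper name. Note that Pillow is imported as PIL:

```bash
# check_py_modules MODULE...  (hypothetical helper)
# Tries to import each module; prints ok/missing and returns 1 if any is missing.
check_py_modules() {
    local py="${PYTHON:-python}" mod status=0
    for mod in "$@"; do
        if "$py" -c "import $mod" 2>/dev/null; then
            echo "ok:      $mod"
        else
            echo "missing: $mod"
            status=1
        fi
    done
    return "$status"
}

# Usage, inside the activated environment:
# check_py_modules scrapy PIL
```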
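Scrape with Scrapy
Run a spider from inside your Scrapy project directory, using the scrapy binary from the environment (replace the paths with your own). This saves the scraped items to a CSV file and closes the spider after about 10 items or 500 seconds, whichever comes first: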
cd /home/project/scrapy/projectname/ && /home/YOUR_USER_NAME/.anaconda3/envs/scrapy_june_2017/bin/scrapy crawl my_spider -o /home/YOUR_USER_NAME/my_spider.csv -t csv --set=CLOSESPIDER_ITEMCOUNT=10 --set=CLOSESPIDER_TIMEOUT=500
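If you run the same crawl often, the long command can be wrapped in a function. This is a sketch, not part of Scrapy: crawl_to_csv and SCRAPY_BIN are hypothetical names, and the defaults (10 items, 500 seconds and the scrapy_june_2017 path) are copied from the command above:

```bash
# Path to the environment's scrapy binary; override SCRAPY_BIN if yours differs
SCRAPY_BIN="${SCRAPY_BIN:-$HOME/.anaconda3/envs/scrapy_june_2017/bin/scrapy}"

# crawl_to_csv PROJECT_DIR SPIDER OUTPUT_CSV [MAX_ITEMS] [TIMEOUT_SECONDS]
crawl_to_csv() {
    local project_dir="$1" spider="$2" output="$3"
    local max_items="${4:-10}" timeout="${5:-500}"
    # scrapy crawl must run inside the project directory (where scrapy.cfg is);
    # the subshell keeps the caller's working directory unchanged
    (
        cd "$project_dir" || exit 1
        "$SCRAPY_BIN" crawl "$spider" \
            -o "$output" -t csv \
            --set=CLOSESPIDER_ITEMCOUNT="$max_items" \
            --set=CLOSESPIDER_TIMEOUT="$timeout"
    )
}

# Usage:
# crawl_to_csv /home/project/scrapy/projectname my_spider "$HOME/my_spider.csv"
```

Running it in a subshell means a failed cd simply makes the function return an error instead of leaving your terminal in another directory.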
Open a page in the Scrapy shell (an interactive console)
scrapy shell https://example.com/test-page
Get the text inside the page's H1 tags (the main heading, not the HTML title element)
response.xpath('//h1/text()').extract()
Exit the Scrapy shell with Ctrl+D
NOTE: When you install Anaconda, it may install its own version of GLib, and its gsettings can take over from the system's. That is quite annoying, since it can result in this error when you use gsettings: "GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications." A workaround is to call the original gsettings by its full path, /usr/bin/gsettings.
https://askubuntu.com/questions/916334/ubuntu-16-04-glib-gio-message-using-the-memory-gsettings-backend-your-settin/959346#959346
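To avoid typing the full path every time, you could add an alias to ~/.bashrc so that gsettings always runs the system binary. This is my own small extension of the /usr/bin/gsettings workaround, not necessarily the fix from the linked answer, and it assumes the system gsettings lives in /usr/bin. Aliases only apply to interactive shells, so scripts should still call /usr/bin/gsettings directly:

```bash
# ~/.bashrc fragment: make the gsettings command use the system binary
# instead of any copy in Anaconda's bin directory, which comes earlier on PATH.
# Aliases only apply to interactive shells, not to scripts.
alias gsettings='/usr/bin/gsettings'
```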
Update Anaconda and packages
Run these two commands:
conda update conda
conda update anaconda
From: https://medium.com/@mauridb/how-to-check-your-anaconda-version-c092400c9978
Virtual environments
You can create separate virtual environments for different projects. This creates one named scrapy_june_2017 with Python 3:
conda create --name scrapy_june_2017 python=3
Activate the virtual environment
source activate scrapy_june_2017
Close (deactivate) the Anaconda virtual environment; you don't need to give its name
source deactivate
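The environment steps can also be scripted. This sketch creates the environment with Python 3, Scrapy and Pillow from conda-forge in one go, unless an environment with that name is already listed by conda env list. env_exists and ensure_scrapy_env are hypothetical helper names, and the sketch assumes the usual conda env list layout with the environment name in the first column:

```bash
# env_exists NAME: succeed if `conda env list` shows an environment called NAME
env_exists() {
    conda env list | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# ensure_scrapy_env NAME  (hypothetical helper)
# Creates the environment with Python 3, Scrapy and Pillow unless it already exists.
ensure_scrapy_env() {
    local name="$1"
    if env_exists "$name"; then
        echo "environment $name already exists"
    else
        conda create --yes --name "$name" -c conda-forge python=3 scrapy pillow
    fi
}

# Usage:
# ensure_scrapy_env scrapy_june_2017
# source activate scrapy_june_2017
```

Installing Pillow together with Scrapy avoids the PIL error described earlier.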
Remove (base) from your command line
If you want to remove (base) from your terminal prompt, change your .bashrc file back to the old, simpler format. Replace my_user_name with your own user name:
# Anaconda 4.4.0 config style
# added by Anaconda3 4.4.0 installer
export PATH="/home/my_user_name/.anaconda3/bin:$PATH"
Delete this bit:
# added by Anaconda3 5.3.1 installer
# >>> conda init >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$(CONDA_REPORT_ERRORS=false '/home/my_user_name/.anaconda3/bin/conda' shell.bash hook 2> /dev/null)"
if [ $? -eq 0 ]; then
    \eval "$__conda_setup"
else
    if [ -f "/home/my_user_name/.anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/my_user_name/.anaconda3/etc/profile.d/conda.sh"
        CONDA_CHANGEPS1=false conda activate base
    else
        \export PATH="/home/my_user_name/.anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda init <<<
https://stackoverflow.com/questions/51526503/why-does-base-appear-in-my-anaconda-command-prompt
Uninstalling Anaconda
Uninstalling Anaconda is as easy as deleting the folder:
rm -rf ~/.anaconda3
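Remember to remove the Anaconda lines from ~/.bashrc, then open a new terminal. Sourcing ~/.bashrc again won't remove the Anaconda directory that is already on PATH in the current shell.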
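Finding all the Anaconda lines in ~/.bashrc by hand can be fiddly, especially the conda init block. This sketch deletes the block between the ">>> conda init >>>" and "<<< conda init <<<" markers, the "# added by Anaconda3" comments and any line containing anaconda3/bin, keeping a backup as ~/.bashrc.bak. remove_anaconda_lines is a hypothetical helper, and it may also delete lines of your own that mention anaconda3/bin, so check the result before opening a new terminal:

```bash
# remove_anaconda_lines [RC_FILE]  (hypothetical helper)
# Deletes Anaconda-related lines from RC_FILE (default ~/.bashrc),
# saving the original as RC_FILE.bak.
remove_anaconda_lines() {
    local rc="${1:-$HOME/.bashrc}"
    # 1) the whole conda init block, markers included
    # 2) installer comments such as "# added by Anaconda3 4.4.0 installer"
    # 3) any remaining line that puts an anaconda3/bin directory on PATH
    sed -i.bak \
        -e '/# >>> conda init >>>/,/# <<< conda init <<</d' \
        -e '/# added by Anaconda3/d' \
        -e '/anaconda3\/bin/d' \
        "$rc" || return 1
    echo "cleaned $rc (backup: $rc.bak)"
}

# Usage:
# remove_anaconda_lines ~/.bashrc
```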