Installing Scrapy in an Anaconda Python virtual environment on Ubuntu 16.04

[Image: Anaconda Python virtual environment install screen]

With Scrapy you can crawl websites and extract their content, mainly text and images. Since it isn't possible to install Scrapy with sudo apt-get install scrapy, a convenient way is to install it inside a Python virtual environment managed by Anaconda, a Python distribution that includes the conda package and environment manager.

With these commands you can get Scrapy up and running on Ubuntu 16.04:

Download the latest Anaconda installer to your /tmp directory and start the installation. You can always check whether there is a newer release here: https://repo.continuum.io/archive/

cd /tmp
curl -O https://repo.continuum.io/archive/Anaconda3-5.3.1-Linux-x86_64.sh
bash Anaconda3-5.3.1-Linux-x86_64.sh
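Before running a downloaded installer it can be worth checking that the file arrived intact. The following is a minimal sketch, assuming you have looked up the SHA-256 hash that Anaconda publishes for your installer version. The function name sha256_of_file is my own, and the expected value is deliberately left as a placeholder.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large installer is never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage, with the expected hash copied from the published list:
# expected = "<hash published for your installer>"
# print(sha256_of_file("/tmp/Anaconda3-5.3.1-Linux-x86_64.sh") == expected)
```

If the comparison prints False, download the file again rather than running it.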

Press Enter and type "yes" when required. When you are prompted for the directory to install Anaconda in, I personally change it to a directory whose name starts with a dot, to keep it hidden, like this:

/home/YOUR_USER_NAME/.anaconda3

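Make sure a line like this has been added to ~/.bashrc. Newer installers, such as 5.3.1, add a longer "conda init" block instead, which is shown under "Remove (base) from your command line" below: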

# added by Anaconda3 4.4.0 installer
export PATH="/home/YOUR_USER_NAME/.anaconda3/bin:$PATH"
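To confirm that the export line actually took effect in the current shell, you can check whether the Anaconda bin directory is on PATH. This is a small sketch using only the standard library. The helper name path_contains is made up for illustration, and the directory assumes the hidden install location used above.

```python
import os


def path_contains(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normpath(os.path.expanduser(directory))
    # Normalise each entry so a trailing slash does not cause a false negative.
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return wanted in entries


# Replace YOUR_USER_NAME with your own user name before running.
print(path_contains("/home/YOUR_USER_NAME/.anaconda3/bin"))
```

If this prints False after you have sourced ~/.bashrc, the export line is probably missing or points at a different directory.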

Make the added line take effect

source ~/.bashrc

Check Anaconda works

conda list

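Install scrapy. To keep it in its own environment, as I did, first create and activate one as described under "Virtual environments" below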

conda install -c conda-forge scrapy

Check that scrapy is installed

scrapy --version

RESULT:

Scrapy 1.4.0 - no active project

See scrapy location

whereis scrapy

RESULT:

scrapy: /home/YOUR_USER_NAME/.anaconda3/envs/scrapy_june_2017/bin/scrapy
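The same lookup can be done from Python, which is handy inside a script that should fail early when Scrapy is not available. This is a sketch built on shutil.which, which reports the executable the shell would pick from PATH (closer to the which command than to whereis). The function name locate_command is my own.

```python
import shutil


def locate_command(name, search_path=None):
    """Return the full path of the first `name` found on PATH, or None."""
    # search_path=None makes shutil.which fall back to the PATH environment variable.
    return shutil.which(name, path=search_path)


location = locate_command("scrapy")
if location is None:
    print("scrapy was not found on PATH")
else:
    print("scrapy resolves to", location)
```

If the reported path is not inside your Anaconda directory, another Scrapy installation is taking precedence.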

After test running Scrapy I got this error

# Error: PIL missing
File "/home/YOUR_USER_NAME/.anaconda3/envs/scrapy_june_2017/lib/python3.6/site-packages/scrapy/pipelines/images.py", line 15, in <module>
from PIL import Image
ModuleNotFoundError: No module named 'PIL'

Fix the error by installing pillow with conda, still inside the virtual environment

conda install pillow
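You can check whether the PIL module is importable before running a spider, so the problem shows up without a full traceback. This is a minimal sketch using importlib from the standard library. The helper name module_available is hypothetical. Run it with the Python interpreter of the environment Scrapy is installed in.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if module_available("PIL"):
    print("PIL is importable, so scrapy/pipelines/images.py should load")
else:
    print("PIL is missing; install it with: conda install pillow")
```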

Scrape with Scrapy

cd /home/project/scrapy/projectname/ && /home/YOUR_USER_NAME/.anaconda3/envs/scrapy_june_2017/bin/scrapy crawl my_spider -o /home/YOUR_USER_NAME/my_spider.csv -t csv --set=CLOSESPIDER_ITEMCOUNT=10 --set=CLOSESPIDER_TIMEOUT=500
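The command above is long, so here is how its parts fit together: -o and -t write the scraped items to a CSV file, CLOSESPIDER_ITEMCOUNT stops the spider after that many items, and CLOSESPIDER_TIMEOUT stops it after that many seconds. If you start crawls from a script, the same command can be assembled as an argument list. This is only a sketch: build_crawl_command and run_crawl are names I made up, and the paths are the placeholders from the command above.

```python
import subprocess


def build_crawl_command(scrapy_bin, spider, output_csv, max_items, timeout_seconds):
    """Assemble the scrapy crawl command as a list of arguments."""
    return [
        scrapy_bin, "crawl", spider,
        "-o", output_csv,   # file that receives the scraped items
        "-t", "csv",        # output format
        "--set=CLOSESPIDER_ITEMCOUNT={}".format(max_items),
        "--set=CLOSESPIDER_TIMEOUT={}".format(timeout_seconds),
    ]


def run_crawl(project_dir, command):
    """Run the crawl from inside the project directory, raising on failure."""
    return subprocess.run(command, cwd=project_dir, check=True)


command = build_crawl_command(
    "/home/YOUR_USER_NAME/.anaconda3/envs/scrapy_june_2017/bin/scrapy",
    "my_spider",
    "/home/YOUR_USER_NAME/my_spider.csv",
    max_items=10,
    timeout_seconds=500,
)
print(" ".join(command))
```

Calling run_crawl("/home/project/scrapy/projectname/", command) would then start the crawl. Passing the directory as cwd replaces the cd at the start of the shell version.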

Open a page in the Scrapy shell

scrapy shell https://example.com/test-page

Get the title of the page from its H1 tags

response.xpath('//h1/text()').extract()
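To see what that expression returns without opening the Scrapy shell, here is a rough stand-in using only the standard library. It is not Scrapy's selector: ElementTree accepts only well-formed markup, whereas Scrapy copes with real-world HTML. The sample page and the function name h1_texts are invented for illustration.

```python
import xml.etree.ElementTree as ET


def h1_texts(markup):
    """Return the text of every h1 element, similar to //h1/text() on simple pages."""
    root = ET.fromstring(markup)
    # iter("h1") walks the whole tree, like the leading // in the XPath expression.
    return [h1.text for h1 in root.iter("h1")]


page = "<html><body><h1>Test page</h1><p>Body text</p></body></html>"
print(h1_texts(page))  # ['Test page']
```

As in the Scrapy shell, the result is a list, because a page can contain more than one H1 tag.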

Exit the Scrapy shell with Ctrl+D

NOTE: Anaconda may install its own version of glib, whose gsettings then takes precedence over the system one. This is quite annoying, since running gsettings can then produce this message: "GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications." A workaround is to call "/usr/bin/gsettings" directly to reach the original gsettings.
https://askubuntu.com/questions/916334/ubuntu-16-04-glib-gio-message-using-the-memory-gsettings-backend-your-settin/959346#959346

Update Anaconda and packages

Run these two commands:

conda update conda

conda update anaconda

From: https://medium.com/@mauridb/how-to-check-your-anaconda-version-c092400c9978

Virtual environments

You can create a separate virtual environment for each project or setup

conda create --name scrapy_june_2017 python=3

Activate the virtual environment

source activate scrapy_june_2017

Close the Anaconda virtual environment (no environment name is needed)

source deactivate
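A script can find out which environment it is running in by reading the CONDA_DEFAULT_ENV variable, which conda's activation sets as far as I know. This is a minimal sketch. The function name active_conda_env is my own, and it returns None when no environment is active.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None."""
    if environ is None:
        environ = os.environ
    # An unset or empty variable both mean that no environment is active.
    return environ.get("CONDA_DEFAULT_ENV") or None


print("Active conda environment:", active_conda_env())
```

This is useful as a guard at the top of a crawl script, so it refuses to run outside the environment that has Scrapy installed.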

Remove (base) from your command line

If you want to remove (base) from your terminal prompt, update your .bashrc file to use the older, simpler format. Replace my_user_name with your own user name:

# Anaconda 4.4.0 config style
# added by Anaconda3 4.4.0 installer
export PATH="/home/my_user_name/.anaconda3/bin:$PATH"

Delete this bit:

# added by Anaconda3 5.3.1 installer
# >>> conda init >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$(CONDA_REPORT_ERRORS=false '/home/my_user_name/.anaconda3/bin/conda' shell.bash hook 2> /dev/null)"
if [ $? -eq 0 ]; then
    \eval "$__conda_setup"
else
    if [ -f "/home/my_user_name/.anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/my_user_name/.anaconda3/etc/profile.d/conda.sh"
        CONDA_CHANGEPS1=false conda activate base
    else
        \export PATH="/home/my_user_name/.anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda init <<<

https://stackoverflow.com/questions/51526503/why-does-base-appear-in-my-anaconda-command-prompt
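If you would rather not hand-edit the file, the deletion could be scripted. This is a minimal sketch, assuming the block is delimited by exactly the two marker lines shown above. The name strip_conda_init is made up. It leaves the "# added by Anaconda3 5.3.1 installer" comment in place, since that line sits outside the markers.

```python
START_MARKER = "# >>> conda init >>>"
END_MARKER = "# <<< conda init <<<"


def strip_conda_init(text):
    """Return `text` without the conda init block, markers included."""
    kept = []
    inside_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == START_MARKER:
            inside_block = True
            continue
        if stripped == END_MARKER:
            inside_block = False
            continue
        if not inside_block:
            kept.append(line)
    return "\n".join(kept) + "\n"


sample = (
    "alias ll='ls -l'\n"
    "# >>> conda init >>>\n"
    "__conda_setup=placeholder\n"
    "# <<< conda init <<<\n"
    "export EDITOR=vim\n"
)
print(strip_conda_init(sample), end="")
```

The function only transforms a string. Back up ~/.bashrc before writing the result back to it.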

Uninstalling Anaconda

Uninstalling Anaconda is as easy as deleting the folder:

rm -rf ~/.anaconda3

Remember to remove the Anaconda lines from your ~/.bashrc file, then source it or open a new terminal.

https://docs.anaconda.com/anaconda/install/uninstall/