With Scrapy you can crawl web sites and get their content, mainly text and images. Since Scrapy can't be installed with a simple sudo apt-get install scrapy, the recommended way is to install it with Anaconda, a Python distribution whose conda tool manages packages and virtual environments.
With these commands you can get Scrapy up and running on Ubuntu 16.04:
Download the latest Anaconda installer to your /tmp directory and start the installation. You can always check for a newer release here: https://repo.continuum.io/archive/
cd /tmp
curl -O https://repo.continuum.io/archive/Anaconda3-5.3.1-Linux-x86_64.sh
bash Anaconda3-5.3.1-Linux-x86_64.sh
Press Enter and type "yes" when prompted. When asked which directory to install Anaconda in, I personally choose a directory whose name starts with a dot, which keeps it hidden, like this:
/home/YOUR_USER_NAME/.anaconda3
Make sure this line has been added to ~/.bashrc:
# added by Anaconda3 4.4.0 installer
export PATH="/home/YOUR_USER_NAME/.anaconda3/bin:$PATH"
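If the installer did not add the line, you can append it yourself. Below is a minimal sketch of a hypothetical helper, add_anaconda_to_path, assuming Anaconda lives in ~/.anaconda3. It writes $HOME instead of your literal user name and does nothing if a line mentioning .anaconda3/bin is already present, so running it twice is harmless.

```bash
# Hypothetical helper: append the Anaconda PATH line to an rc file only if
# no line mentioning .anaconda3/bin is there yet.
# Assumes the ~/.anaconda3 install location chosen above.
add_anaconda_to_path() {
    local rc_file="${1:-$HOME/.bashrc}"
    # Single quotes keep $HOME and $PATH literal; they expand when the file is sourced
    local path_line='export PATH="$HOME/.anaconda3/bin:$PATH"'
    touch "$rc_file"
    # -q: quiet, -F: match plain text instead of a regex
    if ! grep -qF '.anaconda3/bin' "$rc_file"; then
        printf '%s\n' "$path_line" >> "$rc_file"
    fi
}
# Usage: add_anaconda_to_path
```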
Reload ~/.bashrc so the added line takes effect
source ~/.bashrc
Check that Anaconda works
conda list
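Install Scrapy. In my case this was done inside the scrapy_june_2017 virtual environment (see Virtual environments below), which is why the paths further down point into envs/scrapy_june_2017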
conda install -c conda-forge scrapy
Check that scrapy is installed
scrapy --version
RESULT:
Scrapy 1.4.0 - no active project
See where scrapy is installed
whereis scrapy
RESULT:
scrapy: /home/YOUR_USER_NAME/.anaconda3/envs/scrapy_june_2017/bin/scrapy
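whereis reports files it finds in standard locations, but not which copy the shell will actually run first. A small sketch with a hypothetical function, uses_env_binary, that compares the path printed by command -v against the environment directory (the scrapy_june_2017 path is taken from the output above):

```bash
# Hypothetical helper: succeed only if the command the shell would run for $1
# lives under the directory $2 (for example a conda environment).
uses_env_binary() {
    local cmd="$1" env_dir="$2" resolved
    # command -v prints the path that would be executed, or fails if none is found
    resolved="$(command -v "$cmd")" || return 1
    case "$resolved" in
        "$env_dir"/*) return 0 ;;
        *) return 1 ;;
    esac
}
# Usage: uses_env_binary scrapy "$HOME/.anaconda3/envs/scrapy_june_2017" && echo "env scrapy"
```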
When test-running Scrapy I got this error:
# Error: PIL missing
  File "/home/YOUR_USER_NAME/.anaconda3/envs/scrapy_june_2017/lib/python3.6/site-packages/scrapy/pipelines/images.py", line 15, in <module>
    from PIL import Image
ModuleNotFoundError: No module named 'PIL'
Fix the error by installing Pillow (which provides the PIL module) with conda, still inside the virtual environment:
conda install pillow
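Pillow is imported under the name PIL, so one way to confirm the fix is to ask the active Python whether it can find that module. A sketch with python_has_module as a hypothetical helper name; the optional $PYTHON variable overrides the interpreter, which otherwise defaults to python from the activated environment.

```bash
# Hypothetical helper: exit status 0 if the active Python can import module $1.
# Uses `python` from PATH (the activated conda env) unless $PYTHON is set.
python_has_module() {
    "${PYTHON:-python}" -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}
# Usage: python_has_module PIL && echo "Pillow is installed"
```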
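Scrape with Scrapy: run my_spider from the project directory, save the items as CSV, and stop after 10 items or 500 seconds, whichever comes first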
cd /home/project/scrapy/projectname/ && /home/YOUR_USER_NAME/.anaconda3/envs/scrapy_june_2017/bin/scrapy crawl my_spider -o /home/YOUR_USER_NAME/my_spider.csv -t csv --set=CLOSESPIDER_ITEMCOUNT=10 --set=CLOSESPIDER_TIMEOUT=500
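The same command can be wrapped in a function so the project directory, spider name and limits are easier to change. This is a sketch: crawl_to_csv is a hypothetical name, and unlike the command above it calls whichever scrapy is first on PATH instead of the absolute path into the environment, so activate the environment first.

```bash
# Hypothetical wrapper around the crawl command above. The project directory,
# spider name and output path are placeholders for your own project.
crawl_to_csv() {
    local project_dir="$1" spider="$2" out_csv="$3"
    local max_items="${4:-10}" timeout_s="${5:-500}"
    # Subshell: the cd does not change the caller's working directory
    (
        cd "$project_dir" || exit 1
        # Runs the first scrapy on PATH, i.e. the one from the activated env
        scrapy crawl "$spider" -o "$out_csv" -t csv \
            --set=CLOSESPIDER_ITEMCOUNT="$max_items" \
            --set=CLOSESPIDER_TIMEOUT="$timeout_s"
    )
}
# Usage: crawl_to_csv /home/project/scrapy/projectname my_spider "$HOME/my_spider.csv"
```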
Open a page in the Scrapy shell
scrapy shell https://example.com/test-page
Get the page title from its H1 tags
response.xpath('//h1/text()').extract()
Exit the Scrapy shell with Ctrl+D
NOTE: Anaconda may install its own version of GLib, which takes over gsettings. This is annoying, since running gsettings can then produce this error: "GLib-GIO-Message: Using the 'memory' GSettings backend.  Your settings will not be saved or shared with other applications." A workaround is to call the original gsettings directly as "/usr/bin/gsettings".
https://askubuntu.com/questions/916334/ubuntu-16-04-glib-gio-message-using-the-memory-gsettings-backend-your-settin/959346#959346
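To make the workaround permanent for interactive shells, one option (my assumption, not taken from the linked answer) is a small function in ~/.bashrc that shadows the gsettings copy in Anaconda's bin directory:

```bash
# Possible ~/.bashrc addition (an assumption, not from the linked answer):
# route interactive gsettings calls to the system binary instead of the
# copy that Anaconda's bin directory puts first on PATH.
gsettings() {
    /usr/bin/gsettings "$@"
}
```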
Update Anaconda and packages
Run these two commands:
conda update conda
conda update anaconda
From: https://medium.com/@mauridb/how-to-check-your-anaconda-version-c092400c9978
Virtual environments
You can create separate virtual environments for different projects, for example one for Scrapy:
conda create --name scrapy_june_2017 python=3
Activate the virtual environment
source activate scrapy_june_2017
Close (deactivate) the Anaconda virtual environment; no environment name is needed
source deactivate
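Since conda install puts packages into whichever environment is active, it can help to check that before installing. A hypothetical guard, require_conda_env, assuming conda sets CONDA_DEFAULT_ENV to the environment name on activation:

```bash
# Hypothetical guard: fail unless the expected conda environment is active.
# Assumes conda sets CONDA_DEFAULT_ENV to the env name when it is activated.
require_conda_env() {
    local wanted="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$wanted" ]; then
        echo "Activate '$wanted' first (current: ${CONDA_DEFAULT_ENV:-none})" >&2
        return 1
    fi
}
# Usage: require_conda_env scrapy_june_2017 && conda install pillow
```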
Remove (base) from your command line
If you want to remove (base) from your terminal prompt, change your ~/.bashrc back to the old, simpler format. Replace my_user_name with your own user name:
# Anaconda 4.4.0 config style
# added by Anaconda3 4.4.0 installer
export PATH="/home/my_user_name/.anaconda3/bin:$PATH"
Delete this bit:
# added by Anaconda3 5.3.1 installer
# >>> conda init >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$(CONDA_REPORT_ERRORS=false '/home/my_user_name/.anaconda3/bin/conda' shell.bash hook 2> /dev/null)"
if [ $? -eq 0 ]; then
    \eval "$__conda_setup"
else
    if [ -f "/home/my_user_name/.anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/my_user_name/.anaconda3/etc/profile.d/conda.sh"
        CONDA_CHANGEPS1=false conda activate base
    else
        \export PATH="/home/my_user_name/.anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda init <<<
https://stackoverflow.com/questions/51526503/why-does-base-appear-in-my-anaconda-command-prompt
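The deletion can also be scripted. A minimal sketch using GNU sed, with remove_conda_init_block as a hypothetical name. It keeps a backup as ~/.bashrc.bak and matches the markers by prefix, so it also catches the "# >>> conda initialize >>>" markers written by newer conda versions. The "# added by Anaconda3 5.3.1 installer" comment above the block is left in place, and the old-style export line still has to be added by hand.

```bash
# Hypothetical helper: delete the "conda init" block from an rc file, keeping
# the original as <file>.bak. Prefix patterns also match "conda initialize".
remove_conda_init_block() {
    local rc_file="${1:-$HOME/.bashrc}"
    # GNU sed: -i.bak edits in place with a backup; /start/,/end/d deletes the range
    sed -i.bak '/^# >>> conda init/,/^# <<< conda init/d' "$rc_file"
}
# Usage: remove_conda_init_block
```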
Uninstalling Anaconda
Uninstalling Anaconda is as easy as deleting the folder:
rm -rf ~/.anaconda3
Remember to remove the Anaconda lines from ~/.bashrc, then source it again.
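A sketch that does both steps, assuming the ~/.anaconda3 location from above (uninstall_anaconda is a hypothetical name). It removes the conda init block as a whole before deleting other lines that mention .anaconda3, because deleting only those lines would leave the block's if/else/fi unbalanced and break ~/.bashrc.

```bash
# Hypothetical uninstall helper. Assumes the ~/.anaconda3 install location;
# keeps a backup of the rc file as <file>.bak.
uninstall_anaconda() {
    local prefix="${1:-$HOME/.anaconda3}" rc_file="${2:-$HOME/.bashrc}"
    rm -rf -- "$prefix"
    # 1st expression: drop the whole "conda init" block (GNU sed range delete)
    # 2nd expression: drop remaining lines mentioning .anaconda3, e.g. export PATH=...
    sed -i.bak \
        -e '/^# >>> conda init/,/^# <<< conda init/d' \
        -e '/\.anaconda3/d' \
        "$rc_file"
}
# Usage: uninstall_anaconda
```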
 
