Google Colab is a popular cloud-based platform that lets you run Python code in a Jupyter notebook environment. While Colab is primarily used for machine learning and data analysis, it can also handle web scraping and browser automation with the Selenium library.
In this blog post, we will walk through how to set up Selenium on Google Colab, including the installation process and an example of using Selenium to build a Twitter automation bot.
First, open your Google Colab notebook. Once you're in, install Selenium by running the following command in a notebook cell:
!pip install selenium
This installs the Selenium library in your Colab environment.
Since Ubuntu no longer distributes chromium-browser outside of Snap (which doesn't work in Colab), we need a few extra commands to install it from the Debian repositories instead. Copy the following commands into a single Colab cell that starts with %%shell and run it:
%%shell
# Add debian buster
cat > /etc/apt/sources.list.d/debian.list <<'EOF'
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
EOF
# Add keys
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A
apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg
# Prefer the Debian repo for chromium* packages only
# Note the blank lines between entries -- apt requires them to separate records
cat > /etc/apt/preferences.d/chromium.pref << 'EOF'
Package: *
Pin: release a=eoan
Pin-Priority: 500

Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300

Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
EOF
# Install chromium browser and driver
# (no "!" prefix needed -- these lines run inside the %%shell cell)
apt-get update
apt-get install -y chromium chromium-driver
Once the dependencies are installed, you need to configure Selenium options so that the WebDriver does not crash on startup in your Google Colab environment. Because Colab is a headless Ubuntu machine with no GUI, it's important to add the following options:
# Create a new cell, add this function, and run it
from selenium import webdriver

def web_driver():
    options = webdriver.ChromeOptions()
    options.add_argument("--verbose")
    options.add_argument("--no-sandbox")
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--window-size=1920,1200")
    options.add_argument("--disable-dev-shm-usage")
    driver = webdriver.Chrome(options=options)
    return driver
Now we can call this function to start our Chrome WebDriver.
Next, import the required libraries. Create a new cell and add all of the imports:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep
import random
The next step is defining the driver_wait function. Add it in a new cell and run the cell so that it's available; this function handles the explicit-wait conditions we need:
def driver_wait(my_xpath, w_time=10, wait_type=EC.presence_of_element_located,
                untilnot=False, driver=None):
    if driver is None:
        raise ValueError("A WebDriver instance must be passed in")
    try:
        # Accept either a raw XPath string or an already-built locator tuple
        if isinstance(my_xpath, str):
            locator = (By.XPATH, my_xpath)
        else:
            locator = my_xpath
        if untilnot:
            return WebDriverWait(driver, w_time).until_not(wait_type(locator))
        return WebDriverWait(driver, w_time).until(wait_type(locator))
    except Exception as ex:
        print('wait_ex:', ex)
        return False
The driver_wait function waits for an element to load on a web page before we perform an action on it. It takes five arguments:
my_xpath: the XPath of the element (or an already-built locator tuple) we want to wait for.
w_time: how many seconds to wait for the element to load.
wait_type: the type of expected condition to wait for.
untilnot: a boolean indicating whether to wait for the element to appear or to disappear.
driver: the WebDriver instance to use.
The function returns the element if it is found on the page, or False if it is not.
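Under the hood, WebDriverWait simply polls its condition until it passes or the clock runs out. As a rough illustration of that pattern (a pure-Python sketch, not Selenium's actual implementation), it looks like this:

```python
import time

def wait_until(condition, timeout=10.0, poll=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # the condition may raise while the page is still loading
        time.sleep(poll)
    return False  # mirror driver_wait's behaviour on timeout
```

This is why a generous w_time is cheap: the call returns as soon as the element shows up, and only pays the full timeout when it never does.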
def get_driver():
    # Reuse the headless options defined in web_driver() above
    driver = web_driver()
    driver.get('https://twitter.com/login')
    # Type the username one character at a time
    for x in 'YourUsername':
        element = driver_wait('//input[@autocomplete="username"]', w_time=20, driver=driver)
        element.send_keys(x)
    sleep(2)
    submit = driver_wait('//span[text()="Next"]', driver=driver)
    driver.execute_script("arguments[0].click();", submit)
    sleep(5)
    for x in 'YourPassword':
        element = driver_wait('//input[@autocomplete="current-password"]', w_time=20, driver=driver)
        element.send_keys(x)
    submit = driver_wait('//span[text()="Log in"]', driver=driver)
    driver.execute_script("arguments[0].click();", submit)
    sleep(5)
    handle_to_follow = ''  # fill in the handle whose followers you want to target
    driver.get(f'https://twitter.com/{handle_to_follow}/followers')
    # Optionally, run a search instead:
    # for i in 'Python web scraping':
    #     search_ = driver_wait('//input[@aria-label="Search query"]', driver=driver)
    #     search_.send_keys(i)
    #     sleep(0.2)
    # search_.submit()
    sleep(5)
    return driver
The get_driver function is responsible for initializing the Selenium WebDriver and navigating to the Twitter login page. Once there, it enters the username and password to log in to the user's Twitter account.
After logging in, the function navigates to the followers page of the Twitter handle specified in the script. Note that driver_wait is used throughout to wait for page elements to load before interacting with them.
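Typing credentials one character at a time, as the loops above do, mimics human input; adding a small random pause between keystrokes makes it look even less robotic. A minimal stdlib sketch of that pacing helper (type_like_human and its send callback are illustrative names, not part of the script above — send stands in for element.send_keys):

```python
import random
import time

def type_like_human(text, send, min_delay=0.05, max_delay=0.2):
    """Feed `text` to `send` one character at a time with random pauses."""
    delays = []
    for ch in text:
        send(ch)                              # e.g. element.send_keys(ch)
        d = random.uniform(min_delay, max_delay)
        delays.append(d)
        time.sleep(d)
    return delays
```

In the login loops you could call type_like_human('YourUsername', element.send_keys) instead of sending each character with a fixed rhythm.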
def click_follow():
    driver = get_driver()
    x = 0
    while True:
        try:
            # Refresh the page every 100 follows to avoid stale content
            if x == 100:
                driver.refresh()
                sleep(20)
                x = 0
            sleep(3)
            check = driver_wait('//span[text()="Follow"]', driver=driver, w_time=20)
            if check:
                handle = driver_wait('//span[text()="Follow"]//ancestor::div[@data-testid="UserCell"][1]//a', driver=driver)
                print(handle.text)
                driver.execute_script("arguments[0].scrollIntoView();", handle)
                driver.execute_script("arguments[0].click();", handle)
                click_ = driver_wait('//span[text()="Follow"]', driver=driver, w_time=20)
                driver.execute_script("arguments[0].scrollIntoView();", click_)
                driver.execute_script("arguments[0].click();", click_)
                sleep(random.uniform(2, 5))
                driver.back()
                x += 1
            else:
                # No Follow button in view: scroll down to load more accounts
                driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                sleep(random.uniform(2, 5))
        except Exception as ex:
            print(ex)
            input('Please check the error message and press enter: ')
The click_follow function wraps its loop body in a try-except block to catch any exceptions thrown during execution. When an error occurs, the function prints the message to the console and pauses, prompting you to inspect the error before continuing. This lets you identify and fix issues with the run (a CAPTCHA, a changed page layout) before resuming the process.
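The same catch-pause-resume idea can be generalized: wrap any flaky step in a helper that retries a few times before handing control back to a human. A hedged stdlib sketch of that pattern (retry_step and on_give_up are illustrative names; on_give_up stands in for the input() prompt above):

```python
import time

def retry_step(step, attempts=3, pause=1.0, on_give_up=None):
    """Run `step`; on failure, retry up to `attempts` times, then hand off."""
    last_ex = None
    for _ in range(attempts):
        try:
            return step()
        except Exception as ex:
            last_ex = ex
            time.sleep(pause)
    if on_give_up is not None:
        on_give_up(last_ex)  # e.g. prompt the operator, as click_follow does
    return None
```

Automatic retries absorb transient failures (slow loads, brief rate limits) so the human prompt only fires for errors that genuinely need attention.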
# Let's call our function to trigger the bot
if __name__ == '__main__':
    click_follow()
In conclusion, the script presented above shows how to use Selenium and Python to automate following the followers of a specific Twitter handle. The script combines functions that handle errors, wait for page elements to load, and interact with the Twitter website. With some modifications, it can be adapted to target other Twitter handles or to perform other tasks on the platform.