We are looking for a Web Scraping Engineer to join our team!
As a Web Scraping focused Data Engineer, you will be responsible for extracting, transforming, and storing data from websites into a NoSQL database using web crawling tools.
In this role you will own the creation of tools, services, and workflows that improve crawl/scrape analysis, reporting, and data management.
You will also be responsible for testing the data to ensure accuracy and quality. You will own the process of identifying and fixing broken scrapes, as well as scaling scrapes as needed.
You will write bots to source publicly available data (by scraping websites, consuming data published via APIs or CSV, or extracting data from PDFs) in order to create new data feeds, and also help solve problems with our existing feeds.
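A responsibility like the one above can be sketched in miniature. The snippet below, a hypothetical example using only Python's standard library, shows the parse step of such a bot: extracting link records from a fetched HTML page and emitting them as JSON. In practice a framework listed below (Scrapy, BeautifulSoup, etc.) would replace the hand-rolled parser; the sample HTML stands in for a real HTTP response.

```python
from html.parser import HTMLParser
import json

class LinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append({"url": self._href,
                               "text": "".join(self._text).strip()})
            self._href = None

def extract_links(html: str) -> list:
    """Return one record per <a> tag in the page."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# Sample page standing in for a fetched response body (hypothetical data).
sample = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'
records = extract_links(sample)
print(json.dumps(records))
```

A real bot would wrap this in a fetch loop with politeness delays and error handling, then write the records into the data store.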
Requirements
- 2+ years of web scraping experience
- Production experience with one or more of the following web scraping frameworks and tools: Scrapy, Puppeteer, Selenium, ScrapingHub, BeautifulSoup, Import.io, Webhose.io
- Basic knowledge of data engineering (database ingestion, ETL, etc.)
- Experience with data testing/quality assurance processes, scripting & tools
- Experience bypassing bot detection (HTTP proxies, CAPTCHAs, etc.)
- Ability to query and understand structured data: SQL (SQLite/MySQL or similar), JSON, XML
- Familiarity with NoSQL databases (graph databases even better)
- Solution orientation and a "can do" attitude, with a desire to tackle complex problems
- Advantage: Experience with data DevOps tools such as Apache Airflow
- Advantage: Experience with cloud environments (Azure, AWS)
- Advantage: Experience extracting data from multiple disparate sources, including the web, PDFs, and spreadsheets
Benefits:
- Contract position
- Option to work from home
- Advancement and professional development