Purpose of the Role
Your role will be to build out a data acquisition team as part of a larger product team. You will bring your expertise in data crawling, analytical mind, keen eye for detail and leadership abilities to building a team that will acquire significant amounts of data to feed groundbreaking products redefining enterprise software. This is a startup environment, and we expect you to work hands-on as needed while mentoring and building a functional team.
Duties and Responsibilities
- Work closely with clients to identify and acquire data, and crawl information from multiple online data sources
- Define the strategy and tooling for web crawlers, web scrapers and other automation used to extract content
- Lead a team of web crawling engineers and be responsible for their mentorship
- Drive change by staying up to date with data science trends and technologies
Required Experience & Knowledge
- Previous experience leading a team and defining workload
- Previous hands-on experience working in a web crawling role is a must
- Senior expertise in Java and experience working with high volume web crawling/scraping
- Experience with Java-related frameworks such as Spring and Hibernate
- Experience working with open source tools such as Apache Nutch, StormCrawler, etc.
- Experience working with relational databases such as MySQL, PostgreSQL and Oracle, as well as NoSQL databases
- Experience with queuing systems like RabbitMQ / AMQP, Kafka, JMS, Amazon Kinesis
- Experience with JSON, REST APIs, HTML, XML
Advantage
- Familiarity with supplying data to support machine learning efforts
- Familiarity with Elasticsearch, AWS and approaches to scaling systems
Skills and Attributes
- Fluency in English (both written and spoken) is a must
- Client-oriented approach, high integrity and commitment
- Excellent communication and teamwork skills
- Detail-oriented and focused on quality
- Excellent organizational skills
Required Education & Qualifications
- BA or MSc degree, or relevant experience