Job Description
Who we are:

Talend is a rapidly growing leader in the data and application integration space. We are a global company with offices in Redwood City, Paris, London, Bonn, Beijing, Tokyo, Singapore, Sydney, and India. Our customers are thought leaders in data integration and big data, and they represent some of the biggest and best names in retail, financial services, consumer products, and business services.

Talend enables data-driven companies to be more effective at what they do by providing them with the tools, infrastructure, and guidance to make informed business decisions. Our architecture and technology uniquely position us to take advantage of constant innovation in data and processing technologies, such as Hadoop and Spark. As a result, we provide the fastest, most scalable data integration platform in the industry, and we support Big Data natively.

Talend is looking for a Site Reliability Engineer to join our growing team in the Beijing office. In this role, you’ll be responsible for the security, stability, and scalability of our Talend Cloud service. You’ll get to work hands-on with plenty of exciting technology and scale challenges as we grow to support millions of transactions across hundreds of servers in our Talend Cloud environment. We are seeking candidates with expertise in both development and system administration.

Responsibilities

- Ensure high reliability and availability for production systems, including upgrade and release processes and incident handling
- Manage and deploy internal and external monitoring solutions to maintain high availability for production systems
- Troubleshoot cloud infrastructure, systems, network, and application stacks
- Perform on-call duty as part of a team maintaining the availability and performance of our cloud infrastructure, as well as the various internal services and systems that these core services depend on
- Maintain technical operations for our Talend Cloud infrastructure: administer Linux systems (including configuration, troubleshooting, and automation), AWS/Azure cloud infrastructure, and CI/CD services
- Work with fellow operations engineers and development teams on complex problems, and make decisions and recommendations about system improvements after analyzing the possible courses of action
- Develop effective tooling, alerts, and responses to both identify and address reliability risks
- Define and evangelize cloud-related optimizations and best practices to improve reliability and performance

Required Skills

Must haves:

- Proficient spoken and written English
- Bachelor’s degree in Computer Science or a relevant field
- Strong working knowledge of Linux (RedHat/CentOS) systems and applications, including Tomcat, Java, Apache, Elasticsearch, ActiveMQ, and Nginx proxy
- Cloud engineering experience
- Experience with IaC / configuration management / systems automation tools at scale (e.g. Terraform, AWS CloudFormation, Ansible, Puppet, etc.)
- Experience with network management systems and network monitoring tools such as Nagios, Icinga, Kibana, Logstash, and Cacti
- Practical knowledge of CI/CD tools (Jenkins, Bamboo, GitLab CI, etc.) and Java-specific integration tools (Ant, Groovy, Gradle, Maven/Nexus/Artifactory); experience with Jenkins (configuration, implementing CI pipelines via code, maintenance)
- Proficiency in scripting languages (Python, bash, Groovy, etc.)
- Ability to work independently, with strong interpersonal and communication skills

Strongly beneficial:

- Experience administering AWS; knowledge of Microsoft Azure or other IaaS/PaaS infrastructures
- Experience with containers (Docker); knowledge of Kubernetes
- Experience with big data systems and/or database administration (e.g. MySQL, PostgreSQL, MongoDB, NoSQL)

Job function:
Systems Engineer
System Architect

Keywords:
DevOps