Senior/Staff Cloud Infrastructure Engineer
Company: Tbwa Chiat/Day Inc
Location: San Jose
Posted on: February 1, 2025
Job Description:
San Jose, California, United States
Who We Are
At OKX, we believe that the future will be reshaped by crypto, and that
crypto will ultimately
contribute to every individual's freedom. OKX is a leading crypto
exchange, and the developer of OKX Wallet, giving millions access
to crypto trading and decentralized crypto applications (dApps).
OKX is also a brand trusted by hundreds of large institutions
seeking access to crypto markets. We are safe and reliable, backed
by our Proof of Reserves. Across our multiple offices globally, we
are united by our core principles: We Before Me, Do the Right Thing,
and Get Things Done. These shared values drive our culture, shape
our processes, and foster a friendly, rewarding, and diverse
environment for every OK-er.
About the Team
Cloud Infrastructure
Engineering is a critical engineering discipline and a job function
in the company. Its charter is to build tools and infrastructure
that promote early detection of production failures, leading to a
stellar customer experience. Our work is to drive the safety, health, and
uptime of our platform, and our ability to remedy unforeseen
problems. By removing some of the complex burdens of scaling and
maintaining uptime in distributed systems, Cloud Infrastructure
Engineering allows development teams to focus on feature development
instead of the nuances of achieving and maintaining service level
commitments.
About the Opportunity
We're looking for a creative and driven individual who can spearhead
our effort to push "outside the box" infrastructure implementations
that will have a tremendous impact on our platform's stability and
scalability.
What You'll Be Doing
- Responsible for the maintenance and configuration of AWS
products and services.
- Responsible for the research, architecture, and implementation
of solutions based on AWS products.
- Responsible for the daily maintenance of each AWS cloud
environment.
- Automate the provisioning, scaling, and configuration of
infrastructure resources using Terraform and CI/CD pipelines.
- Assist the team in troubleshooting and resolving production
database issues during incidents.
- Monitor company services and handle alerts in a timely manner
to ensure service stability and uptime.
- Collaborate with development teams to ensure seamless
integration and deployment of new features.
What We Look For In You
- Bachelor's degree or above in Computer Science or a related
field, with over 6 years of experience in DevOps, SRE,
DBA or related positions.
- Proficient in AWS distributed management, large-scale
clustering, fault tolerance, backup, load balancing and other
technologies.
- Have a deep understanding of high availability architecture,
capacity planning, and rich experience in handling complex
problems.
- Have solid Linux operations, maintenance, and debugging
skills, and be proficient in troubleshooting,
configuration tuning, and performance analysis.
- Familiar with Kubernetes (k8s) for container orchestration and
management.
- Familiar with the features of core AWS products, with
extensive hands-on experience deploying and tuning
EC2, EKS, VPC, or big data products.
- Experience with microservices architecture, including
deployment, scaling, and maintenance.
- Experience in monitoring, O&M and management of AWS
large-scale servers and containers.
- Familiarity with relational databases (e.g., MySQL, PostgreSQL)
and basic operations such as querying, monitoring performance
metrics, and reviewing logs.
- Familiar with the deployment, configuration and maintenance of
Nginx, Kong and other software.
- Proficient in using Python/shell for development.
- Strong engineering skills, proficient in at least one O&M
or infrastructure sub-area, such as public cloud networking, SRE,
DevOps, or cloud-native.
- Proficient in using Terraform for infrastructure as code (IaC)
to automate cloud resource provisioning and management.
- Excellent business analysis, system architecture, and
problem-solving abilities, and strong self-drive.
Nice to Have
- Bilingual in English and Mandarin.
- Familiar with the operation and maintenance management of
Alibaba Cloud, Google Cloud, Microsoft Cloud and other cloud
providers.
OKX Statement
The base salary range for this position is
$198,000 to $280,000. The salary offered depends on a variety of
factors, including job-related knowledge, skills, experience, and
market location. In addition to the salary, a performance bonus and
long-term incentives may be provided as part of the compensation
package, as well as a full range of medical, financial, and/or
other benefits, dependent on the position offered. Applicants
should apply via the OKX internal or external careers site.
OKX is
committed to equal employment opportunities regardless of race,
color, genetic information, creed, religion, sex, sexual
orientation, gender identity, lawful alien status, national origin,
age, marital status, and non-job related physical or mental
disability, or protected veteran status. Pursuant to the San
Francisco Fair Chance Ordinance, we will consider for employment
qualified applicants with arrest and conviction
records.