At PINTU, we are building the #1 crypto investment platform focused on new investors in Indonesia and Southeast Asia. We know that 99% of new investors are underserved because existing solutions cater to the 1% who are pros and early adopters, so we built an app that helps them learn about, invest in, and sell cryptocurrencies, all just one click away.
We’re looking for a full-time Senior Site Reliability Engineer to join our SRE Team responsible for Pintu's products. You will play a critical role in ensuring the reliability, availability, and performance of our software systems and infrastructure. You will work closely with cross-functional teams to build and maintain highly resilient and scalable systems. Your expertise in automation, monitoring, and incident response will be crucial in minimizing downtime and enhancing system reliability.
What You’ll Be Doing
In this role, you will:
Lead efforts to improve the reliability and availability of our systems through automation, proactive monitoring, and capacity planning.
Respond to and manage incidents, identifying the root cause and implementing preventive measures to minimize future incidents.
Develop and maintain automation tools and scripts to streamline operational tasks, configuration management, and deployment processes.
Analyze system performance and identify bottlenecks, making recommendations for improvements and optimizations.
Work on designing and implementing scalable architectures to accommodate growth and increased user demand.
Utilize IaC tools (e.g., Terraform, Ansible) to manage and provision infrastructure components.
Set up and maintain monitoring systems to track system health and performance metrics. Configure alerting and notifications to respond to anomalies.
Collaborate with development teams to ensure that new applications and features are designed with reliability and operability in mind.
Provide guidance, mentorship, and technical leadership to junior members of the SRE team, fostering their professional growth and ensuring team cohesion.
Create and maintain documentation for systems, processes, and best practices.
Implement and maintain security best practices and participate in security reviews and audits.
Participate in an on-call rotation to provide 24/7 support and incident response.
Who We Are Looking For
Bachelor's degree in Computer Science, Information Technology, or a related field.
Several years of experience in a Site Reliability Engineer or DevOps role.
Proficiency in scripting and programming languages like Bash, Python, or Go.
Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and cloud platforms (e.g., AWS and Google Cloud).
Expertise in Kafka, including setting up, configuring, and managing Kafka clusters for real-time data streaming.
Hands-on experience with designing, implementing, and maintaining distributed systems and microservices architectures.
Experience with infrastructure-as-code and configuration management tools (e.g., Terraform and Ansible).
Deep understanding of networking, databases, and web services.
Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, ELK Stack).
Excellent problem-solving skills and the ability to work well in high-pressure situations.
Strong communication and collaboration skills.
Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Professional Cloud DevOps Engineer) are a plus.
Let’s Realise a Cryptocurrency Bank for Everyone!
We are building the #1 cryptocurrency bank for everyone to accelerate the transition to an open financial system.
We have impacted many lives, but there’s still plenty to do, and we can’t do it alone. You can learn more about us
What is PINTU?
PINTU is a blockchain-based digital investment app that focuses on new investors. We have created a user-friendly app that helps new investors learn about, buy, and invest in cryptocurrency, all just one click away.
Our agility and firm hold on our core purpose and values have allowed us to remain resilient and thrive through tumultuous times.