Ebury is a hyper-growth FinTech firm, named one of the top 15 European FinTechs to work for by AltFi. We offer a range of products including FX risk management, trade finance, currency accounts, international payments, and API integration.
SRE - São Paulo Office - Hybrid: 4 days in the office, 1 day working from home.
Ebury is a global FinTech: we apply new technologies to enhance and automate financial services and processes. This allows small and medium-sized businesses to trade and transact internationally by eliminating the barriers associated with more traditional methods.
Are you ready to be an Eburian? As a digital international payments platform, we are constantly challenged to raise the bar on quality, security, and usability. That's why we look for innovative, critical-thinking employees who care about the results and impact of our products on society. Our business is growing at an impressive speed, and our team needs new people who can make a difference on this journey.
We are always connected to the latest developments in technology and software engineering. We build our platforms on a variety of cutting-edge technologies and engineering methods that help us innovate continuously. We value professionals who seek constant learning without restricting their knowledge to a single technology. We are looking for an SRE (Site Reliability Engineer) whose experience aligns with the technology stack we use. If you don't match it completely, don't worry: we value diversity of knowledge, and you can certainly learn a lot from our team.
As an SRE on the team, you will be responsible for availability, performance, monitoring, and incident response.
How we work today:
- Product-oriented multidisciplinary teams;
- Agile methodology with TDD and BDD practices;
- Scalable, microservices-oriented architecture;
- Microservices written in Go, Python, Kotlin, and Node.js, with front ends in React;
- High-throughput processing with inter-service communication over gRPC and Kafka;
- Continuous Integration and Deployment on AWS and Google Cloud;
- Automation using Git, Jenkins, Spinnaker, Ansible, and Terraform;
- Data persistence in DynamoDB and Datastore;
- Monitoring based on “observability” techniques using Prometheus;
- Immutable infrastructure with containers managed via Kubernetes.
Responsibilities:
- Define executive ownership of GCP services and establish a responsibility assignment matrix (RACI);
- Establish access management control policies and centralised user access management;
- Implement a robust onboarding and offboarding process integrated with Jira Service Management access request and approval workflows;
- Implement periodic user access reviews of IAM roles and permissions, using GCP Access Recommender insights to enforce the Principle of Least Privilege (PoLP);
- Clean up legacy and redundant GCP projects;
- Define a standardised baseline for new project creation to curb the recurring proliferation of projects created via Apps Script;
- Develop a standardised guideline for creating new projects, encompassing naming conventions, resource allocations, and configurations;
- Remove excessive admin privileges;
- Introduce a comprehensive RBAC system within GCP, ensuring users and service accounts are granted permissions strictly based on their roles and responsibilities following the Principle of Least Privilege (PoLP);
- Implement a folder-based structure to categorise projects by function, department, or other relevant criteria and to define fine-grained permissions;
- Establish management policies and procedures for key creation, issuance and deactivation;
- Create and maintain an up-to-date inventory of all existing keys with description, ownership, and classification/criticality;
- Design and implement monitoring controls for access to keys and their related functions;
- Regularly rotate service account keys and credentials;
- Implement versioning and fine-grained access control for service account keys;
- Introduce BigQuery row-level access controls;
- Implement data classification in BigQuery: tagging policy, criticality, and sensitivity;
- Implement data controls in BigQuery: monitor newly created and unused datasets, and apply retention policies both to objects and to the data itself.
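The key-rotation responsibility above can be sketched as a small audit check. This is a minimal sketch, not Ebury's actual tooling: it assumes key metadata has been exported with `gcloud iam service-accounts keys list --format=json` (whose entries carry `name` and `validAfterTime` fields), and the 90-day threshold is an illustrative policy choice.

```python
import json
from datetime import datetime, timedelta, timezone

# Illustrative rotation policy: flag keys older than 90 days.
MAX_KEY_AGE = timedelta(days=90)

def keys_due_for_rotation(keys_json: str, now: datetime) -> list[str]:
    """Return resource names of keys whose age exceeds MAX_KEY_AGE.

    `keys_json` is assumed to be the JSON output of
    `gcloud iam service-accounts keys list --format=json`.
    """
    due = []
    for key in json.loads(keys_json):
        # validAfterTime is RFC 3339, e.g. "2023-01-01T00:00:00Z".
        created = datetime.fromisoformat(key["validAfterTime"].replace("Z", "+00:00"))
        if now - created > MAX_KEY_AGE:
            due.append(key["name"])
    return due

# Two fabricated keys for illustration: one fresh, one stale.
sample = json.dumps([
    {"name": "projects/p/serviceAccounts/sa/keys/new",
     "validAfterTime": "2024-05-01T00:00:00Z"},
    {"name": "projects/p/serviceAccounts/sa/keys/old",
     "validAfterTime": "2023-01-01T00:00:00Z"},
])
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(keys_due_for_rotation(sample, now))  # only the stale key is flagged
```

A check like this would typically run on a schedule, with flagged keys fed into the same request/approval workflow used for access changes.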
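The excessive-privilege clean-up can likewise be framed as a diff between the roles actually granted and an approved baseline per principal. In this sketch the role names are real GCP roles, but the principals and baseline mapping are fabricated for illustration; a real audit would read bindings from the project's IAM policy.

```python
# Least-privilege audit sketch: compare granted IAM roles against an
# approved per-principal baseline and flag anything broader than required.
# Principals and baselines below are fabricated for illustration.

def excessive_grants(granted: dict[str, set[str]],
                     baseline: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per principal, the granted roles absent from the baseline."""
    return {
        principal: extra
        for principal, roles in granted.items()
        if (extra := roles - baseline.get(principal, set()))
    }

granted = {
    "user:alice@example.com":
        {"roles/owner", "roles/bigquery.dataViewer"},
    "serviceAccount:ci@example.iam.gserviceaccount.com":
        {"roles/storage.objectViewer"},
}
baseline = {
    "user:alice@example.com":
        {"roles/bigquery.dataViewer"},
    "serviceAccount:ci@example.iam.gserviceaccount.com":
        {"roles/storage.objectViewer"},
}
print(excessive_grants(granted, baseline))
# flags only the broad roles/owner grant on alice
```

Keeping the baseline in version control makes the review procedure auditable: every widening of a principal's roles shows up as a reviewed change rather than an ad-hoc console edit.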
Requirements and qualifications:
- Advanced English (writing, reading and speaking) - Spanish is a plus;
- Analytical, resilient and multitasking profile;
- Mastery of Linux operating systems;
- Aptitude for software programming (Infrastructure as Code and Go);
- Knowledge of networking (TCP/IP, firewalls, DNS, routing, etc.);
- Operation of cloud platforms: GCP (e.g. Cloud Functions, Cloud SQL, Buckets, GKE, Firestore, VPC, Cloud DNS, IAM, Instances) and/or AWS (e.g. S3, RDS, EC2, ECS, Lambda, EKS, DynamoDB, IAM, CloudFront, VPC, Route 53);
- CI/CD (tools: GitLab CI, GitHub Actions, Jenkins, ArgoCD, Spinnaker; practices: build, test, push, and deploy stages);
- Infrastructure as Code (e.g. Terraform, CloudFormation; practices: modularisation (single objective, versioning), reuse, testability, state management, security);
- Orchestrators and add-ons (Kubernetes, EKS, GKE, kops, Kubespray, Rancher, Helm, NGINX Ingress Controller, or equivalent; practices: use of declarative specs (Deployment, Service, DaemonSet, StatefulSet, Ingress) and facilitators such as Kustomize and Helm charts);
- Observability (Prometheus, Grafana, Jaeger, Alertmanager, Kiali, ELK Stack, Loki, Datadog, Dynatrace, AppDynamics, CloudWatch, Stackdriver, OpenTelemetry, New Relic; practices: metrics, logging, tracing).