Google Cloud Production Setup Best Practices
The following best practices help harden a production topology. This list is not exhaustive; further practices apply depending on the cloud services in use.
VPC Network
- For compliance and audit, when you deal with sensitive regulated data, it's best to isolate that data in its own VPC network.
- Use VPC Service Controls for finer-grained control over the network, access to sensitive data, and which services can be reached.
- Use internal IPs and private connections whenever possible. Use Cloud NAT and organization policies to further restrict which resources are allowed to use external IP addresses.
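As an illustration of the external-IP restriction above, here is a minimal audit sketch. It assumes the input is the parsed JSON from `gcloud compute instances list --format=json`, where an external address appears as `networkInterfaces[].accessConfigs[].natIP`. The function name and the sample data are hypothetical.

```python
import json


def instances_with_external_ip(instances):
    """Return the names of instances that have at least one external (NAT) IP."""
    flagged = []
    for inst in instances:
        for nic in inst.get("networkInterfaces", []):
            # An access config carrying a natIP means this interface has an
            # external address attached.
            if any(cfg.get("natIP") for cfg in nic.get("accessConfigs", [])):
                flagged.append(inst["name"])
                break  # one external interface is enough to flag the instance
    return flagged


if __name__ == "__main__":
    # Hypothetical sample; in practice, load the gcloud JSON output instead.
    sample = json.loads("""
    [
      {"name": "bastion",
       "networkInterfaces": [{"networkIP": "10.0.0.2",
                              "accessConfigs": [{"natIP": "203.0.113.10"}]}]},
      {"name": "db-worker",
       "networkInterfaces": [{"networkIP": "10.0.0.3"}]}
    ]
    """)
    print(instances_with_external_ip(sample))
```

A check like this could run in CI or on a schedule to catch instances that slipped past the organization policy.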
Cloud SQL
- Use private IPs only.
- Use the Cloud SQL Auth Proxy (as a binary or container image) for secure, encrypted connections to the database.
GKE
- Use a private GKE cluster and Cloud NAT for outbound access.
- Define organization policies for images and use Binary Authorization to control which images can be deployed.
- Keep your GKE version up to date. Configure node auto-upgrade for GKE nodes in a pre-production or equivalent environment first, before upgrading production.
- Use hardened container images, and enable node hardening options such as Shielded GKE Nodes.
- Enable Workload Identity and don’t use the default service account. Apply the principle of least privilege to Google service accounts.
- Store and encrypt sensitive configuration data using Secrets.
- Based on your microservices topology and requirements, restrict network connectivity between Pods using network policies.
- Clearly define namespaces and use RBAC to grant specific permissions at the namespace level.
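The network policy and namespace-level RBAC items above can be sketched as Kubernetes manifests generated from a script. This is only a starting point under stated assumptions: the namespace `payments`, the group `payments-devs@example.com`, and the object names are hypothetical, and the cluster must have network policy enforcement enabled for the NetworkPolicy to take effect.

```python
import json


def default_deny_policy(namespace):
    """NetworkPolicy that selects every Pod in the namespace and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # an empty selector matches all Pods in the namespace
            # Both types are listed with no allow rules, so both directions are denied.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def pod_reader_role(namespace):
    """Namespaced Role granting read-only access to Pods and their logs."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "pod-reader", "namespace": namespace},
        "rules": [{
            "apiGroups": [""],  # "" is the core API group
            "resources": ["pods", "pods/log"],
            "verbs": ["get", "list", "watch"],
        }],
    }


def role_binding(namespace, role_name, group):
    """Bind a namespaced Role to a group, scoped to that namespace only."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name + "-binding", "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group,
                      "apiGroup": "rbac.authorization.k8s.io"}],
        "roleRef": {"kind": "Role", "name": role_name,
                    "apiGroup": "rbac.authorization.k8s.io"},
    }


if __name__ == "__main__":
    ns = "payments"  # hypothetical namespace
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            default_deny_policy(ns),
            pod_reader_role(ns),
            role_binding(ns, "pod-reader", "payments-devs@example.com"),
        ],
    }
    # kubectl accepts JSON as well as YAML, e.g. piped into: kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```

Note that a default-deny egress policy also blocks DNS lookups, so in practice it is paired with explicit allow policies for the traffic each service needs.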
Cloud Armor
- Evaluate and enable the preconfigured rules for common exploits.
- Configure audit logs and retain them in Cloud Storage according to compliance requirements. Set up an automated lifecycle to copy the logs.
- Apply appropriate rate limits.
Cloud Logs and Monitoring
- Ensure all logs are collected centrally, with an automated storage lifecycle that preserves them for compliance.
- Set up sufficient health monitoring for your application and infrastructure.
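One way to automate the storage lifecycle for a log bucket is to generate a Cloud Storage lifecycle configuration from a script. The sketch below assumes logs move to the `COLDLINE` storage class after an archive period and are deleted once the retention period ends. The function name and the day counts are placeholders, and the deletion step is itself an assumption; actual values should come from your compliance requirements.

```python
import json


def log_bucket_lifecycle(archive_after_days, delete_after_days):
    """Build a lifecycle configuration: archive old logs, then delete them."""
    if archive_after_days >= delete_after_days:
        raise ValueError("archive_after_days must be less than delete_after_days")
    return {
        "rule": [
            {
                # Move objects to colder, cheaper storage once they reach this age.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": archive_after_days},
            },
            {
                # Delete objects once the assumed retention period has passed.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


if __name__ == "__main__":
    # Placeholder values: archive after 30 days, delete after 400 days.
    print(json.dumps(log_bucket_lifecycle(30, 400), indent=2))
```

The resulting JSON can be saved to a file and applied to the bucket, for example with `gsutil lifecycle set`.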
IAM and ACLs
- Evaluate all roles and access granted to services. Document and audit them as part of each release, and monitor them continuously.
- Use the principle of least privilege and strictly avoid basic roles (Owner, Editor, Viewer) in production.
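The basic-role rule above lends itself to an automated release check. The following sketch assumes the input is the parsed JSON from `gcloud projects get-iam-policy PROJECT_ID --format=json`, which contains a `bindings` list of roles and members. The function name and the sample policy are hypothetical.

```python
# Basic roles are broad, project-wide roles that least privilege should avoid.
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def basic_role_bindings(policy):
    """Return sorted (role, member) pairs that grant a basic role."""
    findings = []
    for binding in policy.get("bindings", []):
        if binding.get("role") in BASIC_ROLES:
            for member in binding.get("members", []):
                findings.append((binding["role"], member))
    return sorted(findings)


if __name__ == "__main__":
    # Hypothetical policy; in practice, load the gcloud JSON output instead.
    policy = {
        "bindings": [
            {"role": "roles/editor",
             "members": ["serviceAccount:app@example-project.iam.gserviceaccount.com"]},
            {"role": "roles/cloudsql.client",
             "members": ["serviceAccount:app@example-project.iam.gserviceaccount.com"]},
        ]
    }
    for role, member in basic_role_bindings(policy):
        print(member + " holds basic role " + role)
```

Failing the release pipeline when this list is non-empty is one way to keep the audit from depending on manual review.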
Google Cloud Storage
- Make sure Cloud Storage buckets and objects have the right permissions and visibility. Apply the principle of least privilege when granting access.
- Use retention policies, object versioning, and Bucket Lock as needed to meet data regulations and compliance requirements.
- Use signed URLs to grant short-term access to objects for those who require it.
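Bucket visibility can be checked in a similar automated way. This sketch assumes the input is a bucket IAM policy as JSON (for example, the output of `gsutil iam get gs://BUCKET`) and flags roles granted to the special members `allUsers` and `allAuthenticatedUsers`, which make data publicly reachable. The function name is hypothetical.

```python
# Special IAM members that open a resource to anyone, or to any Google account.
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def public_bucket_roles(policy):
    """Return the sorted roles that a bucket policy grants to public members."""
    roles = set()
    for binding in policy.get("bindings", []):
        if PUBLIC_MEMBERS.intersection(binding.get("members", [])):
            roles.add(binding["role"])
    return sorted(roles)


if __name__ == "__main__":
    # Hypothetical policy for illustration.
    policy = {
        "bindings": [
            {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
            {"role": "roles/storage.objectAdmin",
             "members": ["group:data-team@example.com"]},
        ]
    }
    print(public_bucket_roles(policy))
```

Where short-lived sharing is the real need, a signed URL avoids granting any public role at all.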
General Connectivity to VMs
- Use Identity-Aware Proxy (IAP) for users who need remote connectivity without a VPN, with access governed by IP addresses and corporate access policies.
Lastly, build automated scripts that create the environments following the practices above.