Automating Server Discovery in a Hybrid Cloud Migration: A Junior DevOps Journey

As part of our organization’s hybrid cloud adoption, we are gradually moving non-critical workloads from GCP to our on-premises environment. One key requirement of this transition was to gather detailed information about the applications running on each server, including the tech stack and runtime details.

What initially looked like a simple task quickly exposed a classic scalability problem.

🎯 The Challenge

We needed to collect application and tech stack details from multiple GCP virtual machines.

❌ The old (manual) approach

SSH into each server individually
Execute a shell script manually
Copy and consolidate outputs
Repeat for every VM

This approach was:

Time-consuming
Error-prone
Not scalable
Operationally inefficient

With a growing fleet, manual work was clearly not sustainable.

💡 The DevOps Solution

To eliminate repetitive manual effort, I automated the entire workflow using Ansible.

Key improvements implemented:

✅ Centralized execution from the Ansible control node
✅ Key-based (passwordless) SSH using project-wide keys in GCP metadata
✅ Fleet-wide script execution
✅ Parallel server processing
✅ Foundation for future configuration management

🔐 Step 1 — Enabled Secure, Scalable Access

Instead of configuring SSH access VM by VM, I used GCP's project-wide SSH keys, which are stored in project metadata.

What I did:

Generated an SSH key pair on the Ansible control node
Added the public key under GCP → Compute Engine → Metadata → SSH Keys
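
For readers who prefer the command line over the console, the two steps above can be sketched roughly as follows. The user name `ansible`, the key path, and the `apply` argument are assumptions made for illustration, not details from the original setup. The script only defines helper functions unless it is called with `apply`.

```shell
#!/bin/sh
# Sketch only: SSH_USER, KEY_FILE and the "apply" switch are illustrative assumptions.
set -e

SSH_USER="${SSH_USER:-ansible}"
KEY_FILE="${KEY_FILE:-$HOME/.ssh/ansible_gcp}"

# Create an RSA key pair on the control node unless one already exists.
generate_key() {
  mkdir -p "$(dirname "$KEY_FILE")"
  [ -f "$KEY_FILE" ] || ssh-keygen -q -t rsa -b 4096 -N "" -C "$SSH_USER" -f "$KEY_FILE"
}

# Print the public key in the "USERNAME:KEY" form that GCP metadata expects.
metadata_entry() {
  printf '%s:%s\n' "$SSH_USER" "$(cat "$KEY_FILE.pub")"
}

# Write the entry to the file given as $1 and upload it as the project-wide
# ssh-keys value. This replaces the current value, so append any keys the
# project already has to that file before uploading.
push_to_project() {
  metadata_entry > "$1"
  gcloud compute project-info add-metadata --metadata-from-file=ssh-keys="$1"
}

# Run the full flow only when asked: sh setup_key.sh apply
if [ "${1:-}" = "apply" ]; then
  generate_key
  push_to_project "$(mktemp)"
fi
```

Because the upload overwrites the existing `ssh-keys` value, it is worth checking the current project metadata before running this against a shared project.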

🚀 Why this matters

By adding the key at the project metadata level:

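The key becomes available on every VM in the project that accepts project-wide keys (VMs that block project-wide keys or use OS Login are the exceptions)
No need for per-VM key distribution
Future servers inherit access automatically
One key covers the whole fleet, which makes fleet-wide automation practical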

This was a major efficiency win.

🧾 Step 2 — Built Scalable Ansible Inventory

I organized all servers under a common group in /etc/ansible/hosts, allowing centralized management.

This ensured:

Clean structure
Easy expansion
Environment-wide consistency
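
As a rough illustration of such an inventory, the sketch below writes a small INI file with one common group. The group name `gcp_vms`, the host aliases, the addresses, the user, and the key path are all hypothetical. It writes to a local `hosts.ini` rather than `/etc/ansible/hosts` so it is safe to try.

```shell
#!/bin/sh
set -e

# The setup described here used /etc/ansible/hosts; a local file keeps the sketch harmless.
INVENTORY="${INVENTORY:-./hosts.ini}"

cat > "$INVENTORY" <<'EOF'
# Hypothetical group name, host aliases, and addresses
[gcp_vms]
app-vm-01 ansible_host=10.128.0.11
app-vm-02 ansible_host=10.128.0.12
db-vm-01  ansible_host=10.128.0.21

# Connection settings shared by every host in the group
[gcp_vms:vars]
ansible_user=ansible
ansible_ssh_private_key_file=~/.ssh/ansible_gcp
EOF

echo "Wrote $(grep -c 'ansible_host=' "$INVENTORY") hosts to $INVENTORY"
```

Running `ansible-inventory -i hosts.ini --graph` afterwards shows how Ansible parsed the group. Adding a server is then a one-line change under `[gcp_vms]`.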

⚡ Step 3 — Automated Shell Script Execution

Previously, engineers had to log in to each machine and run the discovery script manually.

With Ansible, the same script can now be executed across the fleet in parallel from a single command.
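
One way to express that single command is an ad-hoc call to Ansible's `script` module, which copies a local script to each host, runs it there, and returns the output. In the sketch below, the group name, inventory path, script name `discover.sh`, fork count, and output file are placeholders rather than the values used in this project.

```shell
#!/bin/sh
set -e

# Hypothetical names: inventory path, group, script, and parallelism.
INVENTORY="${INVENTORY:-./hosts.ini}"
GROUP="${GROUP:-gcp_vms}"
SCRIPT="${SCRIPT:-./discover.sh}"
FORKS="${FORKS:-10}"

# Run the local script on every host in the group.
# -f sets how many hosts Ansible contacts in parallel.
run_discovery() {
  ansible "$GROUP" -i "$INVENTORY" -m script -a "$SCRIPT" -f "$FORKS"
}

# Execute only when asked: sh run_discovery.sh apply
if [ "${1:-}" = "apply" ]; then
  # Keep a copy of the combined output for later consolidation.
  run_discovery | tee discovery_output.txt
fi
```

Raising `FORKS` increases parallelism at the cost of more simultaneous SSH connections from the control node.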

Benefits achieved:

⏱️ Massive time savings
🔁 Repeatable execution
📉 Reduced human error
📈 Better operational visibility
🔧 Ready for future automation

📊 Real-World Outcome

During validation:

Running VMs responded successfully
Stopped VMs showed as unreachable (expected behavior)
SSH access worked seamlessly via metadata keys
Fleet connectivity was verified using Ansible ping
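
A connectivity check of this kind can be sketched as below. The group name and inventory path are again assumptions, and the small summary helper is an addition for illustration: it counts the `SUCCESS` and `UNREACHABLE!` markers that Ansible prints per host in ad-hoc output.

```shell
#!/bin/sh
set -e

# Hypothetical inventory path and group name.
INVENTORY="${INVENTORY:-./hosts.ini}"
GROUP="${GROUP:-gcp_vms}"

# Read ad-hoc ping output on stdin and count reachable vs. unreachable hosts.
summarize_ping() {
  awk '
    / [|] SUCCESS/      { ok++ }
    / [|] UNREACHABLE!/ { down++ }
    END { printf "reachable=%d unreachable=%d\n", ok, down }
  '
}

# Execute only when asked: sh check_fleet.sh apply
if [ "${1:-}" = "apply" ]; then
  # -o prints one line per host, which keeps the counting simple.
  ansible "$GROUP" -i "$INVENTORY" -m ping -o | summarize_ping
fi
```

Stopped VMs show up in the unreachable count, which matches the expected behavior noted above.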

The environment is now ready for large-scale configuration management.

🧠 Key Learnings

🔹 Hybrid cloud migrations expose manual process gaps
🔹 Project-level metadata SSH keys scale far better than per-VM key distribution in GCP
🔹 Ansible dramatically reduces operational overhead
🔹 Inventory design matters for scalability
🔹 Parallel execution is critical in real environments
🔹 Automation should be built with future growth in mind

🚀 What’s Next

Next steps in this journey include:

Automating application inventory collection
Building reusable Ansible playbooks
Implementing dynamic inventory for GCP
Extending automation to on-prem servers
Standardizing configuration management

🏁 Final Thoughts

This exercise reinforced an important DevOps principle:

If you have to do it more than twice, automate it.

By replacing manual SSH work with Ansible-driven automation and project-level SSH key management, we now have a scalable foundation that supports both our current GCP environment and the upcoming hybrid cloud model.
