AWS ECS Windows introspection bug
2024-01-27
TL;DR: The following is just a copy-paste from the issue I opened on the Amazon ECS Agent GitHub repository.
The issue
Summary
I started noticing inconsistent behavior when trying to reach the EC2 machine’s ECS Agent from Linux containers and Windows containers through the containers’ respective gateways, via HTTP.
Description
I am using a mixed cluster of Windows and Linux instances on ECS EC2, running the ECS-optimized images from AWS. I noticed that Windows containers in bridge network mode cannot reach the machine’s ECS Agent over HTTP via the container’s gateway. This is not the case on Linux instances, where containers in bridge network mode can reach the machine’s ECS Agent over HTTP via the container’s gateway.
The Windows containers can reach the machine’s ECS Agent only if I explicitly add an allow rule to the machine’s Windows Firewall.
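For illustration, an allow rule of the kind described above could look like the sketch below. It is not the exact rule from the original issue: the display name is an arbitrary label, and the only value taken from this post is port 51678, the introspection port used in the curl examples. It assumes the NetSecurity module (which provides New-NetFirewallRule) is available on the instance.

```powershell
# Run on the Windows container instance (the EC2 host), not inside a container.
# Port 51678 is the introspection port used in the curl examples in this post.
# The display name is an arbitrary, illustrative label.
New-NetFirewallRule `
    -DisplayName 'Allow ECS Agent introspection (illustrative)' `
    -Direction Inbound `
    -Protocol TCP `
    -LocalPort 51678 `
    -Action Allow
```

As written, the rule accepts connections to that port from any source the firewall profile covers. Adding -RemoteAddress with the container subnet (for example the 172.24.160.0/20 range visible in the route table of this post, which will likely differ on other hosts) would narrow it.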
Expected Behavior
I would expect both platforms to behave consistently: either both allow this traffic or both deny it.
Observed Behavior
From a Linux container:
/app # ip route
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 scope link src 172.17.0.7
/app # curl http://172.17.0.1:51678/
{"AvailableCommands":["/v1/metadata","/v1/tasks","/license"]}
/app #
From a Windows container:
PS C:\> route print
===========================================================================
Interface List
18...........................Software Loopback Interface 2
19...00 15 5d ca 12 db ......Hyper-V Virtual Ethernet Container Adapter
===========================================================================
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 172.24.160.1 172.24.172.125 5256
127.0.0.0 255.0.0.0 On-link 127.0.0.1 331
127.0.0.1 255.255.255.255 On-link 127.0.0.1 331
127.255.255.255 255.255.255.255 On-link 127.0.0.1 331
172.24.160.0 255.255.240.0 On-link 172.24.172.125 5256
172.24.172.125 255.255.255.255 On-link 172.24.172.125 5256
172.24.175.255 255.255.255.255 On-link 172.24.172.125 5256
224.0.0.0 240.0.0.0 On-link 127.0.0.1 331
224.0.0.0 240.0.0.0 On-link 172.24.172.125 5256
255.255.255.255 255.255.255.255 On-link 127.0.0.1 331
255.255.255.255 255.255.255.255 On-link 172.24.172.125 5256
===========================================================================
Persistent Routes:
Network Address Netmask Gateway Address Metric
0.0.0.0 0.0.0.0 172.24.160.1 Default
0.0.0.0 0.0.0.0 172.24.160.1 Default
===========================================================================
IPv6 Route Table
===========================================================================
Active Routes:
If Metric Network Destination Gateway
18 331 ::1/128 On-link
19 5256 fe80::/64 On-link
19 5256 fe80::4447:b1a9:a5af:28d/128
On-link
18 331 ff00::/8 On-link
19 5256 ff00::/8 On-link
===========================================================================
Persistent Routes:
None
PS C:\> curl http://172.24.160.1:51678/
curl: (28) Failed to connect to 172.24.160.1 port 51678 after 21026 ms: Couldn't connect to server
Environment Details
Windows:
ECS Agent version: "Version":"Amazon ECS Agent - v1.74.1 (a23f2935)"
AMI: Windows_Server-2022-English-Core-ECS_Optimized-2023.08.09
Instance type: t3a.2xlarge
Linux:
ECS Agent version: "Version":"Amazon ECS Agent - v1.75.0 (*e978160b)"
AMI: amzn2-ami-ecs-hvm-2.0.20230809-x86_64-ebs
Instance type: t3a.large
The fix (from AWS engineering)
The fix for this issue was released with the November 2023 AMIs.
During instance bootstrap, you can set the ECSAllowOffHostIntrospectionAccess switch on the Initialize-ECSAgent cmdlet. Alternatively, you can set the ECS_ALLOW_OFFHOST_INTROSPECTION_ACCESS environment variable to true to enable access to the introspection API. This variable is documented here: https://github.com/aws/amazon-ecs-agent#environment-variables
The new user data would be similar to:
<powershell>
Import-Module ECSTools
[Environment]::SetEnvironmentVariable("ECS_LOGLEVEL_ON_INSTANCE", "debug", "Machine")
Initialize-ECSAgent -Cluster 'windows' -EnableTaskENI -EnableTaskIAMRole -AwsvpcBlockIMDS -ECSAllowOffHostIntrospectionAccess
</powershell>
or
<powershell>
Import-Module ECSTools
[Environment]::SetEnvironmentVariable("ECS_LOGLEVEL_ON_INSTANCE", "debug", "Machine")
[Environment]::SetEnvironmentVariable("ECS_ALLOW_OFFHOST_INTROSPECTION_ACCESS", "true", "Machine")
Initialize-ECSAgent -Cluster 'windows' -EnableTaskENI -EnableTaskIAMRole -AwsvpcBlockIMDS
</powershell>
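To check that either variant took effect, the request that failed earlier can be repeated from inside a Windows container. The snippet below is a minimal sketch rather than part of the AWS answer: the gateway address is the one from the route print output in this post and has to be replaced with the gateway of the container being tested, and it assumes Invoke-RestMethod is available in the container image.

```powershell
# Run inside a Windows container in bridge network mode.
# 172.24.160.1 is the default gateway from the route print output in this post;
# substitute the gateway reported by your own container.
$gateway = '172.24.160.1'

# /v1/metadata is one of the commands the agent lists in the Linux example.
Invoke-RestMethod -Uri "http://${gateway}:51678/v1/metadata" -TimeoutSec 10
```

If off-host introspection access is enabled, the call should return the agent metadata instead of timing out as the earlier curl attempt did.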