This article dissects one of the most pragmatic solutions: mounting S3 buckets directly onto Kubernetes nodes using goofys, IAM roles, and DaemonSets, then exposing them to pods via hostPath volumes.
The approach I’ll detail is production-tested across workloads ranging from 500GB to 50TB+ datasets, handling everything from machine learning model registries to high-frequency data ingestion pipelines.
Key takeaway: This solution excels for unpredictable workload volumes and cost-sensitive scenarios, but introduces complexity and performance tradeoffs compared to AWS’s newer Mountpoint for S3 CSI driver.
The Storage Problem We’re Solving
Why Not Just Use S3 Directly?
Most applications already understand S3 APIs perfectly well. So why mount it as a filesystem?
Legacy Application Compatibility – Applications written in the 2010s expect POSIX file operations (open(), read(), write(), seek()); retrofitting them with S3 SDK calls is often impractical.
Unpredictable Volume – Workloads spike from 100GB to 5TB overnight (common in batch ML pipelines and data ingestion services). Pre-provisioned PersistentVolumes scale poorly in both effort and cost; S3 billing is consumption-based.
Cost Arbitrage – At scale, S3 is cheaper than EBS when you don’t need the performance EBS provides. Our annual savings: ~$40K by switching 50TB of archive data from EBS to S3-backed mounts.
Multi-Node Data Sharing – Multiple pods across different nodes need read-only access to the same 2TB model registry. Syncing via S3 is simpler than NFS or EFS at this scale.
Kubernetes provides multiple options for mounting Amazon S3, as described in the comparison table below:
| Tool | POSIX Compliance | Performance | Complexity | Best For |
|---|---|---|---|---|
| s3fs-fuse | ~80% | Slow (up to 100x slower than goofys on large files) | Medium | Archival, infrequent access |
| goofys | ~40% (but practical) | Fast (50-100x faster than s3fs) | Medium | High-throughput, unpredictable volumes |
| Mountpoint for S3 CSI | ~35% (optimized) | Very fast (native AWS, 2x faster than goofys for cache hits) | Medium-High | Enterprise, read-heavy ML/analytics |
| Rook + Ceph/EFS | 100% | Excellent | Very High | POSIX-critical, distributed systems |
Our approach uses goofys together with the otomato-gh/s3-mounter Helm chart; the reasons for this choice are explained below.
The Architecture
The entire system’s security hinges on IAM role attachment. This is non-negotiable.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3BucketAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject*",
        "s3:PutObject*",
        "s3:DeleteObject*",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>",
        "arn:aws:s3:::<BUCKET_NAME>/*"
      ]
    }
  ]
}
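If the role doesn’t exist yet, the wiring can be done from the AWS CLI. This is a minimal sketch: the policy and role names mirror the examples used later in this article, ec2-trust-policy.json stands in for the standard EC2 assume-role trust policy, and the account and instance IDs are placeholders.
# Create the policy from the JSON above and a role that EC2 can assume
aws iam create-policy \
  --policy-name s3-app-data-bucket-policy \
  --policy-document file://s3-bucket-policy.json
aws iam create-role \
  --role-name s3-app-data-bucket-access \
  --assume-role-policy-document file://ec2-trust-policy.json
aws iam attach-role-policy \
  --role-name s3-app-data-bucket-access \
  --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/s3-app-data-bucket-policy

# Wrap the role in an instance profile and attach it to each worker node
aws iam create-instance-profile --instance-profile-name s3-app-data-bucket-access
aws iam add-role-to-instance-profile \
  --instance-profile-name s3-app-data-bucket-access \
  --role-name s3-app-data-bucket-access
aws ec2 associate-iam-instance-profile \
  --instance-id <WORKER_INSTANCE_ID> \
  --iam-instance-profile Name=s3-app-data-bucket-access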
Instance Metadata Service (IMDS) is how EC2 instances fetch temporary credentials without storing permanent keys. goofys relies on IMDS.
The problem: On kubeadm clusters (unlike EKS), IMDS isn’t always correctly configured.
# Step 1: Check IMDS is accessible
curl -I http://169.254.169.254/latest/meta-data/
# If 401 Unauthorized appears, fix on the EC2 instance:
# AWS Console → EC2 → Instance Settings → Modify Instance Metadata Options
# Set: Metadata version = V1 and V2, Metadata tokens = Optional, Hop limit = 2
# Step 2: Verify IAM role is present
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Output should be your IAM role name, e.g., "s3-app-data-bucket-access"
# If empty or error: the IAM role isn't attached to the instance.
# Step 3: Verify credentials are obtainable
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/s3-app-data-bucket-access
# Output: JSON with AccessKeyId, SecretAccessKey, Token, Expiration
Why this matters: Without working IMDS, goofys fails silently. The DaemonSet pods will run but can’t authenticate, and logs show cryptic FUSE errors.
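If you have many nodes to fix, the console steps from Step 1 have a CLI equivalent (the instance ID is a placeholder):
# Same fix as the console steps above, applied per instance from the CLI
aws ec2 modify-instance-metadata-options \
  --instance-id <WORKER_INSTANCE_ID> \
  --http-endpoint enabled \
  --http-tokens optional \
  --http-put-response-hop-limit 2

# Re-run the Step 2/3 checks afterwards to confirm credentials are reachable
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/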
Implementation with otomato-gh/s3-mounter Helm Chart
helm repo add otomount https://otomato-gh.github.io/s3-mounter
helm repo update
# Inspect available values for your version
helm show values otomount/s3-otomount
The otomato-gh/s3-mounter Helm chart is specifically engineered for production Kubernetes deployments. It wraps goofys and handles:
DaemonSet creation with proper RBAC
SecurityContext configuration for FUSE access
Volume mount propagation setup
Resource requests/limits
Node affinity patterns
Health checks and restart policies
helm upgrade --install s3-mounter otomount/s3-otomount \
--namespace otomount-system \
--create-namespace \
--set bucketName=<BUCKET_NAME> \
--set iamRoleARN=arn:aws:iam::703460697229:role/s3-app-data-bucket-access \
--set mountPath=/var/s3 \
--set region=us-east-1
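If you prefer to keep the configuration in version control rather than in --set flags, the same values can live in a file. A minimal sketch, assuming the value keys match the flags above (the bucket name remains a placeholder):
cat > s3-mounter-values.yaml <<'EOF'
bucketName: <BUCKET_NAME>
iamRoleARN: arn:aws:iam::703460697229:role/s3-app-data-bucket-access
mountPath: /var/s3
region: us-east-1
EOF

helm upgrade --install s3-mounter otomount/s3-otomount \
  --namespace otomount-system \
  --create-namespace \
  -f s3-mounter-values.yaml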
What this creates (via otomato chart):
DaemonSet: One pod per node, running goofys process with proper SecurityContext
ServiceAccount: For RBAC (necessary on kubeadm; EKS also uses this)
ClusterRole / ClusterRoleBinding: Permissions for the DaemonSet
ConfigMap: Mount configuration and goofys startup script
Volume setup: Exposes /dev/fuse device to container, /var/s3 on host
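A quick way to confirm the FUSE device and host-directory wiring on the DaemonSet itself (the DaemonSet name follows the release name used above; adjust if yours differs):
# Expect a hostPath volume for /dev/fuse and one for /var/s3
kubectl -n otomount-system get daemonset s3-mounter -o yaml | grep -B 2 -A 3 hostPath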
The otomato chart typically sets mountPropagation: Bidirectional by default, but verify:
kubectl -n otomount-system get daemonset s3-mounter -o yaml | grep -A 5 volumeMounts
Output should show:
volumeMounts:
  - mountPath: /var/s3
    name: mntdatas3
    mountPropagation: Bidirectional  # ✓ This must be present
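If the field is missing, one option is to patch it in place. This sketch assumes the goofys container is the first container and the data mount is the first volumeMount; check the indexes against your chart version before applying.
kubectl -n otomount-system patch daemonset s3-mounter --type=json -p='[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/volumeMounts/0/mountPropagation",
   "value": "Bidirectional"}
]'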
# Check DaemonSet rollout (should see "Ready" for all nodes)
kubectl -n otomount-system get daemonset s3-mounter
kubectl -n otomount-system get pods -l app=s3-mounter
# Verify mount on each node
kubectl -n otomount-system exec -it s3-mounter-xxxxx -- ls -la /var/s3
# Should show S3 bucket contents
# Example output:
# drwxr-xr-x 1 root root    0 Jan 20 2026 models/
# -rw-r--r-- 1 root root 1.2G Jan 20 2026 dataset.tar.gz
# Verify goofys process is running
kubectl -n otomount-system exec -it s3-mounter-xxxxx -- ps aux | grep goofys
# Output: goofys app-data-meet /var/s3 -o allow_other,uid=0,gid=0 ...
# Check Helm release
helm list -n otomount-system
# NAME NAMESPACE REVISION STATUS CHART
# s3-mounter otomount-system 1 deployed s3-otomount-0.4.x
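It’s also worth spot-checking the mount from the node’s own filesystem, since that is exactly what hostPath consumers will see. One way, assuming kubectl debug is available in your cluster version (the node name is a placeholder; the node’s root filesystem appears under /host):
kubectl debug node/<NODE_NAME> -it --image=busybox -- ls /host/var/s3
With the mount verified on the node, pods can consume it directly through a hostPath volume, as in the manifest below.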
apiVersion: v1
kind: Pod
metadata:
  name: data-consumer
spec:
  containers:
    - name: app
      image: python:3.11
      command: ["sleep", "infinity"]  # keep the container alive so you can exec into it
      volumeMounts:
        - name: s3-data
          mountPath: /data
  volumes:
    - name: s3-data
      hostPath:
        path: /var/s3
        type: Directory
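A quick smoke test after applying this manifest (the filename is arbitrary):
kubectl apply -f data-consumer.yaml
kubectl wait --for=condition=Ready pod/data-consumer --timeout=120s
kubectl exec data-consumer -- ls -la /data   # should list the bucket contents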
Drawback: Pod is node-bound. If the node hosting your pod fails, the pod can’t migrate to another node (the mount won’t follow).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-data-pv
spec:
  capacity:
    storage: 100Gi  # Placeholder; actual size determined by S3 bucket
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /var/s3
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1  # Restrict to this node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-data-pvc
  namespace: data-pipeline
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""  # Prevents dynamic provisioning; binding is pinned via volumeName
  resources:
    requests:
      storage: 100Gi
  volumeName: s3-data-pv
---
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
  namespace: data-pipeline
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - node-1  # Must be on same node as PV
  containers:
    - name: trainer
      image: pytorch/pytorch:latest
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: s3-data-pvc
Advantage: consumers go through the standard PVC abstraction, and the node constraint is declared explicitly via nodeAffinity instead of being hidden in a hostPath. Note that the 100Gi capacity is a declared placeholder, not an enforced quota.
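Before pointing workloads at the claim, confirm that the PV and PVC actually bind:
# Both should report STATUS=Bound before the ml-training pod is scheduled
kubectl get pv s3-data-pv
kubectl -n data-pipeline get pvc s3-data-pvc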
S3 is fundamentally not a filesystem. goofys pragmatically accepts this, but problems emerge:
| Operation | Behavior | Impact |
|---|---|---|
| rename() | Not atomic; implemented as copy + delete | A network error mid-copy can leave partial or corrupted data |
| chmod() / chown() | Silently ignored | Permission bits don’t persist |
| seek() | Requires re-downloading the entire object | Seeking in a 2TB file means re-downloading 2TB |
| append() | Creates a new object version | Doesn’t extend the file in place; overwrites it |
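The chmod()/chown() row is easy to verify directly. A minimal check, assuming the mount at /var/s3 and write access to the bucket:
touch /var/s3/perm-test.txt
chmod 600 /var/s3/perm-test.txt || true
stat -c '%a %n' /var/s3/perm-test.txt   # mode stays at the mount default (e.g. 644), not 600
rm /var/s3/perm-test.txt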
Recommendation:
Use goofys if: You’re on kubeadm, budget-constrained, or have simple read-mostly workloads.
Use Mountpoint S3 CSI if: You’re on EKS, need enterprise SLA, or have cache-heavy ML pipelines.
Mounting S3 on Kubernetes via goofys is pragmatic and cost-effective for the right use cases. It’s not a silver bullet: it trades POSIX compliance and performance for flexibility and cost.
The best solution is often the one that’s operationally simple and clearly understood by your team. goofys fits that description for organizations that value simplicity and cost over absolute performance or POSIX guarantees.
Start with this approach for read-mostly workloads and metadata-light applications. Graduate to EFS when you genuinely need filesystem semantics. And always, always test failure modes in staging first.