aidp-object-storage — OCI Object Storage native (oci://)
Read and write Object Storage data directly from Spark. The AIDP cluster's IAM identity is used automatically — no OCI_CONFIG, no API keys, no inline PEM.
When to use
- Land or read CSV / Parquet / JSON / Avro / Delta files in an OCI bucket from Spark.
- Register an External Volume (
/Volumes/...) backed by an OCI bucket. - Define an External Table (
USING CSV/PARQUET/...) over anoci://path. - Mentioned: "oci://", "Object Storage bucket", "external volume", "external table".
When NOT to use
- For Iceberg tables on OCI Object Storage → use
aidp-iceberg. - For AWS S3 → use
aidp-aws-s3. - For Azure ADLS Gen2 → use
aidp-azure-adls.
URI form
oci://<bucket>@<namespace>/<path>
The namespace is the tenancy's Object Storage namespace (OCI Console > Object Storage > Bucket Details).
Direct read/write
oci_path = "oci://my-bucket@mynamespace/folder/file"
# Write
df.write.mode("overwrite").option("header", True).format("csv").save(oci_path)
# Read
df_read = spark.read.option("header", True).format("csv").load(oci_path)
df_read.show()
Same pattern with format("parquet"), format("json"), format("delta").
External Volume (DDL)
Mount a bucket once, reference by Volume path forever:
CREATE EXTERNAL VOLUME IF NOT EXISTS default.default.ext_volume
LOCATION 'oci://my-bucket@mynamespace/';
Then:
volume_path = "/Volumes/default/default/ext_volume/folder/file"
df.write.format("csv").option("header", True).save(volume_path)
spark.read.option("header", True).format("csv").load(volume_path).show()
Drop with DROP VOLUME default.default.ext_volume.
External Table (DDL)
Register a table whose data lives in oci://:
CREATE TABLE IF NOT EXISTS default.default.ext_table (name STRING, age INT)
USING CSV
OPTIONS (path='oci://my-bucket@mynamespace/folder/file', delimiter=',', header='true');
Query like any Spark table:
spark.sql("SELECT * FROM default.default.ext_table").show()
Drop with DROP TABLE default.default.ext_table.
Gotchas
- Auth is implicit — the AIDP cluster's IAM identity is used. The user never types OCI keys. If reads fail with 404/403, the workspace identity lacks bucket privileges; fix in OCI IAM.
- Namespace ≠ tenancy name. The Object Storage namespace is a separate, immutable string. Find it in
OCI Console > Profile > Tenancy: <tenancy_name>— theobject_storage_namespacefield. - External volume path is
/Volumes/<catalog>/<schema>/<volume>/..., NOToci://.... Once a volume is registered, address files via the Volume path. - External table
pathoption usesoci://directly, not the Volume path. Both work; choose based on whether you want a re-mountable abstraction (Volume) or a simple direct reference (Table). /Workspace/...is NOT for data. It's a FUSE-mounted file system intended for notebooks/configs. For data files useoci://or/Volumes/....