DOC-2975: Added new page 'Load from Apache Iceberg' [4.3] #875
base: 4.3.0-dev
Conversation
Overall lgtm. But there are some partials that can be referenced and reused, e.g., https://github.com/tigergraph/server-docs/blob/4.2/modules/data-loading/pages/load-from-cloud.adoc?plain=1#L32
| :toclevels: 4
| = Load from Apache Iceberg
|
| In version *4.3*, TigerGraph introduces a connector to load data from an *Apache Iceberg*. This connector allows you to load data stored in *AWS S3* or *MinIO* buckets that are managed by an *Iceberg REST Catalog*.
Original:
| In version *4.3*, TigerGraph introduces a connector to load data from an *Apache Iceberg*. This connector allows you to load data stored in *AWS S3* or *MinIO* buckets that are managed by an *Iceberg REST Catalog*.
Suggested:
| TigerGraph 4.3 adds *Apache Iceberg* to its collection of high-speed built-in connectors. This connector allows you to load data stored in *AWS S3* or *MinIO* buckets that are managed by an *Iceberg REST Catalog*.
Reasons for the change:
- We want to emphasize that we are expanding an existing family, not adding something that is totally new. Writing our documentation as though everything is new is one reason why it is difficult to read: it reads like a large number of individual features rather than a smooth, logical fabric.
- Avoid talking about TigerGraph the company; talk about TigerGraph DB the product. Notice how the meaning of "TigerGraph" differs between the two versions. If this were marketing material, talking about the company would be fine, but this is technical documentation, not marketing.
- We do not say "an Iceberg", just as we do not say "a Spark".
| This guide shows how to connect your data source, create a loading job, and manage it effectively.
|
| == Build Your Graph Foundation
Original:
| == Build Your Graph Foundation
Suggested:
| == Build Your Graph Schema
Both TigerGraph and Iceberg use the term "schema". "Foundation" is not a standard term for either product, so why introduce a new concept that isn't needed?
| aws.s3.access_key: admin,
| aws.s3.secret_key: password,
| aws.client.region: us-east-1,
| tasks.max: 2
In our current documentation, tasks.max is a filename parameter and can only be configured when defining a FILENAME.
Is the ability to define tasks.max in a DATA SOURCE a new or old feature? Does it apply to all connectors or just Iceberg?
Does it apply to all filename parameters or just one (or a few)? Which ones?
This is the problem with documenting by "example". You actually create more questions than you answer. The better approach is to briefly explain the feature (or option) and then show an example. Or, show an example and then explain it. At some point, provide all the important details for what is/isn't supported.
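To illustrate the contrast, this is roughly how `tasks.max` appears in the current documentation, as a per-FILENAME parameter rather than a DATA_SOURCE setting (a sketch; the object names here are illustrative, not from the PR):

```gsql
// tasks.max as a filename parameter, per the existing docs --
// distinct from the tasks.max inside the DATA_SOURCE JSON shown above.
DEFINE FILENAME f_person = "$s1:{query: 'SELECT personId, id, gender FROM iceberg_connector.person', tasks.max: 2}";
```

If both placements are now supported, the page should say which one wins when they conflict.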
| LOAD f1 TO VERTEX person VALUES ($0, $1, $2);
| }
|
| - *8 Tasks from Job*: Use 8 tasks for larger data.
Why these numbers, 2 and 8?
1 is the default, right? Is there a global max?
| [source,gsql]
| CREATE LOADING JOB loadSocialNet FOR GRAPH socialNet {
| DEFINE FILENAME f1 = "$s1:SELECT personId, id, gender FROM iceberg_connector.person WHERE gender = 'male'";
Is iceberg_connector a built-in name, or is it a user-defined name? If it is user-defined, where and when would it be defined?
| Create a loading job to turn Iceberg data into your graph’s vertices and edges. It involves defining data sources and mapping the data.
|
| === Quick Loading Job Example
You should include the Iceberg schema. A user cannot really learn from a loading example if you are only showing the schema of one half (either the source schema or the target graph schema). We need to see both schemas.
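For instance, the example could pair the two schemas like this (a sketch only; the Iceberg column types are assumptions inferred from the SELECT list in the PR, not taken from it):

```gsql
// Source side (Iceberg table, shown for reference as SQL DDL):
//   CREATE TABLE iceberg_connector.person (personId BIGINT, id STRING, gender STRING);
//
// Target side (graph schema the loading job maps into):
CREATE VERTEX person (PRIMARY_ID personId UINT, id STRING, gender STRING)
CREATE GRAPH socialNet (person)
```

With both halves visible, the reader can see exactly how `$0, $1, $2` in the LOAD statement line up with the source columns.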
| CREATE DATA_SOURCE s1 = """
| {
| type: iceberg,
| iceberg.catalog.type: rest,
| iceberg.catalog.uri: http://rest:8181,
| aws.s3.endpoint: http://minio:9000,
| aws.s3.access_key: accesskey,
| aws.s3.secret_key: password,
| aws.client.region: us-east-1
| }""" FOR GRAPH socialNet
Other TigerGraph connectors let you put this configuration JSON in a file. I assume that is also supported here?
If this is in a file, I suspect the triple quotes are omitted, but I'm not sure.
@pingxieTG Could you please clarify this point?
Yes, it supports a JSON file as well. It is no different from the way other data sources are created. @Tushar-TG-14
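Then the page should probably show the file-based form too. Assuming it follows the same pattern as the other connectors (the file path below is illustrative), something like:

```gsql
// Same configuration, but read from a JSON file instead of an inline string.
// Assumption: the file contains the bare JSON object, without triple quotes.
CREATE DATA_SOURCE s1 = "/home/tigergraph/iceberg_config.json" FOR GRAPH socialNet
```

A one-line note confirming whether the triple quotes are omitted in the file would settle the question raised above.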
| LOAD f1 TO VERTEX person VALUES ($0, $1, $2);
| }
|
| == Define Your Data Files
This section confused me because it repeats things that were shown in the examples above.
After reading further, I eventually realized that you were breaking down the full loading process step by step, which was all lumped together in the earlier examples.
Please add introductory or transitional sentences, and some sort of main heading such as "Connector Setup and Loading - Step by Step" to tell the user where you are leading them. Otherwise, the reader doesn't understand how the sections relate to one another.
| [source,gsql]
| DEFINE FILENAME query_person = "$s1:SELECT personId, id, gender FROM iceberg_connector.person";
| DEFINE FILENAME bq_inline_json = "$s1:myfile.json";
| DEFINE FILENAME query_person = "$s1:{query: 'SELECT personId, id, gender FROM iceberg_connector.person WHERE gender = 'male'', tasks.max: 2}";
This example reuses the filename object name query_person, which was already used above. It should be changed to be unique.
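For example, renaming the third definition (the new name here is just a suggestion):

```gsql
DEFINE FILENAME query_person = "$s1:SELECT personId, id, gender FROM iceberg_connector.person";
DEFINE FILENAME bq_inline_json = "$s1:myfile.json";
DEFINE FILENAME query_male_person = "$s1:{query: 'SELECT personId, id, gender FROM iceberg_connector.person WHERE gender = 'male'', tasks.max: 2}";
```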
| == Connect Your Data Source
|
| Configure a data source object to connect TigerGraph to your Apache Iceberg storage (S3 or MinIO). This involves specifying connection details using JSON.
"Using JSON" is very vague. If you say that, I expect some explanation pretty quickly; otherwise you leave me hanging. So it's actually better not to say that yet.
The structure of the existing data connection documentation is logical; it just needs to be streamlined.
The logic:
- Create a Data Source Object (instead of Connect Your Data Source)
** Specify configuration parameters in a JSON object
*** use a small JSON example to show the format and how it can be specified either inline or in a separate file
*** show the tables of all the configuration parameters
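Concretely, the "small JSON example" in that outline could be as minimal as this (trimmed from the full example earlier in the PR; values are the sample values, not defaults):

```
{
  type: iceberg,
  iceberg.catalog.type: rest,
  iceberg.catalog.uri: http://rest:8181,
  aws.s3.endpoint: http://minio:9000
}
```

Then the parameter tables can carry the full list, and the inline-vs-file note covers how this object is supplied.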