Running into issues when trying to seed a large data set with Prisma, seeking alternative options



prisma/1.30.1 (darwin-x64) node-v8.12.0

I have started to run into issues when seeding my project using a single seed.graphql file. In my case, I have started parsing some GPX map data files into seed.graphql. The largest of these GPX files has about 27,000 objects, and I have about 70 GPX files. Naturally, this results in a huge seed.graphql file, and the seeding process fails after many minutes of processing.

My first issue was dealing with node running out of memory, but that was easily solved with this:

export NODE_OPTIONS="--max-old-space-size=4096"

The current prisma seed error I am seeing for the full data set is:

Seeding based on seed.graphql !
 ▸    Error while parsing
 ▸    /Users/.../seed.graphql:
 ▸    Error while executing operation:
 ▸    request to http://localhost:4466/ failed, reason: socket hang up

The issue seems to be that the JVM in the Prisma container runs out of heap memory while parsing the request. Here is the Docker log for my container running image prismagraphql/prisma:1.30.1:

 ☺  docker logs -f server_prisma_1                                                                                      
No log level set, defaulting to INFO.
[INFO] Initializing workers...
[INFO] Obtaining exclusive agent lock...
[INFO] Obtaining exclusive agent lock... Successful.
[INFO] Successfully started 1 workers.
[INFO] Deployment worker initialization complete.
Server running on :4466
[INFO] {} - Started.
[Warning] Management authentication is disabled. Enable it in your Prisma config to secure your server.
Uncaught error from thread []: GC overhead limit exceeded, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[single-server]
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at scala.collection.immutable.VectorPointer.gotoNextBlockStartWritable(Vector.scala:830)
	at scala.collection.immutable.VectorPointer.gotoNextBlockStartWritable$(Vector.scala:827)
	at scala.collection.immutable.VectorBuilder.gotoNextBlockStartWritable(Vector.scala:635)
	at scala.collection.immutable.VectorBuilder.$plus$eq(Vector.scala:649)
	at scala.collection.immutable.VectorBuilder.$plus$eq(Vector.scala:635)
	at scala.collection.IterableLike.takeWhile(IterableLike.scala:159)
	at scala.collection.IterableLike.takeWhile$(IterableLike.scala:153)
	at scala.collection.AbstractIterable.takeWhile(Iterable.scala:54)
	at sangria.parser.PositionTracking.trackPos(PositionTracking.scala:12)
	at sangria.parser.PositionTracking.trackPos$(PositionTracking.scala:10)
	at sangria.parser.QueryParser.trackPos(QueryParser.scala:420)
	at sangria.parser.Tokens.BlockStringValue(QueryParser.scala:56)
	at sangria.parser.Tokens.BlockStringValue$(QueryParser.scala:55)
	at sangria.parser.QueryParser.BlockStringValue(QueryParser.scala:420)
	at sangria.parser.Tokens.StringValue(QueryParser.scala:53)
	at sangria.parser.Tokens.StringValue$(QueryParser.scala:53)
	at sangria.parser.QueryParser.StringValue(QueryParser.scala:420)
	at sangria.parser.Values.Value(QueryParser.scala:354)
	at sangria.parser.Values.Value$(QueryParser.scala:351)
	at sangria.parser.QueryParser.Value(QueryParser.scala:420)
	at sangria.parser.Values.ObjectField(QueryParser.scala:387)
	at sangria.parser.Values.ObjectField$(QueryParser.scala:387)
	at sangria.parser.QueryParser.ObjectField(QueryParser.scala:420)
	at sangria.parser.Values.rec$114(QueryParser.scala:383)
	at sangria.parser.Values.ObjectValue(QueryParser.scala:383)
	at sangria.parser.Values.ObjectValue$(QueryParser.scala:383)
	at sangria.parser.QueryParser.ObjectValue(QueryParser.scala:420)
	at sangria.parser.Values.Value(QueryParser.scala:359)
	at sangria.parser.Values.Value$(QueryParser.scala:351)
	at sangria.parser.QueryParser.Value(QueryParser.scala:420)
	at sangria.parser.Values.rec$112(QueryParser.scala:379)
	at sangria.parser.Values.ListValue(QueryParser.scala:379)

I have been reducing the number of rows of data to see how many I can get away with. Splicing the array of GPX data points down to 4,000 rows creates a 28.7 MB seed file, which is successfully processed after 408.5s. However, 10,000 rows will cause the Prisma Docker container to run out of resources and crash.
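For a rough sense of scale, the 4,000-row measurement can be extrapolated. This is only a back-of-the-envelope sketch: it assumes the seed file grows linearly with the row count and that a "row" corresponds to one GPX object, neither of which I have verified. The function name is made up for illustration.

```javascript
// Measured above: 4,000 rows produced a 28.7 MB seed file.
const MEASURED_ROWS = 4000;
const MEASURED_MB = 28.7;

// Assumption: file size scales linearly with the number of rows.
function estimateSeedMB(rowCount) {
  return (MEASURED_MB / MEASURED_ROWS) * rowCount;
}

// Estimated size of the seed data for the largest GPX file (about 27,000 objects).
console.log(estimateSeedMB(27000).toFixed(1) + ' MB');
```

Under those assumptions, the largest single file alone would be several times bigger than the 4,000-row payload that already takes minutes to process.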

Clearly a monolithic seed.graphql file isn’t the best way to tackle this issue, but I was hoping someone could point me in the right direction to get this data into Prisma. Would using a seed.js file be better for importing each GPX file's data, or should I consider another alternative?
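To make the seed.js idea concrete, here is a minimal sketch of what I have in mind: split the parsed points into small batches and send them one at a time, so that no single request is large enough to exhaust the server. The names `chunk`, `seedInBatches`, and `sendBatch` are hypothetical. `sendBatch` stands in for whatever actually writes one batch to Prisma, which I have not written yet, and I have not tested this against a real server.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Send the batches sequentially so only one request is in flight at a time.
// `sendBatch` is a hypothetical async callback supplied by the caller, e.g.
// a function that issues one create mutation for the points in the batch.
async function seedInBatches(points, size, sendBatch) {
  let sent = 0;
  for (const batch of chunk(points, size)) {
    await sendBatch(batch);
    sent += batch.length;
  }
  return sent; // total number of points handed to sendBatch
}
```

Calling `seedInBatches(points, 500, sendBatch)` once per GPX file would keep each request small; the batch size of 500 is an arbitrary starting point and would need tuning.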