
How to insert 10,000 records into the DB without hampering performance

There is an API that takes a JSON object as input, applies validation and other custom logic, and saves the model to the database. With 10,000 records, it would be a bad idea to call that API once per record.
I am thinking of chunking the data into smaller units of around 500 records and then saving each chunk of 500 in a single call.

Are there better ways to save thousands of records?

1 Answer

This question is a bit difficult to answer, because there are many ways to write data in bulk, with tradeoffs among performance, functionality, and time to implement.

I'm assuming right now you have an API along these lines (names taken from your description):

  void writeSomeModels(JSONArray records) {
      // internally converts each JSON record to a Model object and calls ModelDataService.write
  }

If you call writeSomeModels once per model (i.e. 10,000 arrays of size 1), that will be slower than passing all the records in one call (1 array of size 10,000) or passing multiple midsized arrays (20 arrays of size 500). So I definitely wouldn't recommend one record at a time.
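For the midsized-array option, a minimal sketch might look like the following. It assumes the writeSomeModels API above and org.json's JSONArray; the chunk size of 500 comes from your question, and everything else is illustrative.

  import org.json.JSONArray;

  public class ChunkedWriter {

      // Calls the existing bulk endpoint once per chunk instead of once per record.
      static void writeInChunks(JSONArray allRecords, int chunkSize) {
          for (int start = 0; start < allRecords.length(); start += chunkSize) {
              JSONArray chunk = new JSONArray();
              int end = Math.min(start + chunkSize, allRecords.length());
              for (int i = start; i < end; i++) {
                  chunk.put(allRecords.get(i));
              }
              writeSomeModels(chunk);   // e.g. 20 calls of 500 records for a 10,000-record load
          }
      }

      // Placeholder for the existing API described above.
      static void writeSomeModels(JSONArray chunk) { /* convert to Models, ModelDataService.write */ }
  }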

There's not a big difference between passing one giant array and passing multiple midsized arrays, because NEO automatically splits records into sub-batches (usually around 50 records) when doing the model write. So the two approaches will likely get similar performance.

However, none of these approaches is going to be *super* fast. Our general experience is a processing rate of around 50 models per second when there is almost no workflow and it's a single-level model; performance degrades as you add validation and child levels. So the absolute best case for 10K records is probably 4-5 minutes (10,000 ÷ 50 ≈ 200 seconds, and real loads rarely hit the best case). Maybe that's OK for a backend integration, but it's definitely not OK if an end user has to wait on it, and it's also not good if it's a REST API that blocks while waiting for the write to finish.
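If the data does come in through a synchronous endpoint, one common pattern is to accept the request, hand the multi-minute write to a background worker, and return immediately. This is purely an illustrative sketch; the job-id handling and executor setup here are assumptions, not part of your API.

  import org.json.JSONArray;
  import java.util.UUID;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  public class BulkWriteEndpoint {

      private final ExecutorService bulkWriter = Executors.newSingleThreadExecutor();

      // Kicks off the long-running write in the background and returns a job id
      // the caller can use to poll a status endpoint (not shown).
      public String submitBulkWrite(JSONArray records) {
          String jobId = UUID.randomUUID().toString();
          bulkWriter.submit(() -> writeSomeModels(records));
          return jobId;
      }

      private void writeSomeModels(JSONArray records) {
          // the existing bulk write described above
      }
  }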

In absolute terms, the fastest way to load data is a "SQL Loader" style approach. In this approach (e.g. our Inbound Interfaces of type SQL), you write the incoming data into "staging" tables, then run a series of UPDATE and MERGE statements to validate the data and finally propagate it into the target table. This can be 10x-100x faster than regular Model writes, but at the expense of not being able to reuse all the workflow/validation logic, which is so handy. If you're trying to process this data in seconds rather than minutes, a strategy like that is worth considering.
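To make the staging idea concrete, here is a minimal JDBC sketch. The table names, columns, and the single validation rule are invented for illustration, and MERGE syntax varies by database; only the overall shape (batch insert into staging, then set-based UPDATE/MERGE into the target) reflects the approach described above.

  import java.math.BigDecimal;
  import java.sql.*;
  import java.util.List;

  public class StagingLoader {

      record OrderRecord(String orderId, BigDecimal amount) {}   // hypothetical incoming record

      static void stageAndMerge(List<OrderRecord> records, String jdbcUrl, String user, String pass)
              throws SQLException {
          try (Connection conn = DriverManager.getConnection(jdbcUrl, user, pass)) {
              conn.setAutoCommit(false);

              // 1. Bulk-insert the raw rows into the staging table in one batched round trip.
              try (PreparedStatement ps = conn.prepareStatement(
                      "INSERT INTO staging_orders (order_id, amount, status) VALUES (?, ?, 'NEW')")) {
                  for (OrderRecord r : records) {
                      ps.setString(1, r.orderId());
                      ps.setBigDecimal(2, r.amount());
                      ps.addBatch();
                  }
                  ps.executeBatch();
              }

              try (Statement stmt = conn.createStatement()) {
                  // 2. Set-based validation: mark rows failing a (made-up) business rule.
                  stmt.executeUpdate(
                      "UPDATE staging_orders SET status = 'INVALID' WHERE amount <= 0");

                  // 3. Propagate the remaining rows into the target table in a single statement.
                  stmt.executeUpdate(
                      "MERGE INTO orders o " +
                      "USING (SELECT order_id, amount FROM staging_orders WHERE status = 'NEW') s " +
                      "ON (o.order_id = s.order_id) " +
                      "WHEN MATCHED THEN UPDATE SET o.amount = s.amount " +
                      "WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount)");
              }

              conn.commit();
          }
      }
  }

The speedup comes from the database doing validation and writes set-at-a-time instead of one model at a time.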