Speed up ActiveRecord data import
Recently I needed to import some CSV data into Rails. There were quite a few records that I needed to import so I took the opportunity to explore options for speeding up this process.
I wanted to verify my assumption about how much time was being spent in the database so I opened up Stackprof to profile what was actually happening.
Here is my example rake task:
1000.times.each do |i|
User.create(username: "username_#{i}")
end
Adding stackprof
was as easy as adding it to my Gemfile
and then wrapping the creation of the records in a call to Stackprof.run
StackProf.run(mode: :cpu, out: 'tmp/stackprof-cpu-myapp.dump', raw: true) do
1000.times.each do |i|
User.create(username: "username_#{i}")
end
end
This generated a dump file that could be analyzed with some of the utilities Stackprof provides. I chose the commandline version (the flamegraph output options are pretty neat too!).
$ stackprof tmp/stackprof-cpu-myapp.dump --text
==================================
Mode: cpu(1000)
Samples: 1495 (0.07% miss rate)
GC: 65 (4.35%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
435 (29.1%) 425 (28.4%) Logger::LogDevice#write
389 (26.0%) 389 (26.0%) SQLite3::Statement#each
355 (23.7%) 355 (23.7%) SQLite3::ResultSet#next_hash
41 (2.7%) 41 (2.7%) (marking)
23 (1.5%) 23 (1.5%) (sweeping)
22 (1.5%) 22 (1.5%) #<Module:0x00007fd206293530>.storage_to_output
20 (1.3%) 20 (1.3%) ActiveRecord::LogSubscriber#ignored_callstack
11 (0.7%) 11 (0.7%) Module#delegate
12 (0.8%) 9 (0.6%) Listen::File.change
31 (2.1%) 9 (0.6%) Bootsnap::CompileCache::ISeq::InstructionSequenceMixin#load_iseq
8 (0.5%) 8 (0.5%) ActiveSupport::Notifications::Event#initialize
7 (0.5%) 7 (0.5%) ActiveRecord::LogSubscriber#sql_color
6 (0.4%) 6 (0.4%) SQLite3::Statement#bind_params
6 (0.4%) 6 (0.4%) Pathname#children
6 (0.4%) 6 (0.4%) ActiveSupport::Logger::SimpleFormatter#call
6 (0.4%) 6 (0.4%) ActiveModel::Attribute#initialize
6 (0.4%) 6 (0.4%) ActiveSupport::Dependencies::WatchStack#watch_namespaces
6 (0.4%) 6 (0.4%) MonitorMixin#mon_enter
6 (0.4%) 6 (0.4%) MonitorMixin#mon_enter
5 (0.3%) 5 (0.3%) Pathname#chop_basename
34 (2.3%) 4 (0.3%) Listen::Directory.scan
4 (0.3%) 4 (0.3%) ActiveSupport::Callbacks::ClassMethods#define_callbacks
4 (0.3%) 4 (0.3%) ActiveRecord::Base.logger
4 (0.3%) 4 (0.3%) Pathname#prepend_prefix
4 (0.3%) 4 (0.3%) Listen::Record::Entry#real_path
361 (24.1%) 4 (0.3%) SQLite3::Database#prepare
6 (0.4%) 4 (0.3%) ActiveRecord::ConnectionAdapters::Quoting#quoted_date
4 (0.3%) 3 (0.2%) ActiveRecord::Relation::QueryAttribute#value_for_database
3 (0.2%) 3 (0.2%) MonitorMixin#mon_try_enter
3 (0.2%) 3 (0.2%) #<Class:0x00007fd2064a5030>#__getobj__
Sure enough, my assumption that a significant amount of time was being used by the database was correct! I decided to do a quick google search or two and see if there where any other options…
After a little googling I found the activerecord-import gem. The gem basically works by combining all the sql insert statements so you’re not running n
different insert statements. This is a comparison of Active Record’s standard #create
method vs using the active-import
gem.
Rake task:
Benchmark.bm do |x|
x.report("#create:") do
10_000.times.each do |i|
User.create(username: "username_#{i}")
end
end
x.report("activerecord-import:") do
users = []
10_000.times.each do |i|
users << User.new(username: "username_#{i}")
end
User.import!(users)
end
end
And the output:
user system total real
#create: 8.822378 5.608055 14.430433 ( 17.174048)
activerecord-import: 0.840474 0.020762 0.861236 ( 0.865205)
An impressive improvement! The gem’s readme talks about various situtaions where you should avoid using the gem, which are worth a read. However if you know that your incoming data is good (as long as you don’t need any callbacks, yet another good reason to avoid callbacks…) then this gem looks like it’s well worth some investigation.