Recently I needed to import some CSV data into Rails. There were quite a few records that I needed to import so I took the opportunity to explore options for speeding up this process.

I wanted to verify my assumption about how much time was being spent in the database so I opened up Stackprof to profile what was actually happening.

Here is my example rake task:

1000.times.each do |i|
  User.create(username: "username_#{i}")
end

Adding stackprof was as easy as adding it to my Gemfile and then wrapping the creation of the records in a call to Stackprof.run

StackProf.run(mode: :cpu, out: 'tmp/stackprof-cpu-myapp.dump', raw: true) do
  1000.times.each do |i|
    User.create(username: "username_#{i}")
  end
end

This generated a dump file that could be analyzed with some of the utilities Stackprof provides. I chose the commandline version (the flamegraph output options are pretty neat too!).

$ stackprof tmp/stackprof-cpu-myapp.dump --text
==================================
  Mode: cpu(1000)
  Samples: 1495 (0.07% miss rate)
  GC: 65 (4.35%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
       435  (29.1%)         425  (28.4%)     Logger::LogDevice#write
       389  (26.0%)         389  (26.0%)     SQLite3::Statement#each
       355  (23.7%)         355  (23.7%)     SQLite3::ResultSet#next_hash
        41   (2.7%)          41   (2.7%)     (marking)
        23   (1.5%)          23   (1.5%)     (sweeping)
        22   (1.5%)          22   (1.5%)     #<Module:0x00007fd206293530>.storage_to_output
        20   (1.3%)          20   (1.3%)     ActiveRecord::LogSubscriber#ignored_callstack
        11   (0.7%)          11   (0.7%)     Module#delegate
        12   (0.8%)           9   (0.6%)     Listen::File.change
        31   (2.1%)           9   (0.6%)     Bootsnap::CompileCache::ISeq::InstructionSequenceMixin#load_iseq
         8   (0.5%)           8   (0.5%)     ActiveSupport::Notifications::Event#initialize
         7   (0.5%)           7   (0.5%)     ActiveRecord::LogSubscriber#sql_color
         6   (0.4%)           6   (0.4%)     SQLite3::Statement#bind_params
         6   (0.4%)           6   (0.4%)     Pathname#children
         6   (0.4%)           6   (0.4%)     ActiveSupport::Logger::SimpleFormatter#call
         6   (0.4%)           6   (0.4%)     ActiveModel::Attribute#initialize
         6   (0.4%)           6   (0.4%)     ActiveSupport::Dependencies::WatchStack#watch_namespaces
         6   (0.4%)           6   (0.4%)     MonitorMixin#mon_enter
         6   (0.4%)           6   (0.4%)     MonitorMixin#mon_enter
         5   (0.3%)           5   (0.3%)     Pathname#chop_basename
        34   (2.3%)           4   (0.3%)     Listen::Directory.scan
         4   (0.3%)           4   (0.3%)     ActiveSupport::Callbacks::ClassMethods#define_callbacks
         4   (0.3%)           4   (0.3%)     ActiveRecord::Base.logger
         4   (0.3%)           4   (0.3%)     Pathname#prepend_prefix
         4   (0.3%)           4   (0.3%)     Listen::Record::Entry#real_path
       361  (24.1%)           4   (0.3%)     SQLite3::Database#prepare
         6   (0.4%)           4   (0.3%)     ActiveRecord::ConnectionAdapters::Quoting#quoted_date
         4   (0.3%)           3   (0.2%)     ActiveRecord::Relation::QueryAttribute#value_for_database
         3   (0.2%)           3   (0.2%)     MonitorMixin#mon_try_enter
         3   (0.2%)           3   (0.2%)     #<Class:0x00007fd2064a5030>#__getobj__

Sure enough, my assumption that a significant amount of time was being used by the database was correct! I decided to do a quick google search or two and see if there where any other options…

After a little googling I found the activerecord-import gem. The gem basically works by combining all the sql insert statements so you’re not running n different insert statements. This is a comparison of Active Record’s standard #create method vs using the active-import gem.

Rake task:

Benchmark.bm do |x|
  x.report("#create:") do
    10_000.times.each do |i|
      User.create(username: "username_#{i}")
    end
  end
  x.report("activerecord-import:") do
    users = []
    10_000.times.each do |i|
      users << User.new(username: "username_#{i}")
    end
    User.import!(users)
  end
end

And the output:

       user     system      total        real
#create:  8.822378   5.608055  14.430433 ( 17.174048)
activerecord-import:  0.840474   0.020762   0.861236 (  0.865205)

An impressive improvement! The gem’s readme talks about various situtaions where you should avoid using the gem, which are worth a read. However if you know that your incoming data is good (as long as you don’t need any callbacks, yet another good reason to avoid callbacks…) then this gem looks like it’s well worth some investigation.