I have a legacy Spring Data JPA application that has a large number of Entities and CrudRepositories. JPA is configured using the XML below. We have a new requirement that requires us to insert 10,000 - 50,000 entities into the database at once via a FileUpload. With the existing configuration the database CPU spikes. After enabling hibernate statistics, its was apparent that these 10,000 insert operations were generating over 200,000 DB queries due to all the validation logic required with a single insert in the InvoiceService
.
Original Configuration
<bean id="dataSource" destroy-method="close" class="org.apache.commons.dbcp2.BasicDataSource">
<property name="driverClassName" value="${db.driver}"/>
<property name="url" value="${db.jdbcurl}"/>
<property name="username" value="${db.username}"/>
<property name="password" value="${db.password}"/>
<property name="maxTotal" value="${db.maxTotal}"/>
<property name="maxIdle" value="${db.maxIdle}"/>
<property name="minIdle" value="${db.minIdle}"/>
<property name="initialSize" value="${db.initialSize}"/>
<property name="maxWaitMillis" value="${db.maxWaitMillis}"/>
<property name="minEvictableIdleTimeMillis" value="${db.minEvictableIdleTimeMillis}"/>
<property name="timeBetweenEvictionRunsMillis" value="${db.timeBetweenEvictionRunsMillis}"/>
<property name="testOnBorrow" value="${db.testOnBorrow}"/>
<property name="testOnReturn" value="${db.testOnReturn}"/>
<property name="testWhileIdle" value="${db.testWhileIdle}"/>
<property name="removeAbandonedOnBorrow" value="${db.removeAbandonedOnBorrow}"/>
<property name="removeAbandonedOnMaintenance" value="${db.removeAbandonedOnMaintenance}"/>
<property name="removeAbandonedTimeout" value="${db.removeAbandonedTimeout}"/>
<property name="logAbandoned" value="${db.logAbandoned}"/>
</bean>
<bean id="entityManagerFactory"
class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" depends-on="flyway">
<property name="dataSource" ref="dataSource" />
<property name="packagesToScan" value="my.package.domain" />
<property name="loadTimeWeaver">
<bean class="org.springframework.instrument.classloading.InstrumentationLoadTimeWeaver" />
</property>
<property name="jpaVendorAdapter">
<bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" />
</property>
<property name="jpaProperties">
<props>
<prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto:validate}</prop>
<prop key="hibernate.dialect">org.hibernate.dialect.PostgreSQL9Dialect</prop>
<prop key="hibernate.show_sql">${hibernate.show_sql:false}</prop>
</props>
</property>
<property name="persistenceUnitName" value="entityManagerFactory" />
</bean>
<bean id="persistenceAnnotationBeanPostProcessor" class="org.springframework.orm.jpa.support.PersistenceAnnotationBeanPostProcessor">
<property name="defaultPersistenceUnitName" value="entityManagerFactory"/>
</bean>
<bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
<property name="entityManagerFactory" ref="entityManagerFactory" />
</bean>
<tx:annotation-driven proxy-target-class="true" />
<bean id="persistenceExceptionTranslationPostProcessor"
class="org.springframework.dao.annotation.PersistenceExceptionTranslationPostProcessor" />
<jpa:repositories base-package="my.package.repository" entity-manager-factory-ref="entityManagerFactory"/>
The FileUploadService snippet looks like this...
EntityManager batchEntityManager = entityManagerFactory.createEntityManager();
EntityTransaction transaction = batchEntityManager.getTransaction();
transaction.begin();
try (BufferedReader buffer = new BufferedReader(new InputStreamReader(file.getInputStream()))) {
buffer.lines()
.filter(StringUtils::isNotBlank)
.forEach(csvLine -> {
invoiceService.createInvoice(csvLine);
if (counter.incrementAndGet() % (updateFrequency.equals(0) ? 1 : updateFrequency) == 0) {
FileUpload fileUpload1 = fileUploadRepository.findOne(fileUpload.getId());
fileUpload1.setNumberOfSentRecords(sentCount.get());
fileUploadRepository.save(fileUpload1);
transaction.commit();
transaction.begin();
batchEntityManager.clear();
}
});
transaction.commit();
} catch (IOException ex) {
systemException.incrementAndGet();
log.error("Unexpected error while performing send task.", ex);
transaction.rollback();
}
// Update FileUpload status.
FileUpload fileUpload1 = fileUploadRepository.findOne(fileUpload.getId());
fileUpload1.setNumberOfSentRecords(sentCount.get());
if (systemException.get() != 0) {
fileUpload1.setStatus(FileUploadStatus.SYSTEM_ERROR);
} else {
fileUpload1.setStatus(FileUploadStatus.SENT);
}
fileUploadRepository.save(fileUpload1);
batchEntityManager.close();
Most of the DB queries were select statements that return the same results for each record being inserted. It was obvious that enabling EhCache as a Second-Level cache would have a significant performance improvement. However, this application has been running flawlessly in production for several years without ehcache enabled. I am hesitant to turn this on globally as I do not know how this will affect the large number of other repositories/queries.
Question 1
Is there a way to configure a "alternate" EntityManagerFactory that uses the second level cache for this batch process only? How can I choose to use this factory instead of the primary for this batch process only?
I experimented adding something like below to my spring config. I can easily inject this additional EntityManager into my class and use it. However, the existing Repositories (such as FileUploadRepository) don't seem to use it - they just return null. I am not sure if this approach is possible. The documentation for JpaTransactionManager
says
This transaction manager is appropriate for applications that use a single JPA EntityManagerFactory for transactional data access
which is exactly what I am not doing. So what other options are there?
<bean id="batchEntityManagerFactory"
class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" depends-on="flyway">
<property name="dataSource" ref="dataSource" />
<property name="packagesToScan" value="my.package.domain" />
<property name="loadTimeWeaver">
<bean class="org.springframework.instrument.classloading.InstrumentationLoadTimeWeaver" />
</property>
<property name="jpaVendorAdapter">
<bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" />
</property>
<property name="jpaProperties">
<props>
<prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto:validate}</prop>
<prop key="hibernate.dialect">org.hibernate.dialect.PostgreSQL9Dialect</prop>
<prop key="hibernate.show_sql">${hibernate.show_sql:false}</prop>
<prop key="hibernate.generate_statistics">${hibernate.generate_statistics:false}</prop>
<prop key="hibernate.ejb.interceptor">my.package.HibernateStatistics</prop>
<prop key="hibernate.cache.use_query_cache">true</prop>
<prop key="hibernate.cache.use_second_level_cache">true</prop>
<prop key="hibernate.cache.region.factory_class">org.hibernate.cache.ehcache.EhCacheRegionFactory</prop>
<prop key="hibernate.jdbc.batch_size">100</prop>
<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.order_updates">true</prop>
</props>
</property>
<property name="persistenceUnitName" value="entityManagerFactory" />
</bean>
Question 2
Assuming there is no other option to "selectively" use EhCache, I tried enabling it on the primary EntityManagerFactory only. We can certainly do regression testing to make sure we don't introduce new issues. I assume this is fairly safe to do? However, another issue came up. I am trying to commit inserts in batches as described in this post and shown in my code above. I am getting an RollbackException when trying to commit the batch due to Connection org.postgresql.jdbc.PgConnection@1e7eb804 is closed.
. I presume this is due to the maxWaitMillis
property on the dataSource.
I don't want to change this property for every other existing spring Service/Repository/query in the application. If I could use the "custom" EntityManagerFactory I could easily provide a different DataSource Bean with the properties I want. Again, is this possible?
Maybe I looking at this problem all wrong. Are there any other suggestions?