This is going to be a rather long post, since I want to explain why I implemented certain styles of coding. The main goal is to see if I can make this program run faster with help from you guys!!
Overview
Starting off, I have 23 different Excel Sheets that sit in a file. Some sheets are bigger than others but I have to import ALL of the data within each of them. Each of the column header names in these Excel Sheets have been created as classes in C# - there are 23 of these classes. In order to grab the data, there is a front-end that accepts an Excel file and then ExcelDataReader
is used to read the tables in a way that they can become DataTable
objects.
There is a general style that importing each Excel Sheet follows. I will use my vCPU class as an example First, the program cleans up the DataTable
object that is being looked at by removing unnecessary rows. Then a List<vCPUs>
object is made. Each row in the vCPU Excel Sheet makes up one vCPU
object, so there is a foreach
loop that goes through each DataRow
in the vCPU_Table
. That looks something like this:
public class vCPU
{
public static int Insert(DataTable vCPU_Table, int assessment_id, int objectCount)
{
//Cleans up the Data Table that is being accessed by removing unneccesary rows
for (int i = vCPU_Table.Rows.Count - 1; i >= 0; i--)
{
if (vCPU_Table.Rows[i][0].ToString() == String.Empty)
{
vCPU_Table.Rows.RemoveAt(i);
}
}
List<RvtoolsVCpu> vCPUs = new List<RvtoolsVCpu>();
using (var context = new DataWarehouseContext())
{
foreach (DataRow dr in vCPU_Table.Rows)
{
if (dr["VM"].ToString() == "") //In case too many rows are selected, the name is always filled for corresponding data
{
context.RvtoolsVCpu.AddRange(vCPUs);
context.SaveChanges();
return objectCount;
}
int VmId = vInfo.GetID(dr["VM"].ToString(), dr["Host"].ToString(), dr["Datacenter"].ToString(), dr["Cluster"].ToString(), assessment_id);
int? Amount = (dr.Table.Columns.Contains("CPUs") && dr["CPUs"] != DBNull.Value) ? Convert.ToInt32(dr["CPUs"]) : (int?)null; //Amount is NumCPUs
int? Sockets = (dr.Table.Columns.Contains("Sockets") && dr["Sockets"] != DBNull.Value) ? Convert.ToInt32(dr["Sockets"]) : (int?)null;
int? CoresPerSocket = (dr.Table.Columns.Contains("Cores p/s") && dr["Cores p/s"] != DBNull.Value) ? Convert.ToInt32(dr["Cores p/s"]) : (int?)null;
int? Max = (dr.Table.Columns.Contains("Max") && dr["Max"] != DBNull.Value) ? Convert.ToInt32(dr["Max"]) : (int?)null;
int? Overall = (dr.Table.Columns.Contains("Overall") && dr["Overall"] != DBNull.Value) ? Convert.ToInt32(dr["Overall"]) : (int?)null;
string Level = (dr.Table.Columns.Contains("Level") && dr["Level"] != DBNull.Value) ? dr["Level"].ToString() : "";
int? Shares = (dr.Table.Columns.Contains("Shares") && dr["Shares"] != DBNull.Value) ? Convert.ToInt32(dr["Shares"]) : (int?)null;
int? Reservation = (dr.Table.Columns.Contains("Reservation") && dr["Reservation"] != DBNull.Value) ? Convert.ToInt32(dr["Reservation"]) : (int?)null;
int? Entitlement = (dr.Table.Columns.Contains("Entitlement") && dr["Entitlement"] != DBNull.Value) ? Convert.ToInt32(dr["Entitlement"]) : (int?)null;
int? DrsEntitlement = (dr.Table.Columns.Contains("DRS Entitlement") && dr["DRS Entitlement"] != DBNull.Value) ? Convert.ToInt32(dr["DRS Entitlement"]) : (int?)null;
int? Limit = (dr.Table.Columns.Contains("Limit") && dr["Limit"] != DBNull.Value) ? Convert.ToInt32(dr["Limit"]) : (int?)null;
bool? HotAdd = (dr.Table.Columns.Contains("Hot Add") && dr["Hot Add"] != DBNull.Value) ? bool.Parse(dr["Hot Add"].ToString()) : (bool?)null;
bool? HotRemove = (dr.Table.Columns.Contains("Hot Remove") && dr["Hot Remove"] != DBNull.Value) ? bool.Parse(dr["Hot Remove"].ToString()) : (bool?)null;
string Annotation = (dr.Table.Columns.Contains("Annotation") && dr["Annotation"] != DBNull.Value) ? dr["Annotation"].ToString() : "";
vCPUs.Add(new RvtoolsVCpu()
{
VmId = VmId, //Get VM_ID
Amount = Amount,
Sockets = Sockets,
CoresPerSocket = CoresPerSocket,
Max = Max,
Overall = Overall,
Level = Level,
Shares = Shares,
Reservation = Reservation,
Entitlement = Entitlement,
DrsEntitlement = DrsEntitlement,
Limit = Limit,
HotAdd = HotAdd,
HotRemove = HotRemove,
Annotation = Annotation,
AssessmentId = assessment_id
});
objectCount += 16;
}
context.RvtoolsVCpu.AddRange(vCPUs);
context.SaveChanges();
return objectCount;
}
}
}
This vCPU class is rather small compared to some of the other Excel Sheets that have to be imported... you might be able to see now why I'm searching for optimization.
Within the foreach
loop, you can ignore the vInfo.GetID( ... );
since that is already optimized. Looking at the objects within each row, however, is where things get bulky. Here's why I have what I have in there, but any suggestions for better solutions would be greatly appreciated:
dr.Table.Columns.Contains("CPUs")
-- This is used since there are different versions of these Excel files. Some of them don't have the same columns, so when ExcelDataReader
tries to access a column that isn't there, the program breaks and the import fails.
dr["CPUs"] != DBNull.Value
-- This is used since sometimes there is nothing in a specific Excel cell, and that tends to run into an error when you try to convert nothing to an integer.
Thus, if both of these conditionals return true, the data cell is converted to the correct object type. Otherwise, it is converted to the null value of that object type. This works really great for avoiding errors! Unfortunately, it doesn't allow the program to chug along fast.
After each object in a row is either converted or set to null, Entity Framework Core
comes into play. The List<vCPUs>
object keeps gaining objects until the end of the Excel Sheet has been reached. All of the data is then pushed to a SQL Server Database and the program moves onto the next Excel Sheet.
Where can I improve on this program?
question from:
https://stackoverflow.com/questions/65830909/moving-excel-data-to-sql-server-can-this-be-more-efficient-c-sharp