Generate random data in an Access database in C#

random data

This example shows how you can generate random data for students and their test scores in an Access database. I often build examples that manipulate databases. Those examples are more interesting if the database holds lots of data, but it’s a hassle entering lots of data by hand. This example shows how you can generate random data that includes students with addresses and test scores. You can use similar techniques to generate other kinds of data such as employee, customer, or purchase order data.

The following code prepares the application when its form loads.

using System.Data.OleDb;
...
// The database connection.
private OleDbConnection Conn;

// Names.
private string[] LastNames = { "Ballard", "Horton", "Jordan", ... };
private string[] FirstNames = { "Tasha", "Erma", "Maryann", ... };
private string[] StreetNames = { "Amber", "Auburn", "Bent", ... };
private string[] StreetTypes = { "Acres", "Alcove", "Arbor", ... };
private string[] CityNames = { "Sitka", "Juneau", "Wrangell", ... };
private string[] StateNames = { "AL", "MT", "AK", "NE", "AZ", ... };

// Set up the connection.
private void Form1_Load(object sender, EventArgs e)
{
    string connect_string =
        "Provider=Microsoft.ACE.OLEDB.12.0;" +
        "Data Source=Students.mdb;" +
        "Mode=Share Deny None";
    Conn = new OleDbConnection(connect_string);
}

The code includes a using statement to let it use the Data.OleDb clases.

Before any methods begin, the code defines the OleDbConnection object that will be used to interact with the database. It also defines string arrays that hold values that will be used to generate random data.

The form’s Load event handler defines a database connection string and uses it to create the OleDbConnection object. (Note that this doesn’t open the database. It just tells the connection object what string to use when it opens the database later.)

To use the program, enter the number of students you want to create and the number of test scores they should each have. Then click the Create button. The code that creates the necessary records is simple but fairly long, so I’ll describe it in pieces.

The following code removes any existing records from the database.

// Create random students.
private void btnCreate_Click(object sender, EventArgs e)
{
    Conn.Open();

    // Empty the tables.
    using (OleDbCommand scores_cmd =
        new OleDbCommand("DELETE FROM TestScores", Conn))
    {
        int num_scores_deleted = scores_cmd.ExecuteNonQuery();
        Console.WriteLine("Deleted " + num_scores_deleted +
            " TestScores records");
    }

    using (OleDbCommand delete_addr_cmd =
        new OleDbCommand("DELETE FROM Addresses", Conn))
    {
        int num_addresses_deleted =
            delete_addr_cmd.ExecuteNonQuery();
        Console.WriteLine("Deleted " + num_addresses_deleted +
            " Addresses records");
    }

The program first opens the database connection. It then uses OleDbCommand objects to execute SQL statements that empty the TestScores and Students tables.

Generate random students and test scores
The code deletes the TestScores records first for an important reason. As you can see in the picture of Access on the right, the database has one foreign key relationship between Students.StudentId and TestScores.StudentId. What that means is the TestScores table isn’t allowed to hold any records where the StudentId value isn’t already present in the Students table. For right now, that means you can’t delete a Students record if there are corresponding TestScores records.

For each of these two tables, the program creates an SQL DELETE statement and uses an OleDbCommand object to execute it. The ExecuteNonQuery method returns the number of records affected by the command. The program displays the number just to give you an idea of what it’s doing.

(Note: One of the worst mistakes to make when writing SQL DELETE statements is to leave out the WHERE clause. If there’s no WHERE clause, as in this example, the statement deletes every record from the table with no chance to undo. In this example, that’s what I want to happen, but whenever you write a DELETE or UPDATE statement, make sure you have the correct WHERE clause or you may trash your database.)

The following code shows how the program prepares to create new students and test scores.

int num_students = int.Parse(txtNumStudents.Text);
int num_tests = int.Parse(txtNumTestScores.Text);

// Make OleDbCommand objects to create students and test scores.
string create_student_sql = "INSERT INTO Addresses " +
        "(FirstName, LastName, Street, City, State, Zip) " +
        " VALUES (?, ?, ?, ?, ?, ?)";
using (OleDbCommand create_student_cmd =
    new OleDbCommand(create_student_sql, Conn))
{
    string create_score_sql = "INSERT INTO TestScores " +
        "(StudentId, TestNumber, Score) VALUES (?, ?, ?)";
    using (OleDbCommand create_score_cmd =
        new OleDbCommand(create_score_sql, Conn))
    {

The earlier code deletes the records from the Addresses and TestScores tables only once. When it creates the new students and their test scores, it performs the same commands many times with different parameter values. You could compose a new SQL statement for each new record, but the process is more efficient if you reuse the same command objects with different parameters. The code shown above creates OleDbCommand objects to create new Addresses and TestScores records. It marks the missing parameter values with question marks.

Next the program enters a loop to create the Addresses records.

// Create students.
Random rand = new Random();
List<string> names = new List<string>();
for (int i = 0; i < num_students; i++)
{
    // Pick the i-th random name.
    string first_name, last_name;
    do
    {
        first_name = FirstNames[rand.Next(0, FirstNames.Length)];
        last_name = LastNames[rand.Next(0, LastNames.Length)];
        if (!names.Contains(last_name + ", " + first_name))
            names.Add(last_name + ", " + first_name);
    }
    while (names.Count <= i);

For each student, the code enters a do while loop to pick a random first and last name. The biggest trick here is making sure you don’t pick the same combination of names twice.

Normally to pick N out of M things, you can simply randomize the M things and then pick the first N of them. In this case, however, we need to pick combinations from two lists instead of just items from one list.

One approach would be to generate all of the possible combinations of names and then randomize them. This example has 50 first names and 50 last names, so there are 2,500 possible combinations. That’s few enough that this approach would work, but it would be pretty inefficient.

A second approach would be to simply try to add each randomly generated combination to the database and let the database complain if there was a duplicate. (I didn’t actually make this database prohibit duplicates, but you could.) Unfortunately database accesses are relatively slow, so it’s usually better to figure these things out if you can and avoid the trip to the database.

This example takes a third approach. It builds a List of the names it generates and only uses a new name if it isn’t already on the list. With 2,500 possible combinations, the chance of a duplicate isn’t very high so we won’t get duplicates very often. (Unless you want to generate a really large number of students. In that case, you should probably add more first and last names to the arrays.)

(A more realistic strategy might be to allow duplicate names and use some method to tell them apart, such as the StudentId. After all, some people do have identical names. That’s not necessary for this example, though, so I won’t bother.)

Anyway, the program creates a List of names. It then enters a loop to create the needed students. For each student, it uses a do while loop to continue generating names until it finds one that isn’t already in the list. It adds the new name to the list and continues to generate random street, city, state, and ZIP code information for that name. (Note that this is all random. There’s no reason to believe that the selected city, state, and ZIP code are realistic. You could make them a bit more realistic by creating lists of real city/state/ZIP combinations, but that would be a lot more work and not worth the effort for random data.)

The next step is to create the new Addresses record for the newly generated name.

// Set the command's name parameters.
create_student_cmd.Parameters.Clear();
.Parameters.AddWithValue(
    "FirstName", first_name);
create_student_cmd.Parameters.AddWithValue(
    "LastName", last_name);

// Set the other parameter values.
create_student_cmd.Parameters.AddWithValue(
    "Street",
    rand.Next(100, 99999).ToString() +
    " " + StreetNames[rand.Next(0, StreetNames.Length)] +
    " " + StreetTypes[rand.Next(0, StreetTypes.Length)]);
create_student_cmd.Parameters.AddWithValue(
    "City",
    CityNames[rand.Next(0, CityNames.Length)]);
create_student_cmd.Parameters.AddWithValue(
    "State",
    StateNames[rand.Next(0, StateNames.Length)]);
create_student_cmd.Parameters.AddWithValue(
    "Zip",
    rand.Next(10000, 99999).ToString());

// Create the student.
create_student_cmd.ExecuteNonQuery();

This code removes any previous parameters from the create_student_cmd object and adds the values for the newly selected first and last names. It then adds the remaining values needed by the INSERT statement: Street, City, State, and Zip. Next the code executes the command to create the record.

The Students table’s StudentId field is auto-generated, so the program uses the following code to get the StudentId for the data it just added to the database.

// Get the new student's auto-generated ID.
OleDbCommand get_id_cmd =
    new OleDbCommand("SELECT @@IDENTITY", Conn);
int student_id;
using (OleDbDataReader reader = get_id_cmd.ExecuteReader())
{
    reader.Read();
    student_id = (int)reader.GetValue(0);
}

Next the program enters another loop to create the new student’s records in the TestScores table.

                // Create the new student's test scores.
                for (int test_number = 0;
                    test_number < num_tests; test_number++)
                {
                    // Set the new test score command parameters.
                    create_score_cmd.Parameters.Clear();
                    create_score_cmd.Parameters.AddWithValue(
                        "StudentId", student_id);
                    create_score_cmd.Parameters.AddWithValue(
                        "TestNumber", test_number);
                    create_score_cmd.Parameters.AddWithValue(
                        "Score", rand.Next(50, 101));

                    // Execute the command.
                    create_score_cmd.ExecuteNonQuery();
                }
            }
        }
    }

    // Close the connection.
    Conn.Close();

    MessageBox.Show("Created " + num_students + " students and " +
        num_students * num_tests + " test score records.");
}

The approach is similar to the one used to create the Addresses record: fill in the OleDbCommand object’s parameters and execute.

The program finishes by closing the connection and displaying the number of records created.

Download the example to see the source all in one piece and to see the program in action.

If you need to generate some other kind of information, you’ll need to modify the code a bit, but the basic approach should work. Create an OleDbCommand object to execute the INSERT statement, replace the values with question marks, and use a loop to generate the records. For each record, set the appropriate parameter values and then execute the command.


Download Example   Follow me on Twitter   RSS feed   Donate




This entry was posted in ADO.NET, algorithms, database and tagged , , , , , , , , , , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *