Download images from a web page in C#

[Download images]

This is a big program and this description only touches on its most interesting parts. Download it to see the details.

The program displays a WebBrowser control on the left. Click links in the WebBrowser to navigate to a web page, or enter a URL in the TextBox above it and click the Go button.

When you have found the web page that you want, click the List Images button to make the program display all of the page’s images on the right. The program limits each image to at most 300×300 pixels so none of them takes up too much room.

You can then click on images to remove them from the list. After you’ve narrowed the choices down to those you like, click Save Images to save the remaining images to your hard disk.

The following code shows how the program navigates.

// Navigate to the entered URL.
private void btnGo_Click(object sender, EventArgs e)
{
    try
    {
        wbrWebSite.Navigate(txtUrl.Text);
    }
    catch (Exception ex)
    {
        MessageBox.Show("Error navigating to web site " +
            txtUrl.Text + '\n' + ex.Message,
            "Navigation Error",
            MessageBoxButtons.OK,
            MessageBoxIcon.Error);
    }
}

The following code shows how the program lists the images on the web page.

// True while the program is listing images.
private bool Running = false;

// Show the images from the URL.
private void btnListImages_Click(object sender, EventArgs e)
{
    if (btnListImages.Text == "List Images")
    {
        this.Cursor = Cursors.WaitCursor;
        btnListImages.Text = "Stop";
        btnGo.Enabled = false;
        btnSaveImages.Enabled = false;
        Application.DoEvents();

        // Remove old images.
        for (int i = flpPictures.Controls.Count - 1; i >= 0; i--)
        {
            flpPictures.Controls[i].Parent = null;
        }

        // List the images on this page.
        HtmlDocument doc = wbrWebSite.Document;
        Running = true;
        foreach (HtmlElement element in doc.Images)
        {
            mshtml.HTMLImg dom_element =
                (mshtml.HTMLImg)element.DomElement;
            string src = dom_element.src;

            PictureBox pic = new PictureBox();
            pic.BorderStyle = BorderStyle.Fixed3D;
            pic.Image = GetPicture(src);
            SetPictureBoxSize(pic);
            pic.Parent = flpPictures;
            pic.Tag = src;
            tipFileName.SetToolTip(pic, src);

            pic.Click += pic_Click;

            Application.DoEvents();

            if (!Running) break;
        }
        Running = false;

        btnListImages.Text = "List Images";
        btnGo.Enabled = true;
        btnSaveImages.Enabled = true;
        this.Cursor = Cursors.Default;
    }
    else
    {
        Running = false;
    }
}

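The program removes all controls from the flpPictures FlowLayoutPanel control by setting their Parent properties to null. This detaches the controls from the panel but does not dispose them, so their window handles and images are released only when the controls are eventually finalized. If the program lists many pages, calling Dispose on each removed PictureBox and on its Image would free those resources promptly.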

Next, the code gets the WebBrowser control’s Document property, which returns an HtmlDocument object representing the web page, and loops through the document’s Images collection. It casts each element’s DomElement to an mshtml.HTMLImg object and reads its src property, which contains the image’s URL.

The code makes a new PictureBox, calls the GetPicture method to download the image, assigns the result to the PictureBox, and places the PictureBox in the FlowLayoutPanel control. That control automatically arranges its children in rows, wrapping when necessary and displaying scroll bars if the pictures don’t all fit. Notice that the code saves the image’s URL in the PictureBox control’s Tag property for later use.
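The listing calls SetPictureBoxSize, which is not shown here. The following is only a rough sketch of the size calculation such a method might perform, assuming it enforces the 300×300 limit mentioned earlier and preserves the aspect ratio. The ImageSizing class and FitWithin method are hypothetical names, not part of the example program.

```csharp
using System;

public static class ImageSizing
{
    // Hypothetical helper (not from the example program).
    // Returns the largest size that fits inside maxWidth x maxHeight
    // while keeping the aspect ratio. Images that already fit are
    // returned unchanged; they are never enlarged.
    public static (int Width, int Height) FitWithin(
        int width, int height, int maxWidth = 300, int maxHeight = 300)
    {
        if (width <= 0 || height <= 0)
            throw new ArgumentException("Width and height must be positive.");

        // Use the smaller of the two scale factors, capped at 1.
        double scale = Math.Min(1.0,
            Math.Min((double)maxWidth / width, (double)maxHeight / height));

        int newWidth = Math.Max(1, (int)Math.Round(width * scale));
        int newHeight = Math.Max(1, (int)Math.Round(height * scale));
        return (newWidth, newHeight);
    }
}
```

In a Windows Forms program, the result could be assigned to the PictureBox control’s ClientSize property with SizeMode set to PictureBoxSizeMode.Zoom, although the real SetPictureBoxSize method may work differently.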

Finally the code registers the pic_Click event handler to catch the PictureBox control’s Click event.

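This routine also lets you watch new PictureBox controls appear as they are created and stop the loop before it finishes. While the loop runs, the button’s text changes to Stop, and the calls to Application.DoEvents let the form repaint and process pending clicks. Clicking the button again sets Running to false, which makes the loop exit after the current image.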

The GetPicture function shown in the following code downloads a picture and returns it.

// Get the picture at a given URL.
private Image GetPicture(string url)
{
    try
    {
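        url = url.Trim();

        // Add a scheme only if the URL has none. Checking for
        // "http://" alone would mangle https:// URLs.
        if (!url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
            !url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
            url = "http://" + url;

        // The MemoryStream must stay open while the Image is in use.
        using (WebClient web_client = new WebClient())
        {
            MemoryStream image_stream =
                new MemoryStream(web_client.DownloadData(url));
            return Image.FromStream(image_stream);
        }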
    }
    catch (Exception ex)
    {
        MessageBox.Show("Error downloading picture " +
            url + '\n' + ex.Message,
            "Download Error",
            MessageBoxButtons.OK,
            MessageBoxIcon.Error);
    }
    return null;
}

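This code uses a WebClient to download a picture. It calls the WebClient object’s DownloadData method to retrieve the image’s bytes and wraps them in a memory stream. It then uses the Image class’s FromStream method to convert the stream into an image. The stream must remain open for as long as the image is in use, which is why the code does not close it. If the download fails, the method displays an error message and returns null.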
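The scheme check at the start of GetPicture can be factored into a small helper that is easy to test on its own. The following is a minimal sketch, not the example program’s code. UrlHelper and NormalizeUrl are hypothetical names, and treating protocol-relative URLs (those starting with //) as http is an assumption.

```csharp
using System;

public static class UrlHelper
{
    // Hypothetical helper (not from the example program).
    // Adds a scheme only when the URL does not already have one.
    public static string NormalizeUrl(string url)
    {
        url = url.Trim();

        // Leave http:// and https:// URLs alone.
        if (url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
            return url;

        // Protocol-relative URL such as //example.com/a.png.
        // Assumption: default to http for these.
        if (url.StartsWith("//", StringComparison.Ordinal))
            return "http:" + url;

        return "http://" + url;
    }
}
```

The important case is a URL that begins with https://, which must pass through unchanged rather than having http:// added in front of it.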

When you click on a picture, the following code removes it from the list.

// Remove the clicked PictureBox.
private void pic_Click(object sender, EventArgs e)
{
    PictureBox pic = sender as PictureBox;
    pic.Parent = null;
}

The code simply sets the clicked PictureBox control’s Parent property to null, which removes it from the FlowLayoutPanel. The FlowLayoutPanel automatically rearranges its children as needed.

Finally, when you click the Save Images button, the following code saves the pictures that are still in the FlowLayoutPanel.

// Save the images that have not been removed.
private void btnSaveImages_Click(object sender, EventArgs e)
{
    Cursor = Cursors.WaitCursor;
    try
    {
        // Make sure the directory exists.
        string dir_name = txtDirectory.Text;
        Directory.CreateDirectory(dir_name);

        foreach (PictureBox pic in flpPictures.Controls)
        {
            // Skip pictures that failed to download.
            Bitmap bm = pic.Image as Bitmap;
            if (bm == null) continue;

            string filename = Path.Combine(dir_name,
                Path.GetFileName(pic.Tag.ToString()));
            SaveImage(bm, filename);
        }
    }
    finally
    {
        // Restore the cursor even if saving fails.
        Cursor = Cursors.Default;
    }
    System.Media.SystemSounds.Beep.Play();
}

The code loops through the PictureBox controls that remain in the FlowLayoutPanel. It gets each image’s file name from the PictureBox control’s Tag property and calls the SaveImage method to save the control’s image in a file of the appropriate type.
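Image URLs often carry a query string or characters that are not valid in file names, so passing the raw URL to Path.GetFileName can produce a name that cannot be saved. The following sketch shows one way to derive a safer name. FileNameHelper, GetSafeFileName, and the "image" fallback are hypothetical, and the code assumes the URL is absolute.

```csharp
using System;
using System.IO;

public static class FileNameHelper
{
    // Hypothetical helper (not from the example program).
    // Derives a file name from an absolute image URL, ignoring
    // any query string or fragment.
    public static string GetSafeFileName(string url, string fallback = "image")
    {
        // Uri.AbsolutePath excludes the query string and fragment.
        string path = new Uri(url).AbsolutePath;

        // Take the text after the last slash and decode %xx escapes.
        string name = Uri.UnescapeDataString(
            path.Substring(path.LastIndexOf('/') + 1));
        if (string.IsNullOrEmpty(name)) return fallback;

        // Replace characters that are not allowed in file names.
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');
        return name;
    }
}
```

For example, the URL http://example.com/images/cat.png?size=large yields cat.png. This sketch does not handle two images that share the same file name; in that case the second file would overwrite the first unless the caller adds a suffix.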

Download the example program to see additional details.

