21 Apr 2013

Programmatically Download CORINE Land Cover Seamless Vector Data with R

Thanks to a helpful SO-Answer I was able to download all CLC vector data (43 zip-files) programmatically:
require(XML)

path_to_files <- "D:/GIS_DataBase/CorineLC/Seamless"
dir.create(path_to_files)
setwd(path_to_files)

doc <- htmlParse("http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2")
urls <- xpathSApply(doc,'//*/a[contains(@href,".zip/at_download/file")]/@href')

# function to get zip file names
get_zip_name <- function(x) unlist(strsplit(x, "/"))[grep(".zip", unlist(strsplit(x, "/")))]

# function to plug into sapply
dl_urls <- function(x) try(download.file(x, get_zip_name(x), mode = "wb"))

# download all zip-files
sapply(urls, dl_urls)

# function for unzipping
try_unzip <- function(x) try(unzip(x))

# unzip all files in dir and delete them afterwards
sapply(list.files(pattern = "*.zip"), try_unzip)

# unlink(list.files(pattern = "*.zip"))

2 comments :

  1. Thank you, this is really helpful!
    Although this works fine, I have problems when I want to use the newest CLC version. I replaced the link with “http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-3”, but now it does not find any urls.

    Karin

    ReplyDelete
    Replies
    1. Hi there,
      use:
      doc <- htmlParse("http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-3")
      urls <- xpathSApply(doc,'//*/a[contains(@href,".zip")]')
      Cheers,
      Kay

      Delete