GEO-OPEN-HACK-2024 is a comprehensive and informative event designed for advanced geo-coders to explore various open tools and approaches for upscaling geospatial analysis on open High-Performance Computing (HPC) infrastructure.
The event is organised by the International Institute for Applied Systems Analysis (IIASA) in collaboration with Spatial Ecology. This hackathon delves into cutting-edge open techniques, tools, and best practices for efficiently handling and processing vast amounts of geospatial data. Participants will gain hands-on experience in leveraging HPC resources and geo-tools for tasks such as geospatial data preprocessing, spatial modeling and analytics, and visualization.
- Introduction to Big Geospatial Data: Understanding the challenges and opportunities presented by large-scale geospatial datasets.
- High-Performance Computing Basics: Familiarization with HPC systems, queuing systems, parallel processing, and optimization techniques.
- Open Tools and Workflows: Techniques and tools for geospatial data processing and spatial analytics for applications like remote sensing, GIS, and environmental change monitoring.
- Modern Geo-analytics: Exploring emerging trends and technologies in the field, such as machine learning and cloud-based geospatial analytics and visualization.
- Parallel Computing: Harnessing the power of parallel and distributed computing for speed and efficiency in geospatial analysis.
- Performance Tuning: Strategies to optimize ML models and workflows for HPC environments.
- Case Studies: Real-world examples of successful big geospatial data projects on HPC systems.
- Scalability and Big Data Challenges: Addressing issues related to data volume, velocity, variety, and veracity in geospatial analysis.
The hackathon will start with an icebreaker/get-together event on Sunday evening (17:00-20:00). A social event will be organized on Wednesday evening. The topics covered in the morning sessions are listed below:
Monday: Geo-Processing with HPC
- VRT format for tile splitting and tile re-aggregation
- Embedding Python and R in Bash.
- HPC architecture and queue systems
- Geo-Processing with Slurm queue system
- Single and multiple core processing using a simple job
- Single and multiple core processing using many simple jobs
- Single and multiple core processing using job-array
- Working with Slurm using GRASS
- Building MAPSET on the RAM
- Using GRASS commands that incorporate multi core processing
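To illustrate how tile splitting and a Slurm job array fit together, here is a minimal sketch and not course material. It splits a raster of a given size into pixel windows and lets each array task pick its own window. The function names `tile_windows` and `window_for_task` and the raster dimensions are hypothetical; `SLURM_ARRAY_TASK_ID` is the environment variable Slurm sets for each task of a job array.

```python
import os
from typing import List, Optional, Tuple

# (col_off, row_off, width, height) in pixels
Window = Tuple[int, int, int, int]


def tile_windows(width: int, height: int, tile: int) -> List[Window]:
    """Split a width x height raster into row-major tiles of at most tile x tile pixels."""
    windows = []
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            windows.append((
                col_off,
                row_off,
                min(tile, width - col_off),    # edge tiles may be narrower
                min(tile, height - row_off),   # edge tiles may be shorter
            ))
    return windows


def window_for_task(windows: List[Window], task_id: Optional[int] = None) -> Window:
    """Return the window for one array task (falls back to task 0 outside Slurm)."""
    if task_id is None:
        task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return windows[task_id]


if __name__ == "__main__":
    # Hypothetical raster size, chosen only for illustration.
    windows = tile_windows(width=10000, height=6000, tile=4096)
    print(len(windows), "tiles")
    print("this task processes", window_for_task(windows))
```

With six tiles, such a script could be submitted as a job array with `sbatch --array=0-5`. The four numbers of each window follow the same order (x offset, y offset, x size, y size) that `gdal_translate -srcwin` expects.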
Tuesday: Geo-python with HPC
- Geospatial data processing with pyjeo
- Introduction to the pyjeo data model
- Combining pyjeo with other open source libraries
- Machine learning applied to geospatial data (optional)
- Feature extraction based on labeled data
- Model training based on labeled data
- Model prediction on raster dataset
- Access data cubes from a STAC compliant catalog (optional)
- Upscaling workflows: high throughput computing (HTC)
- Processing on a cluster with tiled images
- Parallel and distributed computing with Dask
Wednesday: ML with HPC
- ML-python: Scikit-learn/PyTorch on a supercomputer/HPC (MLPClassifier, MLPRegressor, etc.)
- Training a CNN model on the cluster.
- Multicore processing and GPU usage.
- Using and fine-tuning large models for satellite image datasets via Hugging Face
openEO is a standardized API for cloud computing in Earth observation (EO). This training will teach the concepts behind openEO and how to use it to carry out a typical EO workflow on a cloud platform.
- The concepts of openEO
- Cloud computing in EO - why standards are needed
- Where to find cloud platforms that support openEO
- openEO in action
- Learn how to use openEO for cloud processing
- Create your own openEO workflow to solve a typical EO question.
The Pangeo training will be delivered as a mix of presentations, hands-on exercises, group activities and group discussions (including showcasing successful projects based on the Pangeo ecosystem).
- Unlocking Pangeo for Geospatial Data Science
- Introduction to Pangeo and its significance in big geospatial analysis
- Different ways of setting up a Pangeo environment for geospatial applications
- Accessing and loading large geospatial datasets using Pangeo
- Understanding data storage options & best practices
- Scalable geospatial analysis with Pangeo
- Leveraging Pangeo for scalable geospatial analysis
- Exploring parallel computing capabilities of Pangeo using Dask and Xarray
- Applying machine learning models to geospatial data within the Pangeo environment
This is an advanced-level hackathon, ideal for early-career researchers, scientists, and professionals interested in unlocking the full potential of big geospatial data by harnessing the computational power of open HPC systems. Participants will leave with valuable insights and practical skills to tackle their geospatial challenges at scale.
The hackathon is aimed at individuals with master's or doctoral qualifications. It will give them an opportunity to scale up their own spatio-temporal modeling and data analysis projects. Hackathon participants should have intermediate bash and Python skills, basic ML-python knowledge, and a strong desire to learn command line tools for massive geo-data processing. R users are also welcome if they feel comfortable with command line operations in bash and Python environments.
Basic concepts of GIS (familiarity with rasters/vectors, overlays, buffering, etc.), basics of statistics (mean, standard deviation, residuals), and Python and bash syntax will be assumed as given. We will share tips and tricks on massive data processing, so a strong bash and Python base will be essential for grasping the nuanced concepts that can boost your geo-analysis.
A developer committee will evaluate the pre-registration assessment and select 25 applicants based on one of the following criteria:
- Applicant's Git repository (GitHub, GitLab, Bitbucket), submitted for evaluation. The following will be assessed:
- Use of Python code and bash (including GDAL) for text and geo-data analysis. Laptop-executed scripts are sufficient, i.e. there is no need to demonstrate code for cluster processes.
- Use of scikit-learn (or a similar Python library) for ML applications. In other words, we want people who think in a multidimensional way.
- Any evidence of bash and Python use by advanced R programmers, e.g. Git repositories, coding samples, or relevant course certificates. Fundamentals of ML are also needed.
- Completion of the “Geocomputation & Machine Learning for environmental applications” course and workshop organized by Spatial Ecology
- On-site registration fee 200 Euro
- Online registration fee 50 Euro
This is a not-for-profit event, and registration fees serve only to cover hospitality costs (social events, lunches, coffee breaks, etc.). Accommodation and travel costs are the responsibility of the participants.
Online participants will be able to ask questions and receive live assistance from a course instructor. However, troubleshooting will be limited due to logistical constraints. Lectures will be recorded and made available for asynchronous viewing, allowing participants in distant time zones to participate as well. Links to the recordings will be shared only with those who register as online attendees.
To register, all participants must:
- Complete the skills assessment and the pre-registration form here.
- Wait for our approval before proceeding with fee payment.
- Pay the registration fee.
You will be considered registered only when we have received your completed pre-registration form, we have sent you a confirmation email, and the registration fee has been paid in full.
The pre-registration form will be open until March 15, 2024.
Payment information: the payment options and deadlines will be communicated here soon.
Refund policy: A written request for registration cancellation must be e-mailed to the hackathon organizers. If registration cancellation is requested by April 1, 2024, the full registration fee will be refunded. If a cancellation request is submitted after April 1, 2024, we will retain 80% of the fees unless a candidate from the waiting list is able to replace the withdrawing candidate.
Visa requirements: On-site participants are responsible for ensuring they have the correct documentation to enter Austria. Please check your visa requirements and visa issuance waiting times, ensuring that you are able to travel to Austria on a Schengen visa in time for June 2024. Check here for countries that need a Schengen visa. Please note that the hackathon organizers are not authorized to assist with the visa process beyond providing the Invitation Letter.
The hackathon logistics information can be found in this document.
Teachers and supervisors
Scientific Advisory Panel
GEO-OPEN-HACK-2024 is an initiative under the Open-Earth-Monitor Cyberinfrastructure (OEMC) project, which aims to lower the barrier and transfer knowledge to users dealing with big geospatial data analytics. The OEMC has received funding from the European Union's Horizon Europe research and innovation programme (grant agreement No. 101059548). GEO-OPEN-HACK-2024 is also an initiative in the framework of the NSF-funded POSE project TI-2303651: Growing GRASS OSE for Worldwide Access to Multidisciplinary Geospatial Analytics.