2021 European HUG Chat Log

The following are the chat logs for each day’s session.

July 7, 2021

00:19:49 Lori Cooper: All attendees will enter the event muted. Unless the individual presenter says otherwise, please save your questions for the end of each talk, where there will be a 5 minute Q&A/break period. Feel free to unmute yourself and ask your question. Conference moderators will also keep an eye on the chat window and will make sure the presenter sees these questions as well. Everyone is welcome to continue the conversation on our forum at https://forum.hdfgroup.org.
00:24:05 Kira Duwe: Hi, will the slides and/or talks be made available?
00:24:52 Kira Duwe: Great, thanks 🙂
00:25:11 Lori Cooper: https://www.hdfgroup.org/hug/europeanhug21/
00:29:48 Gerd Heber: https://support.hdfgroup.org/documentation/hdf5/latest/_s_p_e_c.html
00:31:20 Lionel Untereiner: Is there a place that lists all these libraries in different languages?
00:33:54 Anderson Bestteti: Hello everyone, good morning/evening!
00:51:28 Raphaël GIRARDOT: Hello. Any idea of the impact on performance?
00:51:59 Joshua Moore: Comment: This brings to mind the indexes which are available in PyTables. It seems like they would work great together.
00:52:28 Paul Millar: Do you happen to know how he’s planning to add HDFS support; for example, will files be downloaded into a filesystem-cache before being read?
00:52:38 Raphaël GIRARDOT: Thanks 🙂
00:52:39 Thomas Kluyver: Do you know how it uses multiple cores? If it’s built on the HDF5 library, and HDF5 can’t run on multiple threads – does it use MPI internally? Or is it making serial calls to HDF5 and only parallelising operations around that?
01:11:36 Joshua Moore: Wow, Lori. That’s _very_ effective.
01:20:31 Graeme Winter: Does this essentially allow embedding HDF5 compression plugins _into_ the HDF5 files? For large data files with special compression (e.g. bslz4) I could see value of this
01:26:00 Graeme Winter: In my use case the compiled plugin is ~ 4MB and data usually around 10GB -> well into the noise
01:27:07 Gerd Heber: It’s possible, but not really practical, because there is currently a single chunk restriction on UDF-based datasets. If you just wanted to store the binaries (shared libs) of the compression plugin in HDF5, you could use the HDF5 user block for that.
01:28:28 Graeme Winter: Ah, shame – we support multiple platforms so that would be tricky – the idea of embedding the plugin would make supporting external users very easy
01:28:47 Graeme Winter: Unless storing multiple plugins (e.g. Linux, macOS) in the user block is doable
01:30:04 Graeme Winter: The thing I liked about the idea was having it auto loaded by hdf5 library – user block does not offer this (was hoping for “invisible decompression”)
01:30:37 Gerd Heber: It’s definitely doable in the user block. You’d have to keep track of the structure of the user block either in the user block itself (via a header) or in HDF5 via, e.g., attributes.
01:30:52 Gerd Heber: Understood
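For readers curious about the user-block approach Gerd describes above, here is a minimal sketch in Python (h5py). The plugin file name and the attribute used to track the block's contents are illustrative choices, not an established convention:

    import os
    import h5py

    plugin_path = "libh5bslz4.so"   # hypothetical compiled filter plugin
    with open(plugin_path, "rb") as src:
        plugin_bytes = src.read()

    # The user block must be a power of two and at least 512 bytes.
    ub_size = 512
    while ub_size < len(plugin_bytes):
        ub_size *= 2

    # Reserve the user block when creating the file and note its contents in attributes.
    with h5py.File("data.h5", "w", userblock_size=ub_size) as f:
        f.attrs["userblock_contents"] = os.path.basename(plugin_path)
        f.create_dataset("example", data=[1, 2, 3])

    # The user block is invisible to HDF5 itself, so fill it with plain file I/O.
    with open("data.h5", "r+b") as raw:
        raw.write(plugin_bytes)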
01:35:42 Nicholas Devenish: And the errors on reading data with missing plugins are very hard to detect and clean up for users
01:35:57 Nicholas Devenish: -> Another reason it would be nice to make problem go away
01:36:56 Elena Pourmal: THG now distributes plugins along with HDF5 source
01:37:06 Elena Pourmal: See our download pages
01:37:57 Graeme Winter: Users in the biological sciences are not the most technical -> in some packages we bundle everything but in the longer term making this future proof would be interesting
01:38:13 Elena Pourmal: https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.12/hdf5-1.12.0/plugins/
01:38:42 Elena Pourmal: Agree!
01:39:26 Katrina Exter: I’d be interested to know if anyone in the biological / genomics sciences uses HDF5
01:40:01 Graeme Winter: In X-ray diffraction (synchrotron, FEL) we use it very extensively
01:40:13 Graeme Winter: TBs / day at Diamond Light Source at least
01:40:37 Jordi Andilla: We try to… we really have an upcoming dimensional problem
01:40:49 Graeme Winter: Will be touched on later today (Herbert B / gold standard) and my presentation tomorrow
01:40:58 Ken Ho: Yes, we use it to display time series 3D embryogenesis dataset
01:41:55 Ken Ho: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0237468
01:41:58 Katrina Exter: hmm, yes, I can see that when dealing with 3D data this is something that (even) biologists would be interested in. The focus where I work is more on 1D and 2D data
01:42:12 Katrina Exter: (but will check out that paper)
01:42:26 Ken Ho: https://ssbd.riken.jp/database/
01:42:39 Ken Ho: You can find datasets that all use HDF5-based formats
01:43:06 Ken Ho: OMERO also uses HDF5 to store ROIs
01:43:07 Jordi Andilla: In our case we are dealing with more than 5D data. 3D+t+channel+[others]
01:44:46 Katrina Exter: for ROIs, yes, that also makes sense…
01:45:46 Joshua Moore | OME: CellH5 is an HDF5-based format that’s more targeted toward 2D (high-content) imaging. BDV and Imaris tend to be for 3D+.
01:47:03 Katrina Exter: seems to me it could be interesting for storing genomics data (holding the raw sequences and then everything that comes from those as they are processed and analysed over and over again)
01:47:48 Mike Smith: There are several H5-based file formats used in single-cell ‘omics for storing matrices of counts e.g. loom, 10x Genomics, Bioconductor HDF5Arrays
01:48:33 Joshua Moore | OME: +1 plus AnnData, etc.
01:51:37 Raphaël GIRARDOT: I’m not sure I understand. Did you completely rewrite the HDF5 data access API in order to avoid the problems (serialization and so on)?
01:53:21 Raphaël GIRARDOT: Huge job, but great !
01:53:25 Mark Hedley: i have a question?
01:53:57 Thomas Kluyver: Very impressive! Are there any restrictions on what data types it can read (vlen, compound, etc.)?
01:54:00 Katrina Exter: interesting. I am totally new to HDF5 and one of my first thoughts was that it could be a way to store data related to other data (raw data and derivatives from the raw data: quality controlled, processed, data products) in one “unit”. If the metadata records how things were done, then you have full provenance along with the data.
01:54:21 Katrina Exter: anyway, interesting talks and comments, food 4 thought
01:56:21 Raphaël GIRARDOT: Any compatibility with Java?
01:56:23 Thomas Kluyver: Yes, absolutely! There’s a definite 80% of the benefit for 20% of the effort there 🙂
01:57:49 Thomas Kluyver: (that was referring to JP’s comment about focusing on a small core of HDF5 features that most use cases use)
01:58:20 Elena Pourmal: Noted about smaller “focused” File Format Spec
01:59:12 Gerd Heber: In the past we’ve talked about HDF5-A (like PDF-A). Maybe the two can be combined.
02:09:31 Mark Hedley: i have a comment
02:11:06 Ezequiel Cimadevilla: Is there any HDF5 VOL implementation that allows storing chunks in the same way that zarr.storage.DirectoryStore does?
02:11:49 Joshua Moore | OME: https://forum.hdfgroup.org/u/josh.moore in case anyone wants to get in touch.
02:26:18 HUDER Loïc: https://lhtalks.gitlab-pages.esrf.fr/EHUG2021/
02:44:29 Nicholas Devenish: No questions, but it looks very good!
02:44:56 Joshua Moore | OME: Agreed! @Loïc: Is there a way to plug in handlers for particular datasets? (e.g. higher dimensional images)
02:45:48 Peter Chang: Any plans for interactive callbacks to trigger backend processing?
02:45:49 Ken Ho: Sorry my Internet was not very stable. I have missed some part of your talk. How is this H5Web different from h5serv?
02:46:44 Lori Cooper: I will post the chat session somewhere (TBD) and sharing with presenters to see if we can get answers for everyone.
02:47:59 Joshua Moore | OME: @Loïc: an example would be a 3D browser that allows rotation & slicing.
02:48:05 HUDER Loïc: H5serv is to serve HDF5 file contents from a server. H5web can consume such contents to display/inspect the data
02:48:41 Joshua Moore | OME: Carrying on from Ken’s question, then the difference between h5grove and h5serv?
02:49:38 Andy Gotz: h5grove offers a higher level of abstraction so it can support h5serv as a backend but not vice versa
02:49:50 HUDER Loïc: @Joshua: I see! No fixed plans yet but we are indeed thinking about 3D as WebGL is really the technology of choice for this.
02:50:52 Joshua Moore | OME: See https://github.com/hms-dbmi/viv/pull/415 if it can be of use. (React & Deck.gl) There’s also a related Jupyter plugin, etc. But not yet a browser like you have.
02:51:23 HUDER Loïc: Yes, I did not have the time to highlight this but h5grove is *not* a functional backend but rather some bits and pieces that ease the development of functional backends such as h5serv or jupyterlab_hdf
02:52:02 Joshua Moore | OME: Thanks, Andy. I will look into h5grove.
02:52:54 Joshua Moore | OME: (I’m imagining something similar to https://github.com/manzt/simple-zarr-server)
02:53:05 Joshua Moore | OME: Loïc: understood.
03:01:56 Joshua Moore | OME: Yes, my goal in life is to reach such a “Gold Standard”! Very inspiring.
03:02:19 Raphaël GIRARDOT: So there are 2 file formats. Do you use a single API to read them ?
03:02:33 Wout De Nolf: What does the “Gold Standard” do in addition to the NeXus standard?
03:04:20 Benjamin Watts: @Wout It is an implementation of NeXus to the specific needs of the MX community
03:05:25 Raphaël GIRARDOT: Going back to API : does it mean that every application that needs to read/access the files has to have the 2 specific format codes ?
03:06:11 Graeme Winter: As someone who supports analysis software, we have to support loads of formats
03:06:23 Raphaël GIRARDOT: thanks 🙂
03:06:31 Graeme Winter: Most data in CBF and HDF5 format
03:07:37 Joshua Moore | OME: @Herbert: if there’s a link to discussions on using explicit metadata for links I’d be interested to read more.
03:24:24 Nicholas Devenish: I don’t know if you are still talking but it’s gone silent for me?
03:24:30 Graeme Winter: The same
03:24:34 Lana abadie: me too
03:24:35 Wout De Nolf: Nothing here either
03:24:38 Andrea Lorenzon: same
03:24:39 Mike Folk: ditto
03:24:41 Paul Millar: It’s gone dead for me, too.
03:24:44 Ornela De Giacomo: the same here
03:24:44 Peter Chang: No video or audio
03:24:48 Raphaël GIRARDOT: same
03:24:49 Andy Gotz: I can hear still
03:24:56 Vijay Kartik: Yup same here, no audio/video
03:25:00 Herbert J Bernstein: I cannot hear.
03:25:00 Andrea Lorenzon: no video/audio for me too.
03:25:04 Benjamin Watts: dead for me
03:25:10 Andy Gotz: But image has frozen – can hear bells ringing …
03:25:11 Andrea Lorenzon: lori’s counter disappeared too
03:25:12 Samuel Debionne: Andy can you transcript for us?
03:25:21 Steven Varga: I can hear the presenter… and many other voices
03:25:22 Sandor Brockhauser: nothing is here
03:25:23 Jordi Andilla: no audio, no video neither
03:25:27 Peter Chang: US link dead?
03:25:29 Jonathan Wright: No sound here …
03:25:30 Lori Cooper: Hang on everyone
03:25:34 Paul Millar: For me everything is frozen (Lori’s counter, too).
03:25:37 Lori Cooper: Vijay—are you able to see anything?
03:25:53 Vijay Kartik: just a frozen image where the clock says 2:15
03:26:00 Sandor Brockhauser: same
03:26:00 Elena Pourmal: Still Anthony’s screen for me
03:26:08 Elena Pourmal: Restart?
03:26:20 Sandor Brockhauser: OK
03:26:27 Nicholas Devenish: going out then in helped fix it
03:26:39 Vijay Kartik: I’ll do the same then
03:26:40 Nicholas Devenish: “Turn zoom off then on again”
03:26:42 Lori Cooper: Elena or Dax can you guys try going out then coming back in?
03:26:50 Elena Pourmal: I’ll try
03:26:55 Lori Cooper: Let me know, I can end and restart for everyone too
03:26:57 Nicholas Devenish: the screen sharing is blank but I can hear audio
03:27:01 Mike Folk: I went out and came back in. Now it’s fine.
03:27:10 Paul Millar: Reconnecting helped me.
03:27:28 Benjamin Watts: reconnect worked for me
03:27:30 Paul Millar: I can hear people.
03:27:35 Kira Duwe: reconnect worked for me too
03:27:36 Graeme Winter: Likewise reconnect but now black screen, but have sound
03:27:40 Nicholas Devenish: I can hear everyone not hearing each other talking to each other now
03:27:54 Graeme Winter: \o/ modern technology
03:27:56 Dax Rodriguez: Yes, if you are having issues please leave and rejoin.
03:27:57 Andrea Lorenzon: reconnection works, confirmed.
03:27:59 Peter Chang: A/v back after rejoining
03:28:16 Nicholas Devenish: the video took a minute or two to come back after rejoining for me but seems to be working now
03:28:37 Federico Wagner: reconnection worked +1
03:28:42 Lori Cooper: @Vijay – are you able to screenshare?
03:30:06 Vijay Kartik: @Lori – I haven’t tried yet, but I hope it works now that I have rejoined
03:30:35 Herbert J Bernstein: I went out and in and now have a good connection
03:31:19 Kira Duwe: Thanks for showing the links 🙂

03:43:34 Peter Chang: Can everyone mute their mic?
03:44:28 Elena Pourmal: Everyone is muted
03:48:12 Gerd Heber: :+1:
03:49:05 Elena Pourmal: +1
04:05:10 Kira Duwe: A lot of very interesting talks and discussion, thank you all!
04:07:20 Joshua Moore: @Andrea: in our case, there are many related but independent datasets (images) within a single HDF5. Having direct URLs to specific groups/datasets (or perhaps even regions of a dataset) is useful.
04:12:22 Andy Gotz: I would be interested to know what people are using as hdf5 viewers e.g. napari / imagej-fidji / hdfview / …
04:14:07 Vijay Kartik: I use silx when I know the files are small, hdfview when I want to quickly check metadata, and lavue (from DESY) to load and plot 3-D datasets
04:14:40 Andy Gotz: @Vijay – thanks. We use silx too 😎
04:14:52 Andy Gotz: We also have imagej + matlab users
04:14:55 Vijay Kartik: I figured 🙂
04:14:55 Joshua Moore: Andy: an option in the ImageJ space is BigDataViewer (BDV) but it currently requires an additional XML file — https://imagej.net/plugins/bdv/ .
04:15:20 Vijay Kartik: Yes I think MATLAB is also quite popular at the experiments here

July 8, 2021

00:16:45 Lori Cooper: All attendees will enter the event muted. Unless the individual presenter says otherwise, please save your questions for the end of each talk, where there will be a 5 minute Q&A/break period. Feel free to unmute yourself and ask your question. Conference moderators will also keep an eye on the chat window and will make sure the presenter sees these questions as well. Everyone is welcome to continue the conversation on our forum at https://forum.hdfgroup.org.
00:33:51 Elena Pourmal: Doesn’t the printf option work for you?
00:36:01 Thomas Kluyver: I think the %d filenames should work with h5py, though I’m not sure there are any tests for it – the filenames should just be passed through to HDF5.

00:36:38 Thomas Kluyver: Should work even with the high-level interface!
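One plausible reading of the “%d filenames” above is HDF5’s family driver, whose member names use a printf-style %d pattern. Assuming that reading, a small h5py sketch (file names and sizes illustrative):

    import h5py
    import numpy as np

    # The family driver expands "%d" into data_0.h5, data_1.h5, ... as the file grows.
    memb = 100 * 1024 * 1024  # 100 MiB per member file
    with h5py.File("data_%d.h5", "w", driver="family", memb_size=memb) as f:
        f.create_dataset("big", data=np.zeros((1000, 1000)))

    # The same pattern and member size are needed to read the family back.
    with h5py.File("data_%d.h5", "r", driver="family", memb_size=memb) as f:
        print(f["big"].shape)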
00:37:14 Zdenek Matej (MAX IV): General status comment: At MAX IV, SWMR is not used because “detector” files are written via GPFS, but scan files from the control system are written over NFS-service nodes of the GPFS filesystem. Then we have issues with some software not supporting 1.10 yet.
00:39:14 Thomas Kluyver: EuXFEL also isn’t using SWMR – there are other mechanisms to get streaming data during an experiment. Though in some cases people do read sequence files (equivalent to the ‘chunks’ Wout showed) once those are closed.
00:39:40 L S: Can we use hdf5plugin for other languages, e.g., C?
00:40:04 Graeme Winter: Uh, we read while writing via SWMR with Eiger @ DLS – talking about this in about an hour
00:40:35 Graeme Winter: Very much treats the data file as a stream, though with fairly graceful random access
00:40:48 Nicholas Devenish: I don’t think any of the readers do it over NFS though
00:40:57 Wout De Nolf: @Graeme looking forward to it
00:41:04 Graeme Winter: 👍
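As background to the read-while-write usage Graeme describes above, a bare-bones h5py SWMR sketch; the file and dataset names are illustrative:

    import h5py

    # Writer: SWMR requires the latest file format.
    with h5py.File("stream.h5", "w", libver="latest") as f:
        frames = f.create_dataset("frames", shape=(0,), maxshape=(None,), dtype="f8")
        f.swmr_mode = True
        frames.resize((1,))
        frames[0] = 42.0
        frames.flush()          # make the new data visible to readers

    # Reader: open with swmr=True and refresh to pick up appended data.
    with h5py.File("stream.h5", "r", libver="latest", swmr=True) as f:
        dset = f["frames"]
        dset.refresh()
        print(dset.shape)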
00:41:41 Peter: Is there any integration with https://anaconda.org/conda-forge/hdf5-external-filter-plugins ?
00:42:33 Ulrik Pedersen: Re: Can we use hdf5plugin for other languages, e.g., C? We have rolled a collection of filters very similar to ‘hdf5plugin’ but not specific for python. Might be useful to some of you: https://github.com/dls-controls/hdf5filters
00:43:55 Mike Smith: Simlarly, here’s an effort to provide a similar set of filters in R https://github.com/grimbough/rhdf5filters
00:44:01 Peter: Need cross-platform for main three OS (x86_64 and probably ARM64 in future).
00:44:51 Markus Gerstel: hdf5-external-filter-plugins is essentially dead
00:45:00 Peter: Also need compression library bundled in each plugin (OSGi limitation).
00:46:45 L S: Thanks for the link @Ulrik and @Mike. It seems that there is a lot of effort being made towards HDF5 plugin management, and some of this effort across facilities overlaps. Is THG thinking of federating this effort?
00:47:56 Elena Pourmal: https://github.com/HDFGroup/hdf5_plugins
00:48:05 Elena Pourmal: to add to the mix 🙂
00:48:31 Elena Pourmal: Binaries for plugins are provided with each HDF5 release
00:48:32 Ulrik Pedersen: Nice, I wasn’t aware 🙂
00:48:42 L S: Plus: https://github.com/ccr/ccr, https://github.com/nexusformat/HDF5-External-Filter-Plugins and https://confluence.desy.de/display/FSEC/HDF5+-+External+filter+plugin%2C+installation+on+Windows+10
00:49:24 L S: More to the mix 🙂
00:50:07 Zdenek Matej (MAX IV): (MAX IV) I would mention we have issues with plugin distributions for “non-python” users, in particular e.g. bitshuffle for Matlab, Windows, and macOS users
00:50:40 Nicholas Devenish: We’ve found that getting the plugin paths set/correct and passing that through to HDF5 has been the hard part of plugin management, which hdf5plugin solved for us. Especially because some of the plugin management API seemed not to work for us (e.g. h5pl.append didn’t work, perhaps because it was via h5py)
00:51:31 Thomas Kluyver: EuXFEL: we’ve not gone any further than gzip compression, because of compatibility concerns about different programs trying to read it.
00:52:12 Mike Smith: In R I set the “HDF5_PLUGIN_PATH” environment variable when a user loads the rhdf5filters package, so it should be transparent to an R user.
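To illustrate the two plugin-loading routes mentioned in this thread, a short Python sketch; the plugin directory path and the compressed file name are hypothetical:

    import os

    # Option (a): point HDF5 at a directory of compiled plugins. This must be set
    # before the library first searches for filters, i.e. before the read below.
    os.environ["HDF5_PLUGIN_PATH"] = "/opt/hdf5/plugins"   # hypothetical path

    # Option (b): the hdf5plugin package registers its bundled filters on import.
    import hdf5plugin  # noqa: F401
    import h5py

    with h5py.File("compressed.h5", "r") as f:             # hypothetical file
        data = f["detector/data"][:]                        # filter is found at read time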
00:52:40 Zdenek Matej (MAX IV): (question) So what is the most popular compression at ESRF and how much data are you compressing now ?
00:53:29 Lori Cooper: I will post the chat transcript on the forum later.
00:53:49 Jerome Kieffer: ESRF mainly uses bitshuffle-LZ4 (as Dectris does) and JPEG2000 for tomography
00:55:35 L S: Do you support 32-bit platforms (which can be interesting for, e.g., IoT)?
00:55:42 Jerome Kieffer: Yes we do
00:56:06 L S: Are you thinking of providing binaries (like THG does)?
00:56:42 L S: Thanks!
00:57:12 Markus Gerstel: hdf5plugin 3.0 has a MacOS ARM build on conda-forge
01:02:40 Thomas VINCENT: @Peter > Is there any integration with https://anaconda.org/conda-forge/hdf5-external-filter-plugins ?
01:03:25 Thomas VINCENT: -> We provide blosc, LZ4, bitshuffle but not BZIP2 (yet)
01:04:37 Peter: @Thomas, it may be best to separate the filter plugins packaging from the Python package so it can be used in other languages
01:05:56 Thomas VINCENT: hdf5plugin’s purpose is to package compression filters for Python. The use outside of Python is a side product to me, and I would rather look at the HDF Group-provided plugins for use outside of Python.
01:06:18 Elena Pourmal: Very nice interface (compared with C!)
01:07:39 Thomas VINCENT: @L S > Is the THG thinking to federate this effort? => To me there is a risk of incompatible plugins because the code is copied around. In hdf5plugin I use the source referenced from the HDFGroup documentation and do not modify it.
01:09:49 Ezequiel Cimadevilla: Does VDS work with remote VFDs like S3? And with VOL?
01:11:58 Elena Pourmal: VDS was not tested with S3 VFD
01:12:24 HUDER Loïc: hdf5-vds-check looks very interesting. Getting empty data instead of errors can indeed lead to frustrating debugging 😀
01:12:40 Elena Pourmal: I doubt it will work since source files will be referenced by their POSIX filenames
01:13:59 Peter: Are there any performance problems using VDS?
01:14:32 HUDER Loïc: Do you plan to change the “empty data” behaviour for VDS or is it too difficult to work around ?
01:15:19 Peter: I mean read performance.
01:16:30 Graeme Winter: I’ve explored some of this as “hobby work” – there is sometimes a small overhead but depends a lot on the shapes of the data
01:16:55 Graeme Winter: Certainly for non-trivial calculations on e.g. Eiger data the read performance hit is not significant
01:17:47 Nicholas Devenish: My recollection with working with EuXFEL data is that remapping the missing module data was pretty expensive. I think there were some inefficiencies in their reader library that might be fixed now.
01:21:32 Thomas Kluyver: (I also maintain the EuXFEL file-reading library, EXtra-data – so if it has any issues, do let me know 🙂
01:24:27 Nicholas Devenish: It was very pleasant to find that there’s a reasonable library to read your data rather than having to build one from scratch ourselves
01:27:15 Thomas Kluyver: Thank you! Particularly as a run can contain hundreds or thousands of files, we felt it was important to provide a layer which stitched the files together as one object.
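For context on the virtual dataset thread above (POSIX source filenames, and what readers see where a source is missing), a minimal h5py sketch; file and dataset names are illustrative:

    import h5py

    # Map four source files into one virtual dataset.
    layout = h5py.VirtualLayout(shape=(4, 100), dtype="i4")
    for i in range(4):
        # Each source is referenced by its (POSIX) filename, as Elena notes above.
        layout[i] = h5py.VirtualSource(f"source_{i}.h5", "data", shape=(100,))

    with h5py.File("vds.h5", "w") as f:
        # fillvalue is what readers get back for missing or unmapped source data.
        f.create_virtual_dataset("combined", layout, fillvalue=0)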
01:32:18 Samuel Debionne: Is the static introspection emulation tightly coupled with the library or can I plug in mine (say Boost.Describe)?
01:42:26 Nicholas Devenish: This looks very interesting and I will certainly be having a look at the repository!
01:42:31 Raphaël GIRARDOT: Is this an open solution ?
01:43:14 Nicholas Devenish: No licence file though 😛
01:43:44 Nicholas Devenish: setup.py says MIT so
01:57:18 Vijay Kartik: I think there is a common theme of HDF5 write speed for compressed data being lower than desired when the data is N-dimensional, for large N
02:07:41 Zdenek Matej (MAX IV): Are you using direct chunk write? My colleagues are quite happy with that. In the case of parallel-HDF5 there is no such option, as that is not a collective function and data will be corrupted. But they like to compress in-memory (on CPU :-)) and direct-write chunks with serial HDF5.
02:11:15 Ulrik Pedersen (Diamond): Yes, Odin DAQ uses direct chunk write. The data is pre-compressed by the Eiger system
02:11:41 Zdenek Matej (MAX IV): clear, it comes with an additional advantage
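A minimal sketch of the direct chunk write Zdenek and Ulrik discuss, in Python (h5py); the compression and shapes here are illustrative, and in the real systems the bytes arrive pre-compressed from the detector:

    import zlib

    import h5py
    import numpy as np

    chunk = np.arange(1024, dtype="u4").reshape(32, 32)
    compressed = zlib.compress(chunk.tobytes())   # stand-in for detector-side compression

    with h5py.File("direct.h5", "w") as f:
        dset = f.create_dataset("data", shape=(32, 32), chunks=(32, 32),
                                dtype="u4", compression="gzip")
        # Bytes are written as-is; the filter pipeline is bypassed on the write path.
        dset.id.write_direct_chunk((0, 0), compressed)

    # A normal read decompresses through the gzip filter as usual.
    with h5py.File("direct.h5", "r") as f:
        assert (f["data"][:] == chunk).all()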
02:13:06 Wout De Nolf: How do you synchronize readers with the writer?
02:13:57 Nicholas Devenish: The data reaches its final “rest state” on disk as fast as possible and is used in that form for online analysis
02:41:10 L S: After indexing of H5 files was mentioned, it came to mind: what is the status of the H5Q and H5X APIs?
02:42:06 Elena Pourmal: Those are still in prototype since we didn’t have much response from community on their usability
02:42:16 L S: Thanks!
02:42:52 L S: Concerning the indexing of the data itself, can we start using MIQS for that (i.e. is it stable enough)?
02:44:41 Suren Byna: @L S: MIQS is for indexing metadata; it has been released and used by a few, but I haven’t heard any complaints about it so far.
02:45:00 L S: Ok, thanks!
02:45:41 L S: So, is there a possible overlap between H5Q/H5X and MIQS?
02:47:27 Suren Byna: Yes, MIQS can be integrated with H5Q/H5X for querying and indexing metadata
02:48:19 L S: You call the H5Q/H5X APIs and, under the hood, they call MIQS?
02:49:15 Suren Byna: The integration of MIQS is not done with the H5Q/H5X APIs yet, but it’s in the plans
02:49:20 Suren Byna: Sorry, have to drop off to join another meeting, but if there are any questions about MIQS, please send them to me at sbyna@lbl.gov
02:49:47 L S: Thanks @Suren!
02:50:49 Lucas Villa Real: @Danny: your presentation has the best title ever
02:53:56 Andy Gotz: @Lana do you consider an approach like H5coro which looks like it could scale to thousands of readers
03:06:18 Lucas Villa Real: @Danny: Do you face any challenges on Jupyterhub when it comes to handling large number of data processing requests from users of that platform?
03:07:54 Danny Price: Hi @lucas, we haven’t had any issues yet, but to date we have been manually ‘load balancing’ and distributing users across our GPU nodes
03:13:48 L S: Is MathWorks thinking of sharing code/knowledge or collaborating with THG to enable the HDF5 library to write HDF5 data stored in S3 or Azure (so that the entire HDF5 community could benefit from it)?
03:14:05 Lucas Villa Real: @Danny interesting, thanks for sharing the details. Perhaps https://github.com/jupyterhub/batchspawner might be useful to you
03:19:38 Herbert J Bernstein: Is it OK to use 1.12 in 1.10 compatibility mode with MATLAB or do we have to use a real 1.10?
03:19:53 Elena Pourmal: Thank you, Ellen! Very nice talk!
03:21:10 Mike Folk: Ellen, can we access your demo of SWMR to demo to others the power of using SWMR?
03:21:25 Mike Folk: Thanks.
03:21:54 Zdenek Matej (MAX IV): Ellen, thanks for the work, we will try ASAP 😉
03:22:16 L S: The question is more if we could enable the HDF5 library to write data stored in S3 and Azure.
03:22:53 L S: Thanks!
03:23:29 Ellen Johnson: thank you all! email me at ellenj@mathworks.com with questions/comments!
03:24:29 Zdenek Matej (MAX IV): @Danny, we are having good experience with JupyterHub based on cassinyio SwarmSpawner: https://gitlab.com/MAXIV-SCISW/JUPYTERHUB/jupyter-docker-swarmspawner, it was easier than installing e.g. JupyterHub on Kubernetes and it works with Docker
03:30:58 Danny Price: Thanks Zdenek, will take a look!
03:32:26 Andrea Lorenzon: (Am I the only one with audio issues?)
03:32:34 Benjamin Watts: no, me too
03:33:54 Andy Gotz: the audio is understandable most of the time for me but occasionally gets lost – is this what you are experiencing?
03:34:12 Zdenek Matej (MAX IV): yes
03:34:25 Andrea Lorenzon: yes
03:34:51 Andy Gotz: I think we will have to live with this for the remaining 5’….
03:39:13 Raphaël GIRARDOT: audio is getting random
03:40:27 Jerome Kieffer: Since the runtime is given in second … wouldn’t it make sense to present AWS’s prices per second even if the accounting is performed in ms by AWS ?
03:50:13 Graeme Winter: Does this mean we may need to defrag the files in the future?
03:51:24 Graeme Winter: yup thank you
03:53:14 Andy Gotz: Please post questions for the discussion (the next session) here or online: https://www.hdfgroup.org/hug/europeanhug21/submit-a-question/
03:56:42 Wout De Nolf: Can we get a list of all things a writer in SWMR mode cannot do compared to vanilla writing?
03:57:08 Peter: Will this be transparent to the readers?
03:58:28 Peter: I meant does the reader need to know the file is being written in VFD SWMR mode
03:59:05 Andy Gotz: Got it – we will ask John during the question time
04:00:02 Andy Gotz: @Peter is your question answered by the last line on this slide (8)
04:03:19 Herbert J Bernstein: Bravo, sounds very useful, esp. crash recovery at some future date.
04:03:39 Graeme Winter: Are snapshots cleaned up on file close
04:03:41 Graeme Winter: ?
04:04:15 Graeme Winter: Oh, pls ignore “Snapshots expire after max_lag ticks.”
04:10:43 Peter: Thanks for the answer!
04:11:24 Jonathan Wright: How large is a metadata file, does it typically fit in memory … ?
04:12:09 Zdenek Matej (MAX IV): what happens when the writer crashes (file not closed properly)? Can the file somehow get damaged?
04:14:11 Zdenek Matej (MAX IV): yah, improvement, thanks
04:14:40 Raphaël GIRARDOT: will it be available for Java ?
04:15:20 Rodrigo Castro: From a performance point of view (compared with current SWMR), is there any significant difference in the case of “small” writes/reads?
04:15:24 Jordan Henderson: We can certainly add the Java wrappers if there is interest.
04:17:36 Ellen Johnson: thank you John for the update!
04:23:31 Ellen Johnson: Love that HDF5 is now on GitHub!
04:23:42 Thomas Kluyver: Thank you for setting up regular testing with h5py!
04:24:15 Andy Gotz: Yes both these features are very useful and appreciated by the community 😎
04:24:28 Lori Cooper: Register to join the weekly Tuesday HDF Clinics: https://forum.hdfgroup.org/t/call-the-doctor-weekly-hdf-clinic/8112/17
04:26:31 Graeme Winter: Actually threaded access to HDF5 -> 👍
04:27:12 Ellen Johnson: thank you for work on performance!
04:28:35 Thomas Kluyver: That is already possible, I think? Something like set_libver_bounds?
04:28:44 Graeme Winter: This – pretty sure you can?
04:29:24 Graeme Winter: You can make “legacy” format files with new versions of library
04:29:24 Kira Duwe: Yesterday in the talk on HDFql, at the end in “what’s next”, user-defined functions were mentioned. Are they in any way related to the user-defined functions discussed in the later talk on HDF5-UDF?
04:29:28 Thomas Kluyver: https://support.hdfgroup.org/documentation/hdf5/latest/group___f_a_p_l.html#gacbe1724e7f70cd17ed687417a1d2a910
04:30:49 Thomas Kluyver: (& this is also exposed in h5py: https://docs.h5py.org/en/stable/high/file.html#version-bounding)
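A one-line version of the version bounding Thomas links to, sketched in h5py; the particular bound shown is just an example:

    import h5py

    # Restrict the writer to the 1.8-era file format so older readers can open the file.
    with h5py.File("legacy.h5", "w", libver=("earliest", "v108")) as f:
        f.create_dataset("x", data=range(10))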
04:30:50 Zdenek Matej (MAX IV): a wish list: maybe a tool that can report which features are available, maybe with some additional info, something like hdf5_info. Or is there anything like that already? Or not enough features to list?
04:34:36 Gerd Heber: The HDFql UDFs are different from the HDF5-UDFs. The former are for data transformations, e.g., aggregate functions etc., the latter generate data behind the scenes by accessing a dataset.
04:35:02 Wout De Nolf: Wish list: while we wait for the new VFD SWMR, HDF5_USE_FILE_LOCKING through the API instead of an environment variable (also configure what happens when EXT and VDS links are followed). This may already exist … ?
04:35:25 Kira Duwe: @Gerd Heber: Thank you! That was what I wanted to know
04:35:37 Gerd Heber: You can access an HDF5-UDF based dataset from HDFql like any other dataset
04:36:49 Gerd Heber: http://docs.hdf5.info/hdf5/develop/group___f_a_p_l.html#title62
04:36:56 Gerd Heber: 1.10.7
04:38:02 Elena Pourmal: https://support.hdfgroup.org/documentation/hdf5/latest/group___f_a_p_l.html#ga503e9ff6121a67cf53f8b67054ed9391
04:38:06 Ellen Johnson: filter plugin question (related to a question asked earlier): any plans to consolidate all the plugins into one “truth”? There are a bunch of GitHub projects, including THG’s, that host plugins and it gets confusing
04:38:12 Elena Pourmal: There is an API for file locking
04:38:38 Thomas Kluyver: set_file_locking is not exposed in h5py yet
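Until H5Pset_file_locking is exposed in the Python bindings, the environment-variable route mentioned in this thread is the practical workaround; a sketch, with the file name illustrative:

    import os

    # Must be set before HDF5 opens the file (simplest: before importing h5py).
    os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

    import h5py

    with h5py.File("shared.h5", "r") as f:   # e.g. a file on a filesystem without lock support
        print(list(f))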
04:40:55 Ellen Johnson: related to plugins – for linux, do you know if most folks are building them via configure/make/make install, or CMake?
04:41:14 Elena Pourmal: CMake for us
04:41:42 Ellen Johnson: OK, because that affects how we will document building the plugins
04:42:26 Ellen Johnson: yes, I have a love/hate relationship with CMake
04:42:44 Samuel Debionne: Any news on “Enabling Multithreading Concurrency in HDF5” since last November webinar?
04:43:50 Kira Duwe: Is there a somewhat complete list of all VOL plugins or a central point to start gathering them?
04:46:33 Lori Cooper: Submitted on the website: Is there an official policy/strategy when it comes to the management of plugins available for HDF5? Is there an intention from the THG to federate (sometimes overlapped) efforts to maintain/distribute HDF5 plugins? For instance, and as far as I can see, it seems that the following projects could be somehow merged so that the entire HDF5 community could benefit from it:

https://github.com/silx-kit/hdf5plugin

https://github.com/ccr/ccr

https://github.com/nexusformat/HDF5-External-Filter-Plugins

https://confluence.desy.de/display/FSEC/HDF5+-+External+filter+plugin%2C+installation+on+Windows+10

Finally, would it be possible to compile HDF5 plugins and ship the binaries in each official release of the HDF5 library (like you are currently doing – https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.7/plugins/) for 32 bit platforms as well?
04:47:00 Elena Pourmal: THG supports registered VOL connectors https://support.hdfgroup.org/documentation/hdf5-docs/registered_vol_connectors.html
04:47:17 Samuel Debionne: Looking forward to the RFC, thanks!
04:51:48 Zdenek Matej (MAX IV): (+) for direct chunk write/read for parallel-hdf5
04:53:03 Jonathan Wright: Note to self, HDFGroup plugins sources are here : https://github.com/HDFGroup/hdf5_plugins
04:53:46 Ellen Johnson: easy question: when we have in-person meetings again, can we get a tour of a synchrotron facility in Europe?
04:54:45 Ellen Johnson: thanks!
04:55:04 Ellen Johnson: Thank you to all at THG and attendees!
04:55:04 Kira Duwe: thank you very much!
