Building Opacity Tables from DACE#
This page describes how to use the scripts in opac_data/DACE_tools/ to
convert per-molecule opacity binaries downloaded from the
DACE Opacity Database into the
Zarr v3 format used by Exo_Skryer.
Two table types are supported:
Opacity sampling (OS) — full high-resolution cross-section cube, used with
physics.opac_line: os.Correlated-k (c-k) — k-coefficient cube with Gauss-Legendre quadrature, used with
physics.opac_line: ck.
Both formats are produced by the same two scripts and registered in the YAML config identically.
Overview#
DACE Binary Files (.bin, Δν = 0.01 cm⁻¹)
│
│ input_os.txt / input_ck.txt
▼
Gen_OS_table_R_zarr.py → <species>_R<R>.zarr + <species>_R<R>.zarr.zip
Gen_ck_table_R_zarr.py → <species>_ck_R<R>.zarr + <species>_ck_R<R>.zarr.zip
│
│ wl_R<R>.txt / wl_ck_R<R>.txt (shared wavelength grid)
▼
opac_data/os/ or opac_data/ck/
│
▼
retrieval_config.yaml → opac.line entries
Prerequisites#
Download DACE binaries. Visit https://dace.unige.ch/opacityDatabase/ and download the per-molecule
.binarchive for each species you need. Unpack into a directory (e.g.opac_data/opacities/H2O_EXOMOL/). Each.binfile represents one (T, P) point on a fixed wavenumber grid (Δν = 0.01 cm⁻¹).Install dependencies. The scripts require
numpy,zarr(v3), andnumcodecs:pip install numpy "zarr>=3" numcodecs
Locate the tools. All scripts and input templates live in:
opac_data/DACE_tools/ ├── Gen_OS_table_R_zarr.py ├── Gen_ck_table_R_zarr.py ├── input_os.txt ← template for OS tables └── input_ck.txt ← template for c-k tables
DACE Input Data Format#
The DACE database distributes pre-computed line-by-line cross-sections as raw
IEEE-754 single-precision binary files (.bin), one file per (T, P) point.
File naming:
Most species:
Out_<wn_start>_<wn_end>_<T>_<P>.binFe / FeII:
<T>_<P>.bin
Fixed pressure grid (34 levels, log₁₀ bar):
Range |
Values (log₁₀ bar) |
|---|---|
Low pressure |
−8.00 → −0.33 |
High pressure |
0.00 → +3.00 |
The temperature list is not fixed — it is inferred automatically by globbing the binary directory.
Units: cm² molecule⁻¹ for most species; cm² g⁻¹ for Fe and FeII. The scripts convert to cm² molecule⁻¹ using σ = σ_raw × mol_weight / N_A where needed.
Input File Format#
Both scripts read the same plain-text input format. Lines beginning with
# are comments. After the 14-line header, the scripts process every
uncommented species line until the first blank line after the active block.
# wl start (micron)
0.3
# wl end (micron)
30.0
# resolution (R = λ/Δλ)
250
# wavelength output file
wl_ck_R250.txt
# output base name template (.zarr and .zarr.zip will be produced)
H2O_ck_R250.txt
# input form (currently unused, set to 1)
1
# species mol_weight T_min T_max wn_start wn_end binary_dir
H2O 18.01528 50 6100 0 42000 ../opacities/H2O_EXOMOL
CO 28.0101 50 6100 0 23000 ../opacities/CO_EXOMOL
Field descriptions:
wl_start,wl_endWavelength range in microns for the output table.
RSpectral resolving power (R = λ/Δλ). Typical values:
OS tables: 10 000 – 20 000
c-k tables: 250 – 1000
wl_fileName of the wavelength-grid output file (a plain-text column of wavelengths in microns). This file is shared across all species generated at the same R and must be referenced as
opac.wl_masterin the YAML config.out_nameOutput-name template. If exactly one species is active, the scripts use this name directly. If multiple species are active, they auto-generate per-species names such as
H2O_R20000.zarrorH2O_ck_R250.zarr.speciesMolecule identifier string (informational; stored as a Zarr attribute).
mol_weightMolecular weight in g mol⁻¹.
T_min,T_maxInformational temperature bounds; the actual temperature list is discovered from the binary files present in
binary_dir.wn_start,wn_endWavenumber range (cm⁻¹) used to select which
.binfiles to open. Setwn_start = 0andwn_endto cover the full molecule range.binary_dirPath (relative to the script, or absolute) to the directory of DACE
.binfiles for this species.
To process multiple species in one run, add one active species line per
molecule in a contiguous block. Comment out lines with # to skip them, and
leave a blank line after the active block to stop processing.
Generating Opacity Sampling (OS) Tables#
Run the OS script from within opac_data/DACE_tools/:
python Gen_OS_table_R_zarr.py
or explicitly:
python Gen_OS_table_R_zarr.py --input input_os.txt
What the script does:
Builds a constant-R wavenumber grid from
wl_starttowl_end(Δν/ν = 1/R at each centre point).For every (T, P) pair, reads the corresponding DACE binary and converts cross-sections to cm² molecule⁻¹.
Interpolates the native 0.01 cm⁻¹ data onto the output grid using
np.interpin log-space, flooring values at 1×10⁻⁹⁹ before taking log₁₀.Writes the (nT × nP × nλ) log₁₀ cross-section cube to Zarr.
Output Zarr schema:
Key |
Shape |
dtype |
Description |
|---|---|---|---|
|
(nT,) |
float64 |
Temperature grid (K) |
|
(nP,) |
float64 |
Pressure grid (bar) |
|
(nλ,) |
float64 |
Wavelength grid (µm), ascending |
|
(nT, nP, nλ) |
float32 |
log₁₀ cross-section (cm² molecule⁻¹) |
attr |
— |
str |
Species name |
Note
Exo_Skryer reads the linear temperature and pressure arrays from the
Zarr store and computes log10(T) and log10(P) internally for
interpolation. The files do not need to provide separate log-axis arrays.
Output Files#
Both scripts produce two output files from the same base name:
File |
Description |
|---|---|
|
Zarr v3 directory store — fastest for local random access |
|
Zarr v3 zip store — single portable file for archiving / sharing |
Compression: Blosc/lz4, level 1, byte-shuffle — optimised for read speed over file size.
Both formats are read transparently by registry_ck.py and
registry_line.py. Always use .zarr.zip in YAML configs — it is a
single portable file that works everywhere. If a .zarr directory path is
given and the directory is absent, the registry will fall back to the
.zarr.zip file in the same location.
Move the output files into the appropriate subdirectory. For multi-species runs, move each generated species table plus the shared wavelength file:
mv H2O_R20000.zarr opac_data/os/
mv H2O_R20000.zarr.zip opac_data/os/
mv wl_R20000.txt opac_data/os/
mv H2O_ck_R250.zarr opac_data/ck/
mv H2O_ck_R250.zarr.zip opac_data/ck/
mv wl_ck_R250.txt opac_data/ck/
Registering Tables in the YAML Config#
Reference the new files under opac in your retrieval_config.yaml
(or forward_config.yaml):
Opacity sampling example:
physics:
opac_line: os
opac_ray: os
opac_cia: os
opac:
wl_master: ../../opac_data/os/wl_R20000.txt
full_grid: false
ck: false
ck_mix: RORR
line:
- {species: H2O, path: ../../opac_data/os/H2O_R20000.zarr.zip}
- {species: CO, path: ../../opac_data/os/CO_R20000.zarr.zip}
Correlated-k example:
physics:
opac_line: ck
opac_ray: ck
opac_cia: ck
opac:
wl_master: ../../opac_data/ck/wl_ck_R250.txt
full_grid: false
ck: true
ck_mix: TRANS # or RORR
line:
- {species: H2O, path: ../../opac_data/ck/H2O_ck_R250.zarr.zip}
- {species: CO, path: ../../opac_data/ck/CO_ck_R250.zarr.zip}
Note
The wl_master file must be the one produced by the same script run
that generated the line opacity tables — they share an identical wavelength
grid. Mixing wavelength files from different R values or different runs
will cause a shape mismatch at runtime.
Supported Species Reference#
The table below lists the species present in the input_ck.txt and
input_os.txt templates, along with their molecular weights and the
recommended wavenumber range. Uncomment the relevant line in the input file
to generate a table for that species.
Species |
Line list |
Mol. wt. |
T min (K) |
T max (K) |
wn range (cm⁻¹) |
Notes |
|---|---|---|---|---|---|---|
H₂O |
EXOMOL |
18.01528 |
50 |
6100 |
0 – 42 000 |
|
CO |
EXOMOL |
28.0101 |
50 |
6100 |
0 – 23 000 |
|
CO₂ |
EXOMOL |
44.0095 |
50 |
2900 |
0 – 20 000 |
|
CO₂ |
HITEMP |
44.0095 |
50 |
2900 |
0 – 18 000 |
Alternative line list |
CH₄ |
EXOMOL |
16.0425 |
50 |
1900 |
0 – 12 000 |
|
CH₄ |
HITEMP |
16.0425 |
50 |
2500 |
0 – 14 000 |
Alternative line list |
NH₃ |
EXOMOL |
17.03052 |
50 |
1900 |
0 – 20 000 |
|
H₂S |
EXOMOL |
34.0809 |
50 |
2900 |
0 – 35 000 |
|
SO₂ |
ExoAmes |
64.0638 |
50 |
1900 |
0 – 8 000 |
|
SO₃ |
EXOMOL |
80.0632 |
50 |
1000 |
0 – 5 000 |
|
SO |
EXOMOL |
48.0644 |
50 |
4900 |
0 – 45 000 |
|
OH |
EXOMOL |
17.00734 |
50 |
6100 |
0 – 50 000 |
|
OH |
HITEMP |
17.00734 |
200 |
6100 |
0 – 43 409 |
Alternative line list |
HCN |
EXOMOL |
27.0253 |
50 |
3900 |
0 – 18 000 |
|
C₂H₂ |
EXOMOL |
26.0373 |
50 |
2900 |
0 – 10 000 |
|
C₂H₄ |
EXOMOL |
28.0532 |
50 |
1500 |
0 – 8 000 |
|
Na |
Kitzmann |
22.98977 |
100 |
4000 |
0 – 36 000 |
|
K |
Kitzmann |
39.09830 |
100 |
4000 |
0 – 36 000 |
|
SiO |
EXOMOL |
44.0849 |
200 |
6100 |
0 – 66 481 |
|
CaH |
EXOMOL |
41.0859 |
50 |
4500 |
0 – 30 000 |
|
TiO |
EXOMOL |
63.8664 |
50 |
4900 |
0 – 30 000 |
|
VO |
EXOMOL |
66.9409 |
200 |
6100 |
0 – 35 000 |
|
FeH |
EXOMOL |
56.8529 |
200 |
6100 |
0 – 16 136 |
|
Fe |
Kitzmann |
55.8450 |
200 |
6100 |
0 – 0 |
Special: cm² g⁻¹ input; different filename scheme |
FeII |
Kitzmann |
55.8450 |
200 |
6100 |
0 – 0 |
Special: cm² g⁻¹ input; different filename scheme |
SH |
EXOMOL |
33.0729 |
200 |
4900 |
0 – 37 556 |
|
PO |
EXOMOL |
46.97316 |
50 |
4500 |
0 – 12 000 |
|
PH₃ |
EXOMOL |
33.99758 |
50 |
2900 |
0 – 10 000 |
|
CS₂ |
HITRAN |
76.1407 |
50 |
2500 |
0 – 6 500 |
|
OCS |
EXOMOL |
60.0751 |
50 |
1900 |
0 – 10 000 |
|
HF |
EXOMOL |
20.00634 |
200 |
4900 |
0 – 32 352 |
|
HCl |
HITRAN |
36.4609 |
50 |
4500 |
0 – 21 000 |
Design Notes#
- Constant-R wavelength grid
The grid is constructed so that Δν/ν = 1/R at each centre point (not constant spacing in wavenumber). This is the standard astronomical convention and means the wavelength file can be shared across species with the same R.
- 16-point Gauss-Legendre quadrature split at g = 0.9
Eight nodes are placed below g = 0.9 (the mostly transparent region, where fewer quadrature points are needed) and eight above (the opaque tail, where the integrand varies rapidly). This split improves flux accuracy compared to a uniform 16-point scheme at the same cost.
- log₁₀ storage
All cross-sections are stored as log₁₀(σ) with a minimum floor of log₁₀(10⁻⁹⁹) = −99. This avoids log(0) issues and reduces dynamic-range demands on float32 storage.
- Zarr v3 format
Blosc/lz4 compression with byte-shuffle is used as it prioritises read throughput over compression ratio, which is important for the forward model’s random-access interpolation pattern.