Building Opacity Tables from DACE#

This page describes how to use the scripts in opac_data/DACE_tools/ to convert per-molecule opacity binaries downloaded from the DACE Opacity Database into the Zarr v3 format used by Exo_Skryer.

Two table types are supported:

Opacity sampling (OS) — full high-resolution cross-section cube, used with physics.opac_line: os.
Correlated-k (c-k) — k-coefficient cube with Gauss-Legendre quadrature, used with physics.opac_line: ck.

Both formats are produced by the same two scripts and registered in the YAML config identically.

Overview #

DACE Binary Files  (.bin, Δν = 0.01 cm⁻¹)
        │
        │  input_os.txt / input_ck.txt
        ▼
Gen_OS_table_R_zarr.py   →  <species>_R<R>.zarr  +  <species>_R<R>.zarr.zip
Gen_ck_table_R_zarr.py   →  <species>_ck_R<R>.zarr  +  <species>_ck_R<R>.zarr.zip
        │
        │  wl_R<R>.txt / wl_ck_R<R>.txt  (shared wavelength grid)
        ▼
opac_data/os/   or   opac_data/ck/
        │
        ▼
retrieval_config.yaml  →  opac.line entries

Prerequisites #

Download DACE binaries. Visit https://dace.unige.ch/opacityDatabase/ and download the per-molecule .bin archive for each species you need. Unpack into a directory (e.g. opac_data/opacities/H2O_EXOMOL/). Each .bin file represents one (T, P) point on a fixed wavenumber grid (Δν = 0.01 cm⁻¹).
Install dependencies. The scripts require numpy, zarr (v3), and numcodecs:
```
pip install numpy "zarr>=3" numcodecs
```

Locate the tools. All scripts and input templates live in:

opac_data/DACE_tools/
├── Gen_OS_table_R_zarr.py
├── Gen_ck_table_R_zarr.py
├── input_os.txt          ← template for OS tables
└── input_ck.txt          ← template for c-k tables

DACE Input Data Format #

The DACE database distributes pre-computed line-by-line cross-sections as raw IEEE-754 single-precision binary files (.bin), one file per (T, P) point.

File naming:

Most species: Out_<wn_start>_<wn_end>_<T>_<P>.bin
Fe / FeII: <T>_<P>.bin

Fixed pressure grid (34 levels, log₁₀ bar):

Range	Values (log₁₀ bar)
Low pressure	−8.00 → −0.33
High pressure	0.00 → +3.00

The temperature list is not fixed — it is inferred automatically by globbing the binary directory.

Units: cm² molecule⁻¹ for most species; cm² g⁻¹ for Fe and FeII. The scripts convert to cm² molecule⁻¹ using σ = σ_raw × mol_weight / N_A where needed.

Both scripts read the same plain-text input format. Lines beginning with # are comments. After the 14-line header, the scripts process every uncommented species line until the first blank line after the active block.

# wl start (micron)
0.3
# wl end (micron)
30.0
# resolution (R = λ/Δλ)
250
# wavelength output file
wl_ck_R250.txt
# output base name template (.zarr and .zarr.zip will be produced)
H2O_ck_R250.txt
# input form (currently unused, set to 1)
1
# species  mol_weight  T_min  T_max  wn_start  wn_end  binary_dir
H2O 18.01528 50 6100 0 42000 ../opacities/H2O_EXOMOL
CO  28.0101  50 6100 0 23000 ../opacities/CO_EXOMOL

Field descriptions:

wl_start, wl_end

Wavelength range in microns for the output table.

R

Spectral resolving power (R = λ/Δλ). Typical values:

OS tables: 10 000 – 20 000
c-k tables: 250 – 1000

wl_file

Name of the wavelength-grid output file (a plain-text column of wavelengths in microns). This file is shared across all species generated at the same R and must be referenced as opac.wl_master in the YAML config.

out_name

Output-name template. If exactly one species is active, the scripts use this name directly. If multiple species are active, they auto-generate per-species names such as H2O_R20000.zarr or H2O_ck_R250.zarr.

species

Molecule identifier string (informational; stored as a Zarr attribute).

mol_weight

Molecular weight in g mol⁻¹.

T_min, T_max

Informational temperature bounds; the actual temperature list is discovered from the binary files present in binary_dir.

wn_start, wn_end

Wavenumber range (cm⁻¹) used to select which .bin files to open. Set wn_start = 0 and wn_end to cover the full molecule range.

binary_dir

Path (relative to the script, or absolute) to the directory of DACE .bin files for this species.

To process multiple species in one run, add one active species line per molecule in a contiguous block. Comment out lines with # to skip them, and leave a blank line after the active block to stop processing.

Generating Opacity Sampling (OS) Tables #

Run the OS script from within opac_data/DACE_tools/:

python Gen_OS_table_R_zarr.py

or explicitly:

python Gen_OS_table_R_zarr.py --input input_os.txt

What the script does:

Builds a constant-R wavenumber grid from wl_start to wl_end (Δν/ν = 1/R at each centre point).
For every (T, P) pair, reads the corresponding DACE binary and converts cross-sections to cm² molecule⁻¹.
Interpolates the native 0.01 cm⁻¹ data onto the output grid using np.interp in log-space, flooring values at 1×10⁻⁹⁹ before taking log₁₀.
Writes the (nT × nP × nλ) log₁₀ cross-section cube to Zarr.

Output Zarr schema:

Key	Shape	dtype	Description
`temperature`	(nT,)	float64	Temperature grid (K)
`pressure`	(nP,)	float64	Pressure grid (bar)
`wavelength`	(nλ,)	float64	Wavelength grid (µm), ascending
`cross_section`	(nT, nP, nλ)	float32	log₁₀ cross-section (cm² molecule⁻¹)
attr `molecule`	—	str	Species name

Note

Exo_Skryer reads the linear temperature and pressure arrays from the Zarr store and computes log10(T) and log10(P) internally for interpolation. The files do not need to provide separate log-axis arrays.

Generating Correlated-k (c-k) Tables #

Run the c-k script from within opac_data/DACE_tools/:

python Gen_ck_table_R_zarr.py

or explicitly:

python Gen_ck_table_R_zarr.py --input input_ck.txt

What the script does:

Builds the same constant-R wavenumber grid and computes midpoint bin edges to robustly assign native high-resolution points to each spectral bin.
Constructs 16-point Gauss-Legendre quadrature nodes and weights, split 8 + 8 at g = 0.9 (higher point density in the optically thick tail of the k-distribution).
For every (T, P) pair, reads the DACE binary and converts to cm² molecule⁻¹.
Within each spectral bin, sorts the cross-sections to form the cumulative distribution function (CDF) g(k), then samples log₁₀(k) at the Gauss-Legendre g-points by linear interpolation.
Writes the (nT × nP × nλ × ng) k-coefficient cube to Zarr.

Output Zarr schema:

Key	Shape	dtype	Description
`temperature`	(nT,)	float64	Temperature grid (K)
`pressure`	(nP,)	float64	Pressure grid (bar)
`wavelength`	(nλ,)	float64	Wavelength bin centres (µm), ascending
`g_points`	(ng,)	float64	Gauss-Legendre quadrature nodes in [0, 1]
`g_weights`	(ng,)	float64	Quadrature weights (normalised, sum to 1)
`cross_section`	(nT, nP, nλ, ng)	float32	log₁₀ k-coefficient (cm² molecule⁻¹)
attr `molecule`	—	str	Species name

Note

As with OS tables, Exo_Skryer loads the linear temperature and pressure axes and derives the log grids internally in the opacity registries.

Output Files #

Both scripts produce two output files from the same base name:

File	Description
`<name>.zarr/`	Zarr v3 directory store — fastest for local random access
`<name>.zarr.zip`	Zarr v3 zip store — single portable file for archiving / sharing

Compression: Blosc/lz4, level 1, byte-shuffle — optimised for read speed over file size.

Both formats are read transparently by registry_ck.py and registry_line.py. Always use .zarr.zip in YAML configs — it is a single portable file that works everywhere. If a .zarr directory path is given and the directory is absent, the registry will fall back to the .zarr.zip file in the same location.

Move the output files into the appropriate subdirectory. For multi-species runs, move each generated species table plus the shared wavelength file:

mv H2O_R20000.zarr     opac_data/os/
mv H2O_R20000.zarr.zip opac_data/os/
mv wl_R20000.txt       opac_data/os/

mv H2O_ck_R250.zarr     opac_data/ck/
mv H2O_ck_R250.zarr.zip opac_data/ck/
mv wl_ck_R250.txt       opac_data/ck/

Registering Tables in the YAML Config #

Reference the new files under opac in your retrieval_config.yaml (or forward_config.yaml):

Opacity sampling example:

physics:
  opac_line: os
  opac_ray: os
  opac_cia: os

opac:
  wl_master: ../../opac_data/os/wl_R20000.txt
  full_grid: false
  ck: false
  ck_mix: RORR

  line:
    - {species: H2O, path: ../../opac_data/os/H2O_R20000.zarr.zip}
    - {species: CO,  path: ../../opac_data/os/CO_R20000.zarr.zip}

Correlated-k example:

physics:
  opac_line: ck
  opac_ray: ck
  opac_cia: ck

opac:
  wl_master: ../../opac_data/ck/wl_ck_R250.txt
  full_grid: false
  ck: true
  ck_mix: TRANS     # or RORR

  line:
    - {species: H2O, path: ../../opac_data/ck/H2O_ck_R250.zarr.zip}
    - {species: CO,  path: ../../opac_data/ck/CO_ck_R250.zarr.zip}

Note

The wl_master file must be the one produced by the same script run that generated the line opacity tables — they share an identical wavelength grid. Mixing wavelength files from different R values or different runs will cause a shape mismatch at runtime.

Supported Species Reference #

The table below lists the species present in the input_ck.txt and input_os.txt templates, along with their molecular weights and the recommended wavenumber range. Uncomment the relevant line in the input file to generate a table for that species.

Species	Line list	Mol. wt.	T min (K)	T max (K)	wn range (cm⁻¹)	Notes
H₂O	EXOMOL	18.01528	50	6100	0 – 42 000
CO	EXOMOL	28.0101	50	6100	0 – 23 000
CO₂	EXOMOL	44.0095	50	2900	0 – 20 000
CO₂	HITEMP	44.0095	50	2900	0 – 18 000	Alternative line list
CH₄	EXOMOL	16.0425	50	1900	0 – 12 000
CH₄	HITEMP	16.0425	50	2500	0 – 14 000	Alternative line list
NH₃	EXOMOL	17.03052	50	1900	0 – 20 000
H₂S	EXOMOL	34.0809	50	2900	0 – 35 000
SO₂	ExoAmes	64.0638	50	1900	0 – 8 000
SO₃	EXOMOL	80.0632	50	1000	0 – 5 000
SO	EXOMOL	48.0644	50	4900	0 – 45 000
OH	EXOMOL	17.00734	50	6100	0 – 50 000
OH	HITEMP	17.00734	200	6100	0 – 43 409	Alternative line list
HCN	EXOMOL	27.0253	50	3900	0 – 18 000
C₂H₂	EXOMOL	26.0373	50	2900	0 – 10 000
C₂H₄	EXOMOL	28.0532	50	1500	0 – 8 000
Na	Kitzmann	22.98977	100	4000	0 – 36 000
K	Kitzmann	39.09830	100	4000	0 – 36 000
SiO	EXOMOL	44.0849	200	6100	0 – 66 481
CaH	EXOMOL	41.0859	50	4500	0 – 30 000
TiO	EXOMOL	63.8664	50	4900	0 – 30 000
VO	EXOMOL	66.9409	200	6100	0 – 35 000
FeH	EXOMOL	56.8529	200	6100	0 – 16 136
Fe	Kitzmann	55.8450	200	6100	0 – 0	Special: cm² g⁻¹ input; different filename scheme
FeII	Kitzmann	55.8450	200	6100	0 – 0	Special: cm² g⁻¹ input; different filename scheme
SH	EXOMOL	33.0729	200	4900	0 – 37 556
PO	EXOMOL	46.97316	50	4500	0 – 12 000
PH₃	EXOMOL	33.99758	50	2900	0 – 10 000
CS₂	HITRAN	76.1407	50	2500	0 – 6 500
OCS	EXOMOL	60.0751	50	1900	0 – 10 000
HF	EXOMOL	20.00634	200	4900	0 – 32 352
HCl	HITRAN	36.4609	50	4500	0 – 21 000

Design Notes #

Constant-R wavelength grid: The grid is constructed so that Δν/ν = 1/R at each centre point (not constant spacing in wavenumber). This is the standard astronomical convention and means the wavelength file can be shared across species with the same R.
16-point Gauss-Legendre quadrature split at g = 0.9: Eight nodes are placed below g = 0.9 (the mostly transparent region, where fewer quadrature points are needed) and eight above (the opaque tail, where the integrand varies rapidly). This split improves flux accuracy compared to a uniform 16-point scheme at the same cost.
log₁₀ storage: All cross-sections are stored as log₁₀(σ) with a minimum floor of log₁₀(10⁻⁹⁹) = −99. This avoids log(0) issues and reduces dynamic-range demands on float32 storage.
Zarr v3 format: Blosc/lz4 compression with byte-shuffle is used as it prioritises read throughput over compression ratio, which is important for the forward model’s random-access interpolation pattern.