Technical question on handling large inputs in Python

gregtucker · February 25, 2026, 4:25pm

Curious whether anyone has thoughts on the following question about model design in Python.

Suppose I have an input to a numerical model that could be either a single float or an array corresponding to locations on a grid. Hydraulic roughness (e.g., Manning’s n) is one example.

I would like the model user to have two options for input variables:

(1) read them from an input file, or

(2) create them in a Python script or session and pass them to the model as items in a dictionary.

For #2, handling is straightforward: create either a float or an array of floats and include it as an item in your parameter dictionary. However, type hints become harder to use because there are two valid types.
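As a rough sketch of how the two-type case in #2 could still be hinted, a union alias is one option. The alias name `RoughnessInput`, the helper `describe`, and the dictionary key `mannings_n` are made up for illustration, and NumPy is assumed to be available.

```python
from typing import Union

import numpy as np
import numpy.typing as npt

# Hypothetical alias: a roughness value is either one float or a float array.
RoughnessInput = Union[float, npt.NDArray[np.float64]]


def describe(value: RoughnessInput) -> str:
    """Report which of the two accepted forms was passed in."""
    if isinstance(value, np.ndarray):
        return f"array with shape {value.shape}"
    return "scalar"


# Both dictionaries are valid parameter sets under this hint.
params = {"mannings_n": 0.03}
grid_params = {"mannings_n": np.full((3, 4), 0.03)}
```

The hint documents both accepted forms in one place, while the `isinstance` check is what actually separates them at run time.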

For #1, it makes sense to have “small” variables (like a single float) inside a text-based input file (say, YAML format). But “large” variables (like an array of roughness values) would be better off in a separate file.

So, one way to handle #1 would be to give EITHER a float value OR a string that names a separate file containing the desired array values. PRO: simple and convenient; the model code detects whether the value is float or str and acts accordingly. CON: again, harder to use type hints because now you’re expecting any of three different variable types (a float, an array of float, or a string naming a separate input file).
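The detect-and-dispatch idea could look roughly like the sketch below. The names `FieldSpec` and `resolve_field`, the file name `roughness.txt`, and the choice of a whitespace-delimited text file read with `np.loadtxt` are all assumptions for illustration; a real model might use a different array format.

```python
import tempfile
from pathlib import Path
from typing import Union

import numpy as np

# Hypothetical alias for the three accepted forms.
FieldSpec = Union[float, str, np.ndarray]


def resolve_field(spec: FieldSpec, base_dir: Path = Path(".")) -> Union[float, np.ndarray]:
    """Turn a float, an array, or a file name into a float or an array."""
    if isinstance(spec, str):
        # A string is taken to name a text file holding the array values.
        return np.loadtxt(base_dir / spec)
    if isinstance(spec, np.ndarray):
        return spec.astype(float)
    # Accept ints too, since a config file may contain "1" rather than "1.0".
    if isinstance(spec, bool) or not isinstance(spec, (int, float)):
        raise TypeError(f"unsupported input type: {type(spec).__name__}")
    return float(spec)


# Demonstration with a temporary file standing in for the separate input file.
with tempfile.TemporaryDirectory() as tmp:
    np.savetxt(Path(tmp) / "roughness.txt", np.full((2, 3), 0.03))
    from_file = resolve_field("roughness.txt", base_dir=Path(tmp))

from_scalar = resolve_field(0.03)
```

Keeping the dispatch in one small function means the three-way union only appears at the input boundary.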

Any thoughts on pros vs. cons of the “clean-ness” of sticking to single types versus the convenience of dynamic typing?

BSchilperoort · February 26, 2026, 8:23am

Hi Greg,

I think it’s fine to have different possible types as input. I would recommend you use Pydantic for input validation and typing, especially as you said you might be working with YAML (which has many terrible edge cases and typing problems).
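For illustration, a minimal Pydantic sketch of the float-or-file-name case might look like this. It assumes Pydantic v2 and that the YAML file has already been parsed into a plain dictionary; the model name `RoughnessConfig` and the field `mannings_n` are hypothetical.

```python
from typing import Union

from pydantic import BaseModel, ValidationError


class RoughnessConfig(BaseModel):
    # Hypothetical field: a float, or a string naming a separate array file.
    mannings_n: Union[float, str]


# Dictionaries like these would come from the parsed input file.
scalar_cfg = RoughnessConfig.model_validate({"mannings_n": 0.03})
file_cfg = RoughnessConfig.model_validate({"mannings_n": "roughness.txt"})

# Anything that is neither a float nor a string is rejected with a clear error.
rejected = False
try:
    RoughnessConfig.model_validate({"mannings_n": [0.03, 0.04]})
except ValidationError:
    rejected = True
```

The union in the field annotation doubles as the type hint and the validation rule, so the accepted input forms are declared once.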

You can also make sure that within your model everything consists of arrays and use a single-element array to represent the 1D model. That way the multiple accepted types only exist at the outside interface, while the typing inside your model stays simple and consistent.
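One way to sketch this normalization step, assuming NumPy, is shown below. The function name `as_field_array` is made up for illustration.

```python
import numpy as np


def as_field_array(value) -> np.ndarray:
    """Convert a scalar or array input to a float array with at least one dimension."""
    return np.atleast_1d(np.asarray(value, dtype=float))


uniform = as_field_array(0.03)               # single-element array, shape (1,)
gridded = as_field_array(np.ones((3, 4)))    # arrays pass through with their shape

# A single-element array broadcasts against grid-shaped arrays,
# so the model code does not need to branch on the input type.
depth = np.full((3, 4), 2.0)
product = uniform * depth
```

After this step, everything inside the model can be annotated as `np.ndarray`.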

samharrison7 · February 26, 2026, 12:47pm

I agree, I think your approach is good and having different possible input types is probably a good thing.

@BSchilperoort - Have you ever tried Schema for validating data? I tend to use this but, to be honest, I find it really clunky to use. I was wondering if you had experience using both and could comment. At a quick glance, they look kind of similar (though the Pydantic docs are much better and maybe it’s more broadly used).

BSchilperoort · February 26, 2026, 1:24pm

I have not used Schema. I think Pydantic is a lot more versatile and more widely used (it is also very useful in web APIs). It also integrates deeply with Python’s typing system. Pydantic can validate many config file types out of the box.

There is a small upside of Schema though: it only uses base Python and is only a single 1k-line .py file. Pydantic’s core is written in Rust nowadays, making it fast, but you do need the 2MB binary wheels (although these are available for basically any system, including WebAssembly Python).

gregtucker · February 26, 2026, 3:58pm

Thanks Bart and Sam, that’s really helpful advice.
