Numerous discussions and debates went into how our software infrastructure should be designed. We tried to keep it simple, relevant and in line with current industry standards, and we believe that we should stay agile and not over-engineer our tech stack. We needed a good starting point without reinventing the wheel on the technology side, while providing the right added value.
Through data and technology, our purpose is to improve our customer experience, optimize our operations and adapt our products, specifically around yield and monitoring.
We focused on five core layers for our initial phase, a structure that is common in the IoT space:
- Inputs and outputs: There would be no "Internet Of Things" without any connected devices that capture data from our environment.
- Edge gateway: It's an extension to our inputs and outputs that enables connectivity within and outside the local network. It provides rules and functions within the local environment.
- Streaming and routing: We need some mechanism to capture and transmit all of the data in real-time. Whether it's sensor data or system commands, we are building a distributed system to help us scale efficiently and maintain high performance.
- Data storage: We have to manage all this information logically and store it in a consistent manner as our source of truth.
- Data consumption: We want to consume that source of truth to monitor it, interact with it or analyze it.
There are other types of layers or possible other definitions in the industry, but we felt that these foundations should be able to tackle and adapt to any challenges that lie ahead.
Inputs and outputs
Devices, actuators and sensors are the pillars of our system. We use numerous IoT-enabled devices to capture information that is typical of controlled environment agriculture, such as temperature, humidity and pH, but also imagery. The system is constantly evaluating and weighing new metrics that can improve its performance and bring value to users.
Our goal is to observe every distinct element or variation of a crop during its growing stages (which can vary greatly by crop classification): seed germination, seedling development, growth and harvesting. It's a rinse and repeat process from this point on.
The infrastructure will be "self-reliant" in case it loses internet connectivity for any reason, as we expect that not all of our product locations will have reliable access to the Internet. Since we want our system to be built pretty much anywhere, we are designing with connectivity resilience in mind. In other words, the edge gateway can process and understand the data itself, and take action without external intervention.
To visualize this better, a GreenForges farm will have many Forges; each of them will be connected to an on-site network and edge gateway. If there's connectivity downtime or outage, that dedicated server will maintain the lifecycle of the Forges until a connection can be made.
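As a rough illustration of that behaviour, here is a minimal store-and-forward sketch in Python. It is not our gateway code: the class name, the `send` callback and the buffer size are assumptions made for the example. Readings queue up locally while the uplink is down and are flushed oldest-first once it comes back.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict

Reading = Dict[str, Any]


class StoreAndForwardBuffer:
    """Queue readings locally and flush them in order once the uplink is back."""

    def __init__(self, send: Callable[[Reading], bool], max_size: int = 10_000) -> None:
        # `send` is a hypothetical uplink function that returns True on success.
        self._send = send
        # Bounded queue: when full, the oldest reading is dropped (an assumption).
        self._pending: Deque[Reading] = deque(maxlen=max_size)

    def submit(self, reading: Reading) -> None:
        """Accept a new reading, then try to drain the queue."""
        self._pending.append(reading)
        self.flush()

    def flush(self) -> int:
        """Send pending readings oldest-first; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # still offline, keep everything for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

A reading is only removed from the queue after `send` reports success, so an outage in the middle of a flush does not lose data; the trade-off is that the receiving side may see the same reading twice and should tolerate duplicates.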
Streaming and routing
We need a system capable of ingesting, processing, sorting, and serving numerous data points in real-time, ensuring data is accurate and consistent. This stage can greatly affect the business on multiple fronts and, as it scales, can potentially impact the performance of products.
So, we are making sure every component of our streaming system, on-premises and in the cloud, is decoupled and can adapt quickly to any changes. There's also the notion of reaction and decision in near real time, or what we refer to as the brain or business logic. A lot of our inspiration came from an in-depth post by a Tesla engineer, which we adapted from electric vehicles to agriculture.
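To make the idea of decoupling concrete, here is a small in-process publish/subscribe sketch in Python. It stands in for a real message broker and is only an illustration: the topic names, the humidity threshold and the `ventilate` action are made up for the example. The point is that the sensor side and the "brain" only share a topic name, never a direct reference to each other.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class EventBus:
    """Tiny in-process stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Copy the list so handlers may subscribe while we iterate.
        for handler in list(self._handlers.get(topic, [])):
            handler(event)


def make_humidity_rule(bus: EventBus, max_humidity: float) -> Handler:
    """Business-logic rule: emit a command when humidity exceeds a threshold."""

    def rule(event: Event) -> None:
        if float(event["value"]) > max_humidity:
            # Hypothetical command format, for illustration only.
            bus.publish("commands", {"device_id": event["device_id"], "action": "ventilate"})

    return rule
```

Because rules are plain subscribers, a rule can be added, replaced or removed without touching the code that publishes sensor data.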
Data is crucial for the performance of our products and is core to our business. Our system has time-series data that needs to be accurate. If it is not, it will lead to wrong conclusions and decisions, which could mean slower growth or, worse, the loss of a whole harvest.
All events (e.g. time-series data from our sensors/devices) will have a historical record, written through the data streaming system in near real time. As our business grows, our data inputs will necessarily rise and put pressure on write speed to our database. It will become essential that performance stays unchanged.
Each write has to be taken as it is received from the device itself and put into the database. The NoSQL approach seemed like the obvious choice based on our product requirements:
- A very high volume of writes, and increasing.
- Offline periods and outages, which is why we stay away from a primary-replica setup.
- Protection against data loss.
It also scales well for our team, since growing capacity simply means adding more servers to the database, which in turn keeps it reliable and stable. For time-series data, this is especially valuable as it means that there should be no loss of data in the list of transactions over time.
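To illustrate the write path described above, here is a minimal in-memory sketch in Python of a time-series store that takes each point as it arrives. It is not our database, and it makes two assumptions of its own: points are keyed by device and metric, and a repeated timestamp for the same key is treated as a duplicate (for instance a device resending after an outage) and ignored.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Point = Tuple[float, float]  # (unix timestamp, value)
SeriesKey = Tuple[str, str]  # (device_id, metric)


class TimeSeriesStore:
    """Keeps points per (device, metric), ordered by timestamp."""

    def __init__(self) -> None:
        self._series: DefaultDict[SeriesKey, List[Point]] = defaultdict(list)

    def write(self, device_id: str, metric: str, ts: float, value: float) -> bool:
        """Insert a point in timestamp order; return False for a duplicate timestamp."""
        points = self._series[(device_id, metric)]
        # (ts,) sorts before any (ts, value), so this finds the first point at or after ts.
        idx = bisect.bisect_left(points, (ts,))
        if idx < len(points) and points[idx][0] == ts:
            return False  # already recorded, keep the first write
        points.insert(idx, (ts, value))
        return True

    def query(self, device_id: str, metric: str, start: float, end: float) -> List[Point]:
        """Return points with start <= timestamp < end."""
        points = self._series.get((device_id, metric), [])
        lo = bisect.bisect_left(points, (start,))
        hi = bisect.bisect_left(points, (end,))
        return points[lo:hi]
```

Late or out-of-order points still land in the right place in the series, and resent points do not create double entries, which keeps the historical record consistent.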
Data consumption is the epitome of our products. It ties us to our goals and enables users of GreenForges products to make better decisions, quickly.
Once we have a store of time-series data, our goal is to look for trends and insights. We focus on three areas of application:
- Data analytics (Data analysis, AI): Improve our products and processes with better insights.
- Data value (Processing, APIs): Create the necessary interfaces and services to optimize operations.
- Human value (Applications, metrics): Add user interface and monitoring tools to ease adoption.
All those applications converge into our core information technology objective: increase our crop production and drive down energy costs through data.
The team also foresees numerous challenges as we grow, and has already come up with countermeasures:
- Device proliferation: There's a multitude of actions that need to be taken to make a device run and start pushing data, from installation to configuration, and ultimately replacement. It's easy when we are talking about one, but what happens when you run multiple GreenForges farms with a large number of devices? We pushed this thinking further to cover outages and hardware malfunctions. We need a way to monitor and establish a maintenance routine that scales.
- Security: We already have tools and systems in place to counter potential attacks, and our team believes sunless vertical farms in general are particularly vulnerable. Attacks can cost whole harvest cycles and inflict a lot of economic damage, yet the industry barely speaks about it. So we're constantly on the lookout for new threats and keep improving our technology to secure and protect our devices, but also our data.
- Network and connectivity: This is a broad subject, and still requires more testing. In short, we need to ensure reliable and adequate bandwidth at all times. However, we did factor this into our design and made sure to have an edge gateway and data streaming services that preprocess the raw data locally before sending it over the public internet.
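On the network point, one simple form of local preprocessing is to summarize raw readings into fixed time windows before they leave the site. The Python sketch below is an assumption about what such a step could look like, not a description of our gateway: the function name, the 60-second window and the summary fields are all chosen for the example.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate_windows(
    readings: Iterable[Tuple[float, float]], window_s: float = 60.0
) -> List[Dict[str, float]]:
    """Summarize (timestamp, value) readings into fixed-size time windows."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in readings:
        # Readings in the same window share the same integer bucket index.
        buckets.setdefault(int(ts // window_s), []).append(value)
    return [
        {
            "window_start": idx * window_s,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for idx, values in sorted(buckets.items())
    ]
```

A sensor sampled every second would then send one summary per minute instead of sixty raw points, at the cost of losing the individual samples upstream unless they are also kept on the gateway.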
GreenForges' approach to technology is very data-driven. Most of our building blocks are already available and integrated based on our requirements. Internally, it's referred to as our data pipeline. As you may know, new tech emerges all the time and our team is always on the lookout for new technologies that could help us:
- Increase our crop production
- Drive down energy cost
A few folks worked hard to make this a success. Shoutout to Simon Caron, Roger Godin and Andrew Stride for the edge gateway and embedded systems; to Samir Gafsi for prototyping and implementing our cloud infrastructure with the help of Guillaume Girard; and to Milos Milojevic and Nicola Maglio for design, use-case vision and user experience.