Functions and methods should return data structures, not rendered HTML. Returning HTML encourages early escaping, HTML parsing, and mixed data processing and output. These are all security and performance issues.
What Could Go Wrong?
Assembling and handling strings of HTML complicates things as you’re no longer dealing directly with the data, but a particular form of it. This opens up a can of worms:
- HTML generated and returned by a method has to be escaped early, making it unclear whether a variable is safe to output or not. Changes at a later point can introduce unsafe output.
- Returning HTML also causes cache invalidation problems: when a template changes, cached HTML has to be flushed even though the data being shown has not changed.
- Strings of HTML then have to be joined together, which adds complexity.
- Retrieving data and figuring out what to display is slow. If markup is what gets returned and cached, this work has to be redone every time the same data is displayed differently.
Code that builds forms and complex layouts suffers badly from this, leading to performance and security problems.
The Solution
Instead of returning HTML, return objects and arrays. Figuring out what to display is not the same as figuring out how to display it. Data structures are easier to cache, and can be reused and displayed in different ways.
Instead of:
echo get_form( 'test-form' );
Consider this:
$fields = get_form_data( 'test-form' );
display_form( $fields );
Here, get_form_data is doing the hard work, and can be wrapped with calls to wp_cache_get and wp_cache_set for speed. We’ve also improved get_form by separating out the data processing into another method, reducing complexity, and allowing us to write simple unit tests without parsing HTML.
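As a rough sketch of how the two halves might look: the field format (`name`, `label`, `type`), the `load_form_fields` helper and the `forms` cache group are all made up for illustration and do not come from a real plugin. The `function_exists` guards add small in-memory stand-ins so the snippet also runs outside WordPress; inside WordPress the real `wp_cache_get`, `wp_cache_set`, `esc_attr` and `esc_html` would be used instead.

```php
<?php
// Stand-ins so the sketch runs outside WordPress. Inside WordPress the
// real functions already exist and these definitions are skipped.
if ( ! function_exists( 'wp_cache_get' ) ) {
	$GLOBALS['sketch_cache'] = array();
	function wp_cache_get( $key, $group = '' ) {
		return $GLOBALS['sketch_cache'][ $group ][ $key ] ?? false;
	}
	function wp_cache_set( $key, $data, $group = '', $expire = 0 ) {
		$GLOBALS['sketch_cache'][ $group ][ $key ] = $data;
		return true;
	}
}
if ( ! function_exists( 'esc_attr' ) ) {
	function esc_attr( $text ) {
		return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
	}
}
if ( ! function_exists( 'esc_html' ) ) {
	function esc_html( $text ) {
		return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
	}
}

// Hypothetical slow lookup; a real one might query the database.
function load_form_fields( $form_id ) {
	return array(
		array( 'name' => 'email', 'label' => 'Email "address"', 'type' => 'email' ),
		array( 'name' => 'age', 'label' => 'Age <years>', 'type' => 'number' ),
	);
}

// Returns plain data, and caches the data rather than any markup.
function get_form_data( $form_id ) {
	$fields = wp_cache_get( $form_id, 'forms' );
	if ( false === $fields ) {
		$fields = load_form_fields( $form_id );
		wp_cache_set( $form_id, $fields, 'forms' );
	}
	return $fields;
}

// Escapes each value at the moment it is written out.
function display_form( array $fields ) {
	echo '<form>';
	foreach ( $fields as $field ) {
		printf(
			'<label>%s <input type="%s" name="%s"></label>',
			esc_html( $field['label'] ),
			esc_attr( $field['type'] ),
			esc_attr( $field['name'] )
		);
	}
	echo '</form>';
}

display_form( get_form_data( 'test-form' ) );
```

A unit test can then check the array that get_form_data returns directly, and a template change only touches display_form, so nothing cached needs flushing.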
A bonus of all this is that generating HTML and outputting it is fast! Retrieving stored data and handing it to a display function takes a trivial amount of time compared with a trip to the database or a file read.
REST APIs and Javascript
The same is true of APIs accessed via the browser: scripts should request data, not HTML. Templates and markup should ship with the JavaScript and be filled in at runtime. This is one of the strengths of REST APIs and JSON structures: they make no assumptions about how your data will be displayed.
Returning HTML in AJAX requests leads to new problems, such as executing inline script tags, attaching event listeners, and hooking up logic. Returning data allows you to set up a placeholder item that can be filled in later, speeding up the user interface.
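To illustrate the server side of this, here is a minimal sketch in which one data function feeds both a JSON response and a server-rendered view. The function names and the notice fields are hypothetical, and plain `json_encode` and `htmlspecialchars` are used so the snippet runs on its own; a WordPress site would more likely expose the data through its REST API and use its own escaping functions.

```php
<?php
// Hypothetical data function; the name and fields are illustrative only.
function get_notice_data() {
	return array(
		'id'      => 7,
		'message' => 'Saved <b>draft</b> & closed',
	);
}

// API consumer: hand the structure over as JSON and let the script on
// the page decide how to present it.
function notice_as_json( array $notice ) {
	return json_encode( $notice );
}

// Server-side consumer: the same structure, escaped while it is output.
function display_notice( array $notice ) {
	printf(
		'<p id="notice-%d">%s</p>',
		$notice['id'],
		htmlspecialchars( $notice['message'], ENT_QUOTES, 'UTF-8' )
	);
}

$notice = get_notice_data();
echo notice_as_json( $notice ), "\n";
display_notice( $notice );
```

Neither consumer needs to know about the other, and the JSON carries no markup for the script to parse or execute.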
So Data not HTML?
Data is expensive, but HTML isn’t. Cache your data, not your HTML. This is how REST APIs work, and it gives you the most flexibility, be it in JavaScript on the frontend or server-side.
Wow, never actually gave a thought to this, i.e. returning data and html can have totally different implications. Thanks Tom, appreciate your helpful advice. 🙂 🙂
What do you think about escaping your data and returning controlled HTML processed by a template parser?
This would be a final step before output, and the template parser would need to handle escaping as you can’t know in advance how your data is going to be used. E.g. you might use wp_kses_post to sanitise a description, but this won’t work in a tag attribute.
Even then, retrieving the information in the first place is the expensive part. Running it through a template parser to generate HTML shouldn’t be expensive. You’d also have to output the template parser’s return value immediately, assuming it gives trusted HTML; otherwise the variable could be modified elsewhere and would no longer be trustworthy.
Filters would run after the data has been retrieved and before the template parser processes the content for final output.
Currently, I’ve been escaping within the data functions themselves. Most (if not all) of my data has already been escaped and is ready for output.
Then, for whatever information I want to allow to be manipulated, I escape the filtered content that comes back before it is passed to the template parser for final output.
I just like to be sure that the data has been escaped before it’s used. So if the data has been filtered it could possibly have been escaped twice before output. (The original and user manipulated).
I also try to only allow data to be manipulated, and not the HTML itself. Thoughts?
I’d be careful to avoid double escaping, and to escape once, as close to the output as possible. The templating engine should be able to handle this, but I’ve never been a huge fan of templating engines inside PHP (and they encourage users to write their own templates in the UI, with all the problems that follow).
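A small sketch of what double escaping does to a value. It uses PHP’s built-in `htmlspecialchars` as a stand-in; WordPress’s own escaping functions may treat already-encoded entities differently, so take this as an illustration of the general problem rather than of any particular API.

```php
<?php
$title = 'Tom & Jerry';

// Escaped once, right at output: the browser shows "Tom & Jerry".
$once = htmlspecialchars( $title, ENT_QUOTES, 'UTF-8' );

// Escaped in the data layer and then again at output: the browser
// shows the literal text "Tom &amp; Jerry".
$twice = htmlspecialchars( $once, ENT_QUOTES, 'UTF-8' );

echo $once, "\n";  // Tom &amp; Jerry
echo $twice, "\n"; // Tom &amp;amp; Jerry
```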