A side-by-side comparison of concatenated cause-effect prompts and LLM-fused phrasings, and how each shapes the temporal and causal fidelity of generated video.
| Concatenated (cause + effect) | LLM Fusion |
|---|---|
| a car drove recklessly through an open field flipping over the car was severely damaged and a group of guys started playing beer pong | A car flipped while driving recklessly through a field, then some guys started playing beer pong. |
| a player performs a fatality move in mortal kombat another character is killed in the game | A player performs a fatality move in Mortal Kombat, killing another character. |
| a man is folding a piece of paper a paper airplane is being created | A man folds a piece of paper into a paper airplane. |
| a soccer player kicked the ball with precision the ball successfully went into the goal | A soccer player kicks the ball precisely into the goal. |
| a boy decided to perform on stage the audience watched and listened to his singing | A boy performs on stage, singing to the audience. |
| Aspect | Concatenated Approach | LLM Fusion Approach |
|---|---|---|
| Temporal Ordering | Explicit and clear | Less distinct transitions |
| Causal Relationship | Strongly preserved | Partially weakened |
| Video Generation | More accurate scene transitions | Merged scenes with less distinction |
| Narrative Structure | Clear separation of events | Smoother but less structured |
Across all five examples, the concatenated approach consistently yields a more accurate representation of temporal sequence and causal relationships. This further supports our choice to maintain an explicit temporal-causal structure in CTN captions.
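The concatenated approach can be sketched as a plain join of a cause clause and an effect clause. This is a minimal sketch that assumes each CTN caption is stored as two separate clauses. `concat_caption` is a hypothetical helper, not the authors' code, and the cause/effect split points below are inferred from the example captions.

```python
def concat_caption(cause: str, effect: str) -> str:
    """Join a cause clause and an effect clause in their original order.

    Hypothetical helper: no rewording and no added connective, so the
    cause text always precedes the effect text in the resulting prompt.
    """
    return f"{cause.strip()} {effect.strip()}"


# Assumed cause/effect splits, inferred from the example captions.
examples = [
    ("a man is folding a piece of paper", "a paper airplane is being created"),
    ("a soccer player kicked the ball with precision",
     "the ball successfully went into the goal"),
]

for cause, effect in examples:
    # Prints the concatenated prompt, e.g. the paper-airplane caption.
    print(concat_caption(cause, effect))
```

Because the clauses are joined verbatim, the cause always appears before the effect in the prompt, which is the ordering property the comparison credits for clearer scene transitions. The LLM-fusion side would replace this join with a model call that rewrites both clauses into a single sentence. That call is not sketched here.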