Synthetic Data Generator
Abstract
This section covers how to craft prompts that turn a chatbot into a synthetic data generator.
Labelled data is scarce in most cases because data labelling is expensive and time-intensive. With well-crafted prompts, users can leverage chatbots to create synthetic labelled data for training or evaluating models. This approach lets researchers, data scientists, and engineers generate diverse, customizable datasets for their specific data needs, which considerably speeds up the workflow at nominal cost.
You can use a chatbot to generate synthetic data in two ways:
- Generate new instances based on the given description (data generation) and
- Generate new instances based on the existing instances by paraphrasing them (data augmentation).
Let us see how to craft prompts to generate synthetic data in both ways.
Data Generation
In this approach, the chatbot generates synthetic data from a detailed description of the instances you need. The examples below show how to craft prompts that generate synthetic movie-review tweets and laptop product reviews.
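A generation prompt of this kind can be assembled from three parts: a persona, a task, and a list of constraints. The following is a minimal sketch of that pattern, not a prescribed implementation; the function name `build_generation_prompt` and its parameters are hypothetical and chosen only for illustration.

```python
def build_generation_prompt(role, task, constraints=()):
    """Assemble a data-generation prompt from a persona, a task, and constraints.

    All names here are illustrative assumptions, not part of any library.
    """
    lines = [f"{role} {task}"]
    if constraints:
        # Constraints go on their own line so they are easy to spot and edit.
        lines.append("Constraints: " + " ".join(constraints))
    return "\n".join(lines)


prompt = build_generation_prompt(
    role=(
        "You are an expert English linguist with over ten years of experience "
        "and well aware of social media platforms like Twitter."
    ),
    task="Generate five positive and five negative movie-review tweets.",
    constraints=["Make sure that the length of each tweet is within 250 characters."],
)
print(prompt)
```

Keeping the parts separate makes it easy to vary the number of instances, the labels, or the length limit without rewriting the whole prompt.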
Generating Movie Tweets
Suppose you have to train a deep learning model for sentiment analysis of movie-review tweets, but you have access to only a small training dataset. In that case, you can use the chatbot to generate additional tweets and train the model on a mix of original and synthetic tweets. A prompt for this could look like the following:
Prompt
You are an expert English linguist with over ten years of experience and well aware of social media platforms
like Twitter. Generate five positive and five negative movie-review tweets.
Constraints: Make sure that the length of each tweet is within 250 characters.
Output
Certainly! Here are five positive and five negative movie-review tweets, each within 250 characters:
Positive Movie Reviews:
1. "Just watched #TheIncredibleJourney, and I'm in awe! Heartwarming story, brilliant acting, and stunning visuals.
A must-see for all ages. #MovieMagic"
2. "Blown away by #EternalSunset! A cinematic masterpiece with a powerful story and breathtaking cinematography.
Kudos to the entire cast and crew. #MustWatch"
3. "Saw #LaughOutLoudComedy last night, and I couldn't stop laughing! Hilarious script and top-notch performances.
Perfect for a fun night out. #ComedyGold"
4. "#MysticDreams took me on a mesmerizing journey! Visually stunning and emotionally captivating. Don't miss
this enchanting film. #CinematicMagic"
5. "Just saw #LostOpportunity, and I'm disappointed. Weak plot, unconvincing acting, and a predictable ending.
Not worth your time or money. #MovieReview"
Negative Movie Reviews:
1. "#SuperDisasterMovie is a trainwreck! Cringeworthy dialogues, terrible CGI, and a waste of talent.
Avoid at all costs. #MovieFail"
2. "I had high hopes for #TheBigFlop, but it fell flat. Dull story, wooden performances, and an overall snooze
fest. #MovieReview"
3. "Just endured #HorrorShowdown, and it was a true horror show, alright! Clichéd scares, paper-thin characters,
and a forgettable plot. #HorrorFail"
4. "#LastMinuteRomance is a rom-com disaster! Forced chemistry, cringe-worthy humor, and a predictable storyline.
Skip it. #RomanticComedy"
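5. "Expected more from #SciFiThrillerX, but it left me underwhelmed. Confusing plot, lackluster visuals, and a
disappointing ending. Not recommended. #MovieDisappointment"
Note that the fifth tweet listed under "Positive Movie Reviews" (#LostOpportunity) actually expresses negative sentiment, so review the generated labels before adding the tweets to a training set.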
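Before the generated tweets can be mixed into a training set, the chatbot's text output has to be turned into labelled records and checked against the prompt's constraints. The sketch below assumes the output keeps the layout shown above ("Positive ...:" and "Negative ...:" headers followed by numbered, quoted items that may wrap across lines). The function name, regular expressions, and shortened sample text are illustrative assumptions. Labels are taken from the section header, so a mislabelled item, such as the #LostOpportunity tweet that appears under the positive header, still needs a manual or model-based check.

```python
import re

# A header line such as "Positive Movie Reviews:" switches the current label.
SECTION_RE = re.compile(r"^(Positive|Negative)\b.*:$", re.IGNORECASE)
# A numbered item such as: 1. "Some tweet text"
ITEM_RE = re.compile(r'^\d+\.\s*"?(.*?)"?$')


def parse_labelled_output(text):
    """Turn a chatbot response into a list of {'text', 'label'} records."""
    records, label, current = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = SECTION_RE.match(line)
        if header:
            label, current = header.group(1).lower(), None
            continue
        item = ITEM_RE.match(line)
        if item and label is not None:
            current = {"text": item.group(1), "label": label}
            records.append(current)
        elif current is not None:
            # Wrapped line: append it to the item that is still open.
            current["text"] += " " + line.rstrip('"')
    return records


# Shortened sample in the same layout as the output above.
sample = '''Certainly! Here are the tweets:
Positive Movie Reviews:
1. "Blown away by #EternalSunset! A cinematic masterpiece with a powerful story
and breathtaking cinematography."
Negative Movie Reviews:
1. "#SuperDisasterMovie is a trainwreck! Cringeworthy dialogues and terrible CGI."
'''

records = parse_labelled_output(sample)
# Enforce the 250-character constraint from the prompt.
too_long = [r for r in records if len(r["text"]) > 250]
print(len(records), "records,", len(too_long), "over the length limit")
```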
Generating Laptop Reviews
Suppose you have to classify laptop product reviews, but you have access to only a small training dataset. In that case, you can use the chatbot to generate additional laptop product reviews to enlarge the training set. A prompt for this could look like the following:
Prompt
You are an expert English linguist and well aware of e-commerce platforms like Amazon. Generate five positive
and five negative laptop reviews.
Constraints: Make sure that the laptop product reviews contain a maximum of 30 words.
Output
Positive Laptop Reviews:
1. "I'm blown away by the laptop's speed and performance. The sleek design and vibrant display make it a true
winner!"
2. "This laptop is a productivity powerhouse. Multitasking is a breeze, and the long battery life keeps me
going all day."
3. "The sound quality on this laptop is exceptional. It's like having a built-in stereo system for my entertainment
needs."
4. "I love the lightweight design. It's so portable and fits perfectly in my backpack for work and travel."
5. "The customer service and warranty support for this laptop are outstanding. I had a minor issue, and they
resolved it quickly."
Negative Laptop Reviews:
1. "This laptop is frustratingly slow, and the constant lag makes it impossible to get any work done efficiently."
2. "The battery life is terrible; it barely lasts a few hours. I find myself constantly searching for an
outlet."
3. "The keyboard feels cheap and flimsy. Typing on it is uncomfortable, and I make more mistakes than usual."
4. "The laptop's build quality is disappointing. It creaks and flexes when I use it, making me worry about
its durability."
5. "I've had multiple hardware issues with this laptop, and the customer support was unhelpful and unresponsive."
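Once the reviews are generated, they can be checked against the 30-word constraint and merged with the original data. The following is a minimal sketch under stated assumptions: the helper names, the `source` field, and the one-row "original" dataset are hypothetical and stand in for your real data.

```python
def within_word_limit(text, max_words=30):
    """Return True if the text has at most max_words whitespace-separated words."""
    return len(text.split()) <= max_words


def merge_datasets(original, synthetic, max_words=30):
    """Merge (text, label) pairs, dropping exact duplicates and over-long synthetic rows."""
    seen, merged = set(), []
    for source, rows in (("original", original), ("synthetic", synthetic)):
        for text, label in rows:
            key = " ".join(text.lower().split())  # normalise case and spacing
            if key in seen:
                continue
            if source == "synthetic" and not within_word_limit(text, max_words):
                continue
            seen.add(key)
            merged.append({"text": text, "label": label, "source": source})
    return merged


# Hypothetical original row, invented for illustration only.
original = [("The fan is loud and the laptop overheats.", "negative")]
synthetic = [
    ("I love the lightweight design. It's so portable and fits perfectly "
     "in my backpack for work and travel.", "positive"),
    ("The fan is loud and the laptop overheats.", "negative"),  # duplicate, dropped
]

merged = merge_datasets(original, synthetic)
print(len(merged), "rows after merging")
```

Keeping a `source` field makes it possible to later evaluate the model on original data only, or to cap the share of synthetic rows.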
Data Augmentation
The main drawback of generating new instances from a description is that the synthetic instances may not follow the distribution of the existing instances. To reduce this risk, you can use data augmentation instead, i.e., generate synthetic instances by paraphrasing the existing ones.
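Because each paraphrase is derived from an existing labelled instance, it can inherit that instance's label. The sketch below illustrates this loop; `build_paraphrase_prompt`, `augment`, and `stub_chatbot` are hypothetical names, and the stub merely stands in for whatever chatbot call you use (it is assumed to return a list of paraphrase strings).

```python
def build_paraphrase_prompt(text, kind="movie review tweet", n=3, platform="Twitter"):
    """Build a paraphrasing prompt for one existing instance."""
    return (
        f"You are an expert English linguist and well aware of platforms like {platform}. "
        f"Generate {n} different paraphrases of the given {kind}\n"
        f'Text: "{text}"\n'
        "Constraints: Make sure that the generated paraphrases have lengths "
        "similar to that of the original text."
    )


def augment(dataset, chatbot):
    """Return the original (text, label) pairs plus label-inheriting paraphrases."""
    augmented = list(dataset)
    for text, label in dataset:
        for paraphrase in chatbot(build_paraphrase_prompt(text)):
            augmented.append((paraphrase, label))  # paraphrase keeps the source label
    return augmented


def stub_chatbot(prompt):
    # Stand-in for a real chatbot call; returns canned paraphrases.
    return ["A touching story with superb acting.", "Superb acting and a moving story."]


dataset = [("Heartwarming story, brilliant acting.", "positive")]
augmented = augment(dataset, stub_chatbot)
print(len(augmented), "instances after augmentation")
```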
Paraphrasing Movie Review Tweets
Prompt
You are an expert English linguist and well aware of social media platforms like Twitter. Generate three different
paraphrases of the given movie review tweet
Tweet : "Just watched #TheIncredibleJourney, and I'm in awe! Heartwarming story, brilliant acting, and stunning
visuals. A must-see for all ages. #MovieMagic"
Constraints: Make sure that the generated paraphrases have lengths similar to that of the original tweet.
Output
1. "Just finished #TheIncredibleJourney, and I'm absolutely amazed! The touching narrative, exceptional
performances, and breathtaking visuals left me spellbound. A cinematic gem for audiences of all generations. #CinematicWonder"
2. "Recently viewed #TheIncredibleJourney, and I'm filled with admiration! The heartwarming plot, outstanding
acting, and gorgeous cinematography were truly captivating. An essential watch for everyone. #FilmEnchantment"
3. "Just witnessed #TheIncredibleJourney, and I'm truly impressed! The heart-touching story, impeccable
acting, and visually stunning scenes left a lasting impact. A must-watch for viewers of all ages. #MovieMarvel"
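The chatbot does not always honour the "similar length" constraint; the paraphrases above appear noticeably longer than the original tweet. A simple post-filter can catch this. The sketch below is one possible check, with the function names, the 25% tolerance, and the optional character cap chosen purely as illustrative assumptions.

```python
def length_ratio(original, paraphrase):
    """Character-length ratio of a paraphrase to its original."""
    if not original:
        raise ValueError("original text must not be empty")
    return len(paraphrase) / len(original)


def filter_by_length(original, paraphrases, tolerance=0.25, max_chars=None):
    """Keep paraphrases whose length is within +/- tolerance of the original."""
    kept = []
    for paraphrase in paraphrases:
        if abs(length_ratio(original, paraphrase) - 1.0) > tolerance:
            continue
        if max_chars is not None and len(paraphrase) > max_chars:
            continue
        kept.append(paraphrase)
    return kept


original = (
    "Just watched #TheIncredibleJourney, and I'm in awe! Heartwarming story, "
    "brilliant acting, and stunning visuals. A must-see for all ages. #MovieMagic"
)
paraphrase = (
    "Just finished #TheIncredibleJourney, and I'm absolutely amazed! The touching "
    "narrative, exceptional performances, and breathtaking visuals left me "
    "spellbound. A cinematic gem for audiences of all generations. #CinematicWonder"
)

print(f"length ratio: {length_ratio(original, paraphrase):.2f}")
print("kept:", len(filter_by_length(original, [paraphrase], max_chars=250)))
```

Rejected paraphrases can be regenerated with a stricter constraint in the prompt, for example an explicit character limit.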
Paraphrasing Laptop Product Reviews
Prompt
You are an expert English linguist and well aware of platforms like Amazon. Generate five different paraphrases
of the given laptop product review
Product Review: "This laptop is a productivity powerhouse. Multitasking is a breeze, and the long battery life
keeps me going all day."
Constraints: Make sure that the generated paraphrases have lengths similar to that of the original product review.
Output
1. "This laptop excels in boosting productivity with its seamless multitasking capabilities, and its extended
battery life ensures I can stay productive throughout the day."
2. "A productivity beast, this laptop effortlessly handles multitasking, and its extended battery life ensures
uninterrupted performance all day long."
3. "With its robust multitasking performance and long-lasting battery, this laptop is a productivity
champ that keeps me powered up all day."
4. "This laptop is a true productivity powerhouse, making multitasking a breeze, and its impressive battery life
ensures I can work tirelessly from dawn to dusk."
5. "Effortlessly handling multitasking, this laptop is a productivity marvel, thanks to its long-lasting
battery that keeps me going strong throughout the day."
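Paraphrases of the same review often reuse most of its wording, which adds little variety to the training set. One way to screen for this is a token-overlap (Jaccard) filter that drops candidates too similar to the original or to a paraphrase already kept. This is a rough sketch; the function names and the 0.8 threshold are assumptions to be tuned on your own data.

```python
import re


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def drop_near_duplicates(original, paraphrases, threshold=0.8):
    """Keep paraphrases that differ enough from the original and from each other."""
    kept = []
    for candidate in paraphrases:
        references = [original] + kept
        if all(jaccard(candidate, ref) < threshold for ref in references):
            kept.append(candidate)
    return kept


original = "The battery lasts all day"
candidates = [
    "The battery lasts all day!",                   # same tokens as the original
    "Its battery easily survives a full workday",   # sufficiently different
    "its battery easily survives a full workday.",  # duplicate of the previous one
]
print(drop_near_duplicates(original, candidates))
```

Token overlap only measures surface similarity, so it complements rather than replaces a manual check that each paraphrase preserves the original meaning and sentiment.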