Q. In which scenario would you use unsupervised learning for embeddings?
A. When labeled data is available
B. When you want to classify text
C. When you want to discover patterns in unlabeled text
D. When you need to evaluate model performance

Solution: Unsupervised learning is used to discover patterns in unlabeled text, such as clustering or generating embeddings.
Correct Answer: C (When you want to discover patterns in unlabeled text)
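As a rough illustration of the unsupervised scenario, the sketch below clusters a handful of unlabeled sentences; it uses scikit-learn with TF-IDF vectors standing in for learned embeddings, and the toy corpus and cluster count are assumptions made only for this example.

# Minimal sketch: discovering groups in unlabeled text with clustering.
# Assumes scikit-learn is installed; the toy corpus and k=2 are arbitrary.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock prices fell sharply today",
    "markets rallied after the earnings report",
]

vectors = TfidfVectorizer().fit_transform(docs)  # unlabeled text -> vectors
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)  # e.g. [0 0 1 1]: pet sentences vs. finance sentences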
Q. What does the term 'subword tokenization' refer to?
A. Breaking words into smaller meaningful units
B. Combining multiple words into a single token
C. Ignoring punctuation in tokenization
D. Using only the first letter of each word

Solution: Subword tokenization refers to breaking words into smaller meaningful units, which helps in handling out-of-vocabulary words.
Correct Answer: A (Breaking words into smaller meaningful units)
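To make "smaller meaningful units" concrete, here is a minimal greedy longest-match splitter in the spirit of WordPiece; the tiny subword vocabulary is invented purely for illustration.

# Minimal sketch of subword tokenization via greedy longest-match.
# The tiny vocabulary below is a made-up toy, not a real tokenizer's vocab.
VOCAB = {"un", "happi", "ness", "token", "ization", "play", "ing", "s"}

def subword_tokenize(word, vocab=VOCAB):
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):  # try the longest piece first
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            return ["[UNK]"]  # no known piece covers this position
    return pieces

print(subword_tokenize("unhappiness"))   # ['un', 'happi', 'ness']
print(subword_tokenize("tokenization"))  # ['token', 'ization']
print(subword_tokenize("playing"))       # ['play', 'ing']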
Q. What is the main advantage of using pre-trained embeddings?
A. They require no training
B. They are always more accurate
C. They save computational resources and time
D. They can only be used for specific tasks

Solution: Pre-trained embeddings save computational resources and time, as they leverage knowledge from large datasets.
Correct Answer: C (They save computational resources and time)
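A minimal sketch of the reuse idea, assuming gensim is installed: published GloVe vectors are loaded and queried directly, with no training step. The model name is one of gensim's published downloads, and the first call fetches the data over the network.

# Sketch: reusing pre-trained word vectors instead of training them yourself.
# Assumes gensim is installed; api.load downloads the vectors on first use.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")     # 50-dim GloVe vectors
print(vectors.most_similar("computer", topn=3))  # nearest neighbours, no training needed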
Q. What is the main purpose of using embeddings in NLP?
A. To reduce the dimensionality of text data
B. To convert text into a format suitable for machine learning
C. To capture semantic meaning of words
D. To improve the speed of tokenization

Solution: Embeddings are used to capture the semantic meaning of words in a continuous vector space.
Correct Answer: C (To capture semantic meaning of words)
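As a sketch of what "continuous vector space" means in practice, the snippet below uses a PyTorch embedding table to map token ids to dense vectors; the vocabulary and dimensionality are arbitrary and the weights are untrained here, so it only shows the mechanics, not learned semantics.

# Sketch: an embedding layer maps discrete token ids to dense, continuous vectors.
# Vocabulary size and dimensionality below are arbitrary; weights are untrained.
import torch

vocab = {"cat": 0, "dog": 1, "car": 2}
embedding = torch.nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)

ids = torch.tensor([vocab["cat"], vocab["dog"]])
print(embedding(ids).shape)  # torch.Size([2, 4]): one 4-dim vector per word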
Q. What is the output of a tokenization process?
A. A list of sentences
B. A list of tokens
C. A numerical vector
D. A confusion matrix

Solution: The output of a tokenization process is a list of tokens derived from the input text.
Correct Answer: B (A list of tokens)
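A minimal regex-based tokenizer, using only the standard library, makes the point that the result is simply a list of tokens rather than vectors or metrics; the pattern is a simplistic stand-in for a real tokenizer.

# Sketch: the output of tokenization is a list of tokens, not vectors or metrics.
import re

text = "Tokenization splits text into tokens."
tokens = re.findall(r"\w+|[^\w\s]", text)  # words plus standalone punctuation
print(tokens)  # ['Tokenization', 'splits', 'text', 'into', 'tokens', '.']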
Q. What is the purpose of using subword tokenization?
A. To handle out-of-vocabulary words
B. To increase the size of the vocabulary
C. To improve model training speed
D. To reduce the number of tokens

Solution: Subword tokenization helps handle out-of-vocabulary words by breaking them into smaller, known subword units.
Correct Answer: A (To handle out-of-vocabulary words)
Q. What is the purpose of using the 'padding' technique in NLP?
A. To remove unnecessary tokens
B. To ensure all input sequences are of the same length
C. To increase the vocabulary size
D. To improve the accuracy of embeddings

Solution: Padding is used to ensure all input sequences are of the same length, which is necessary for batch processing in models.
Correct Answer: B (To ensure all input sequences are of the same length)
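A plain-Python sketch of padding, assuming right-padding with a pad id of 0: shorter token-id sequences are filled out so that every row in a batch has the same length.

# Sketch: padding token-id sequences to a common length for batching.
# PAD_ID = 0 and right-padding are conventional but arbitrary choices here.
PAD_ID = 0

def pad_batch(sequences):
    max_len = max(len(seq) for seq in sequences)
    return [seq + [PAD_ID] * (max_len - len(seq)) for seq in sequences]

batch = [[5, 8, 2], [7, 1], [3, 9, 4, 6]]
print(pad_batch(batch))  # [[5, 8, 2, 0], [7, 1, 0, 0], [3, 9, 4, 6]]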
Q. What is tokenization in Natural Language Processing (NLP)?
A. The process of converting text into numerical data
B. The process of splitting text into individual words or phrases
C. The process of training a model on labeled data
D. The process of evaluating model performance

Solution: Tokenization is the process of splitting text into individual words or phrases, which are called tokens.
Correct Answer: B (The process of splitting text into individual words or phrases)
Q. Which evaluation metric is commonly used for NLP tasks involving classification?
A. Mean Squared Error
B. F1 Score
C. Silhouette Score
D. Log Loss

Solution: The F1 Score is commonly used for evaluating classification tasks in NLP, as it balances precision and recall.
Correct Answer: B (F1 Score)
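Since F1 balances precision and recall, a short worked computation on made-up binary predictions may help: F1 = 2 * precision * recall / (precision + recall).

# Sketch: computing precision, recall, and F1 by hand on toy binary predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.75 0.75 0.75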
Q. Which evaluation metric is commonly used to assess the quality of embeddings?
A. Accuracy
B. F1 Score
C. Cosine Similarity
D. Mean Squared Error

Solution: Cosine similarity is commonly used to assess the quality of embeddings by measuring the cosine of the angle between two vectors, i.e., how closely their directions align.
Correct Answer: C (Cosine Similarity)
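Cosine similarity is the dot product of two vectors divided by the product of their norms; the NumPy sketch below applies it to two made-up embedding vectors.

# Sketch: cosine similarity = (a . b) / (||a|| * ||b||), on two made-up vectors.
import numpy as np

a = np.array([0.2, 0.9, 0.4])
b = np.array([0.1, 0.8, 0.5])

cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(float(cos_sim), 3))  # close to 1.0: the vectors point in similar directions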
Q. Which of the following is a common method for word embeddings?
A. TF-IDF
B. Bag of Words
C. Word2Vec
D. Count Vectorization

Solution: Word2Vec is a popular method for generating word embeddings that captures semantic relationships between words.
Correct Answer: C (Word2Vec)
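A minimal gensim Word2Vec run on a made-up three-sentence corpus; the hyperparameters are arbitrary and real training needs far more text, so treat this as a sketch of the API shape rather than a recipe.

# Sketch: training Word2Vec embeddings with gensim on a tiny made-up corpus.
# vector_size/window/min_count/epochs are arbitrary; real corpora are much larger.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50, seed=1)
print(model.wv["cat"].shape)             # (50,): one dense vector per word
print(model.wv.most_similar("cat", topn=2))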
Q. Which of the following is NOT a type of tokenization?
A. Word tokenization
B. Sentence tokenization
C. Character tokenization
D. Phrase tokenization

Solution: Phrase tokenization is not a standard type of tokenization; the common types are word, sentence, and character tokenization.
Correct Answer: D (Phrase tokenization)
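The three standard granularities side by side, using only the standard library; the regular expressions are deliberately simplistic stand-ins for real tokenizers.

# Sketch: word, sentence, and character tokenization of the same text.
import re

text = "Tokenization is useful. It splits text."

sentences = re.split(r"(?<=[.!?])\s+", text)  # sentence tokenization
words = re.findall(r"\w+|[^\w\s]", text)      # word tokenization
chars = list(text.replace(" ", ""))           # character tokenization

print(sentences)  # ['Tokenization is useful.', 'It splits text.']
print(words[:5])  # ['Tokenization', 'is', 'useful', '.', 'It']
print(chars[:5])  # ['T', 'o', 'k', 'e', 'n']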
Q. Which of the following techniques is NOT typically used for tokenization?
A. Whitespace tokenization
B. Subword tokenization
C. Character tokenization
D. Gradient descent

Solution: Gradient descent is an optimization algorithm, not a tokenization technique.
Correct Answer: D (Gradient descent)