Use a Hugging Face Transformers model for batch NLP

This notebook shows how to use a pre-trained 🤗 Transformers model to perform NLP tasks on Spark. It also highlights best practices for achieving and understanding good performance.

The example uses a pre-trained summarization pipeline, first directly in a UDF and then as an MLflow model, to summarize Wikipedia articles.

Cluster setup

For this notebook, Databricks recommends a multi-machine, multi-GPU cluster, such as an 8-worker p3.2xlarge cluster on AWS or NC6s_v3 on Azure, running the GPU version of Databricks Runtime ML 10.4 or greater.

With the recommended cluster configuration, this notebook takes about 10 minutes to run. GPU auto-assignment does not work for single-node clusters, so Databricks recommends performing GPU inference on clusters with separate drivers and workers.

Set up parameters

For this example, you can use the following parameters:

  • output_schema is the database schema to write data to.
  • output_table is the output table, which you can delete with the last command of this notebook.
  • number_articles is the number of articles to sample and summarize.
output_schema = "tutorials"
output_table = "wikipedia_summaries"
number_articles = 1024

Data loading

This notebook uses a sample of Wikipedia articles as the dataset.

df = spark.read.parquet("/databricks-datasets/wikipedia-datasets/data-001/en_wikipedia/articles-only-parquet").select("title", "text")
display(df)
 
title
text
History of physics
[[File:Newtons cradle animation book 2.gif|thumb|"If I have seen further, it is only by standing on the shoulders of giants." &ndash;&nbsp;[[Isaac Newton]]&#8201;<ref>Letter to [[Robert Hooke]] (15 February 1676 by Gregorian reckonings with January 1 as New Year's Day). equivalent to 5 February 1675 using the [[Julian calendar]] with March 25 as New Year's Day</ref>]] [[Physics]] (from the [[Ancient Greek]] φύσις ''[[physis]]'' meaning "[[nature]]") is the fundamental branch of [[science]] that...
Hydrofoil
{{about| Hydrofoils|other types of foil|Foil (fluid mechanics)}} {{Use dmy dates|date=July 2012}} [[File:Carl XCH-4.jpg|thumb|The [[United States Navy|U.S. Navy's]] ''XCH-4'', with hydrofoils clearly lifting the hull out of the water]] A '''hydrofoil''' is a lifting surface, or [[foil (fluid mechanics)|foil]], which operates in water. They are similar in appearance and purpose to [[aerofoil]]s used by [[aeroplane]]s. [[Boat]]s using hydrofoil technology are also simply termed hydrofoils. As spee...
Henri Chopin
{{Use dmy dates|date=December 2013}} {{Refimprove|date=January 2011}} {{See also|Chopin (disambiguation)}} '''Henri Chopin''' (18 June 1922 – 3 January 2008) was an avant-garde poet and musician. ==Life== Henri Chopin was born in Paris,18 June 1922, one of three brothers, and the son of an accountant. Both his siblings died during the war. One was shot by a German soldier the day after an armistice was declared in Paris, the other while sabotaging a train (Acquaviva 2008). Chopin was a French ...
Hassium
{{infobox hassium}} '''Hassium''' is a [[chemical element]] with symbol '''Hs''' and [[atomic number]] 108, named after the German state of [[Hesse]]. It is a [[synthetic element]] (an element that can be created in a laboratory but is not found in nature) and [[radioactive]]; the most stable known [[isotope]], <sup>269</sup>Hs, has a [[half-life]] of approximately 9.7&nbsp;seconds, although an unconfirmed [[metastable state]], <sup>277m</sup>Hs, may have a longer half-life of about 11&nbsp;minu...
Hydrus
{{distinguish2|[[Hydra (constellation)]]. For other uses, see [[Hydrus (disambiguation)]]}} {{featured article}} {{Use dmy dates|date=July 2012}} {{Infobox Constellation | name = Hydrus | abbreviation = Hyi | genitive = Hydri | pronounce = {{IPAc-en|ˈ|h|aɪ|d|r|ə|s}}, genitive {{IPAc-en|ˈ|h|aɪ|d|r|aɪ}} | symbolism = the water snake | RA = {{RA|00|06.1}} to {{RA|04|35.1}}<ref name="boundary"/> | dec= −57.85° to −82.06°<ref name="boundary"/> | family = [[Bayer Family|Bayer]] | quadrant = SQ1 | a...
Hercules
{{About|Hercules in classical mythology|the Greek divine hero from which Hercules was adapted|Heracles|other uses|Hercules (disambiguation)}} {{pp-semi-indef|small=yes}}{{Infobox deity | type = Roman | name = Hercules | image = Pieter paul rubens, ercole e i leone nemeo, 02.JPG | image_size = | alt = | birth_place = | death_place = | caption = ''Hercules fighting the Nemean lion''{{br}}by [[Peter Paul Rubens]] | god_of = | abode = | symbol = | consort = [[Juventas]] | parents = [[Jupiter (my...
History of Poland
{{History of Poland}} The '''history of [[Poland]]''' results from the [[Poland in the Early Middle Ages|migrations of Slavs]] who established permanent settlements on the [[geography of Poland|Polish lands]] during the [[Early Middle Ages]]. In 966 AD, Duke [[Mieszko I]] of the [[Piast dynasty]] [[Baptism of Poland|adopted Western Christianity]]; in 1025 Mieszko's son [[Bolesław I Chrobry]] formally established a [[High Middle Ages|medieval kingdom]]. The period of the [[Jagiellonian dynasty]] ...
Hradčany
{{For|other meanings|Hradčany (disambiguation)}} {{See also|Prague Castle}} [[Image:Hradcany2.jpg|thumb|Hradčany from the Petřín Tower]] '''Hradčany''' (common {{IPA-cs|ˈɦrat͡ʃanɪ|-|Cs-Hradcany.ogg}}; {{lang-de|[[:de:Hradschin|''Hradschin'']]}}), the '''Castle District''', is the [[Districts of Prague|district]] of the city of [[Prague]], [[Czech Republic]], surrounding the [[Prague Castle]]. The castle is said to be the biggest castle in the world<ref>{{cite web|url=http://www.prague-wiki.com...
Houston
{{About|the U.S. city}} {{Hatnote|Note that the city is unrelated to [[Houston County, Texas]], which is located in another part of the state.}} :''Houstonian redirects here. For other uses see: [[The Houstonian (disambiguation)]].'' {{Use mdy dates|date=June 2015}} {{Infobox settlement |name = Houston, Texas |official_name = City of Houston |settlement_type = [[City]] |nickname = <!--DO NOT CHANGE! -->Space City (OFFICIAL) <small>[[Nick...
Hard disk drive
{{Redirect|Hard drive}} {{infobox computer hardware | image = Laptop-hard-drive-exposed.jpg|300px | caption = A 2.5-inch SATA hard drive | invent-date = 24 December 1954{{Efn|This is the original filing date of the application which led to US Patent 3,503,060, generally accepted as the definitive [[disk drive]] patent.<ref>Kean, David W., "IBM San Jose, A quarter century of innovation", 1977.</ref>}} | invent-name = [[IBM]] team led by [[Reynold B. Johnson|Rey Johnson]] }} [[File:Hard...
Hermetic Order of the Golden Dawn
{{About|the historical organization of the late 19th century|other meanings|Golden Dawn (disambiguation){{!}}Golden Dawn}} {{Hermeticism}} {{Golden Dawn}} The '''Hermetic Order of the Golden Dawn''' (or, more commonly, '''The Golden Dawn''') was an organization devoted to the study and practice of the [[occult]], [[metaphysics]], and [[paranormal]] activities during the late 19th and early 20th centuries. Known as a [[Magical organization|magical order]], the Hermetic Order of the Golden Dawn wa...
High jump
{{other uses}} {{Use mdy dates|date=January 2013}} {{Infobox athletics event |event= High jump |image= [[File:Yelena Slesarenko failing 2007.jpg|240px]] |caption= [[Yelena Slesarenko]] using the [[Fosbury Flop]] technique at [[2004 Summer Olympics]]. |WRmen= [[Javier Sotomayor]] {{T&Fcalc|2.45}} (1993) |ORmen= [[Charles Austin]] {{T&Fcalc|2.39}} (1996) |WRwomen= [[Stefka Kostadinova]] {{T&Fcalc|2.09}} (1987) |ORwomen= [[Yelena Slesarenko]] {{T&Fcalc|2.06}} (2004) }} The '''high jump''' is a...
Heraclitus
{{Other people|Heracleitus}} {{Infobox philosopher |region = Western Philosophy |era = [[Ancient philosophy]] |image = Utrecht Moreelse Heraclite.JPG |caption = ''Heraclitus'' by [[Johannes Moreelse]]. The image depicts him as "the weeping philosopher" wringing his hands over the world, and as "the obscure" dressed in dark clothing—both traditional motifs |name = Heraclitus |birth_date = c. 535 BCE |birth_place = [[Ephesus]] |death_date = c. 475 BCE (aged around 60) |death_place = |mai...
Harrison Schmitt
{{Infobox officeholder | birth_name = Harrison Hagan Schmitt | image = Sen Harrison Schmitt.jpg | imagesize = 180px | jr/sr = United States Senator | state = [[New Mexico]] | party = [[Republican Party (United States)|Republican]] | term_start = January 3, 1977 | term_end = January 3, 1983 | preceded = [[Joseph Montoya]] | succeeded = [[Jeff Bingaman]] | birth_date = {{birth date and age|1935|7|3}} | birth_place = [[Santa Rita, New Mexico|Santa Rita]], [[New Mexico]], USA | death_date = | death_...
18 rows|Truncated data

To use the parallelism available in the cluster fully, the DataFrame needs enough partitions. A multiple of the number of GPUs on your workers (for GPU clusters) or of the number of cores across your workers (for CPU clusters) generally works well in practice and gives more balanced resource utilization across the cluster.

If you do not repartition the data after applying a limit, you may underutilize your cluster. For example, if Spark can satisfy the limit from a single partition, the resulting data sits in that one partition and is processed by one executor.

sample_imbalanced = df.limit(number_articles)
sample = sample_imbalanced.repartition(32).persist()
sample.count()
Out[3]: 1024
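
As a rough illustration of the "multiple of the number of GPUs" guideline, the sketch below computes a partition count from the cluster shape. The function name and the default multiple of 4 are assumptions made for this example, not Databricks recommendations. With 8 workers and one GPU per worker, a multiple of 4 gives 32, the value passed to repartition above.

```python
def suggest_num_partitions(num_workers, devices_per_worker, multiple=4):
    """Return a partition count that is a multiple of the available devices.

    devices_per_worker is the number of GPUs per worker on a GPU cluster,
    or the number of cores per worker on a CPU cluster.
    The default multiple of 4 is an illustrative assumption.
    """
    if min(num_workers, devices_per_worker, multiple) < 1:
        raise ValueError("all arguments must be positive integers")
    return num_workers * devices_per_worker * multiple


# Example: 8 workers with one GPU each (assumed cluster shape).
num_partitions = suggest_num_partitions(num_workers=8, devices_per_worker=1)
# sample = sample_imbalanced.repartition(num_partitions).persist()
```

The result is only a starting point; checking GPU utilization in the cluster metrics is the more reliable way to confirm that work is spread evenly.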

Use the transformers pipeline

This section uses the transformers pipeline for summarization with the facebook/bart-large-cnn model, which is the model loaded in the code below; the pandas section later uses sshleifer/distilbart-cnn-12-6. The pipeline is used within the pandas_udf applied to the Spark DataFrame.

Pipelines conveniently wrap best practices for certain tasks, bundling together tokenizers and models. They can also help with batching data sent to the GPU, so that you can perform inference on multiple items at a time. Setting the device to 0 causes the pipeline to use the GPU for processing. You can use this setting reliably even if you have multiple GPUs on each machine in your Spark cluster. Spark automatically reassigns GPUs to the workers.

You can also directly load tokenizers and models if needed; you would just need to reference and invoke them directly in the UDF.

from transformers import pipeline
import torch
device = 0 if torch.cuda.is_available() else -1
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=device)
Downloading: 0%| | 0.00/1.58k [00:00<?, ?B/s]
Downloading: 0%| | 0.00/1.63G [00:00<?, ?B/s]
Downloading: 0%| | 0.00/899k [00:00<?, ?B/s]
Downloading: 0%| | 0.00/456k [00:00<?, ?B/s]
Downloading: 0%| | 0.00/1.36M [00:00<?, ?B/s]
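
If you load the tokenizer and model directly instead of using a pipeline, the UDF has to do the batching, tokenization, generation, and decoding itself. The following is a minimal sketch of that flow, not the notebook's method. The helper names, the batch size, and the token limits are assumptions chosen for illustration; the tokenizer and model are passed in so that the caller decides how to load them.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def summarize_directly(texts, tokenizer, model, batch_size=8,
                       max_input_tokens=1024, max_summary_tokens=128):
    """Summarize a list of strings with a seq2seq model and its tokenizer.

    batch_size and the token limits are illustrative defaults.
    """
    import torch  # imported here so the helpers above work without torch

    summaries = []
    for chunk in batched(texts, batch_size):
        # Tokenize one batch, truncating long articles, and move it to the
        # same device as the model.
        inputs = tokenizer(
            chunk,
            truncation=True,
            max_length=max_input_tokens,
            padding=True,
            return_tensors="pt",
        ).to(model.device)
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_new_tokens=max_summary_tokens)
        summaries.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return summaries


# Possible usage (downloads the model, so it is left commented out):
# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
# model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
# summaries = summarize_directly(["Some long article text ..."], tokenizer, model)
```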

Databricks recommends using pandas UDFs to apply the pipeline. Spark sends batches of data to pandas UDFs and uses Apache Arrow for data conversion. Receiving a batch in the UDF allows you to batch operations in the pipeline. Note that the pipeline's batch_size is independent of the size of the batches that Spark sends to the UDF, and using Spark's default batch size as the pipeline batch_size is unlikely to perform well. Databricks recommends trying various batch sizes for the pipeline on your cluster to find the best performance. Read more about pipeline batching in the Hugging Face documentation.

You can view GPU utilization on the cluster by navigating to the live cluster metrics, clicking into a particular worker, and viewing the GPU metrics section for that worker.

Wrapping the pipeline with tqdm allows you to view progress of a particular task. Navigate into the task details page and view the stderr logs.

import pandas as pd
from pyspark.sql.functions import pandas_udf
from tqdm.auto import tqdm

@pandas_udf('string')
def summarize_batch_udf(texts: pd.Series) -> pd.Series:
  pipe = tqdm(summarizer(texts.to_list(), truncation=True, batch_size=8), total=len(texts), miniters=10)
  summaries = [summary['summary_text'] for summary in pipe]
  return pd.Series(summaries)
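
One way to try various batch sizes, as recommended above, is to time the same small set of texts at each candidate size and compare. The sketch below assumes a callable that takes a list of texts and a batch size; the helper names are hypothetical, and timing a handful of texts on the driver is only a rough proxy for performance on the workers.

```python
import time


def time_batch_sizes(run_batch, texts, batch_sizes):
    """Return {batch_size: elapsed_seconds} for one run at each batch size.

    run_batch is any callable of the form run_batch(texts, batch_size).
    """
    timings = {}
    for batch_size in batch_sizes:
        start = time.perf_counter()
        run_batch(texts, batch_size)
        timings[batch_size] = time.perf_counter() - start
    return timings


def fastest_batch_size(timings):
    """Return the batch size with the smallest elapsed time."""
    return min(timings, key=timings.get)


# Possible usage with the summarizer defined earlier (commented out because
# it needs the loaded model):
# texts = [row.text for row in sample.limit(64).collect()]
# timings = time_batch_sizes(
#     lambda t, b: summarizer(t, truncation=True, batch_size=b),
#     texts,
#     [1, 4, 8, 16],
# )
# print(timings, fastest_batch_size(timings))
```

A single run per size is noisy, and the first call may include one-time warm-up cost, so repeating the measurement before choosing a value is worthwhile.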

Using the UDF is identical to using other UDFs on Spark. For example, you can use it in a select statement to create a column with the results of the model inference.

spark.sql(f"CREATE SCHEMA IF NOT EXISTS {output_schema}")
summaries = sample.select(sample.title, sample.text, summarize_batch_udf(sample.text).alias("summary"))
summaries.write.saveAsTable(f"{output_schema}.{output_table}", mode="overwrite")
display(spark.sql(f"SELECT * FROM {output_schema}.{output_table} LIMIT 10"))
 
title
text
summary
Dolores Ibárruri
{{Multiple issues| {{primary sources|date=October 2011}} {{self-published|date=October 2011}} }} {{Infobox officeholder |honorific-prefix = |name = Dolores Ibárruri |honorific-suffix = |image = Dolores002.jpg |imagesize = |smallimage = <!--If this is specified, "image" should not be.--> |alt = |caption = Dolores Ibárruri in 1978 |order = |office = General Secretary of the [[Communist Party of Spain]] |term_start = March 1942 |term_end = 3 July 1960 |predecessor = [[José...
Dolores Ibárruri was born in Gallarta, Basque Country, Spain, on 9 December 1895. She became a revolutionary militant, joining the Spanish Communist Party (PCE) in 1921. She was elected to the Cortes Generales as a PCE deputy for Asturias in February 1936 during the Second Spanish Republic. After her exile from Spain at the end of the Spanish Civil War, she was appointed General Secretary of the Central Committee of the [[Communist Party of Spain] She was then named honorary president of the PCE...
Consumer Electronics Association
{{advert|date=September 2013}} {{Infobox organization | name = Consumer Electronics Association | logo = [[File:Consumer Electronics Association logo.svg|200px]] | type = [[Trade Organization]] | founded_date = 1924 | founder = | location = 1919 S. Eads St., [[Arlington, VA]] 22202 <!-- this parameter modifies "Headquarters" --> | origins = | key_people = Kathy Gornik; [[chairperson]] | area_served = | focus = | method = | revenue = | endowment = | num_volunteers = | num_employees = | n...
Consumer Electronics Association (CEA) is a standards and trade organization for the consumer electronics industry in the United States. CEA works to influence public policy, holds events such as the International CES and SINOCES, conducts market research, and helps its members and regulators implement technical standards.
Richard Florida
{{BLP sources|date=March 2013}} [[File:Richard Florida - 2006 Out & Equal.jpg|thumb|Richard Florida speaking at the 2006 Out & Equal Workplace Summit.]] '''Richard L. Florida''' (born November 26, 1957, in [[Newark, New Jersey]]) is an [[United States|American]] [[urban studies]] theorist. Florida's focus is on social and economic theory. He is currently a professor and head of the Martin Prosperity Institute at the [[Rotman School of Management]] at the [[University of Toronto]].<ref>{{cite ne...
Richard Florida is best known for his concept of the 'creative class' and its implications for urban regeneration. Florida's theory asserts that metropolitan regions with high concentrations of technology workers, artists, musicians, lesbians and gay men, exhibit a higher level of economic development. He has devised his own ranking systems that rate cities by a "Bohemian index," a "Gay index" and similar criteria.
Madhhab
{{pp-sock|expiry=16 October 2015|small=yes}} {{italic title}} {{Usul al-fiqh}} A '''''{{transl|ar|ALA|madh'hab}}''''' ({{lang-ar|مذهب}} ''{{transl|ar|DIN|maḏhab}}'', {{IPA-ar|ˈmæðhæb|IPA}}, "doctrine"; pl. {{lang|ar|مذاهب}} ''{{transl|ar|DIN|maḏāhib}}'', {{IPA-ar|mæˈðæːhɪb|}}; [[Turkish Language|Turkish]]: '''''mezheb'''''; [[Urdu Language|Urdu]]: مذہب ''{{transl|ar|DIN|'''mazhab'''}}'') is a school of thought within ''[[fiqh]]'' (Islamic [[jurisprudence]]). In the first 150 years of [[Islam]], ...
Mazhab is a school of thought within Islamic jurisprudence. Traditionally there has been four mazhabs followed by the majority of Muslims throughout Islamic history. The main theological schools are three major Sunni schools and three major Shia schools. At one time, there were 130 schools.
New York Americans
{{About|the 1923-1942 NHL team|the 1931-1956 soccer team|New York Americans (soccer)|the 1941 American football team|New York Yankees (1940 AFL)|the [[American Basketball Association|ABA]] team originally known as the New York Americans|Brooklyn Nets}} {{NHL Team |team_name = New York Americans |text_color = #000000 |bg_color = background:#FFFFFF; border-top:#E03A3E 5px solid; border-bottom:#005FA9 5px solid; |logo_image = New York Americans Logo.svg |founded = [[1925–26 NHL season|1925]] |histo...
The New York Americans were a professional ice hockey team based in New York City from 1925 to 1942. They were the third expansion team in the history of the National Hockey League (NHL) The team never won the Stanley Cup, but reached the semifinals twice. The Amerks' franchise was not formally canceled until 1946. The demise of the club marked the beginning of the NHL's Original Six era from 1942 to 1967.
Energy Orchard
{{Refimprove|date=September 2010}} '''Energy Orchard''' were a [[guitar]]-based [[Rock and roll|rock]] [[musical ensemble|band]] of the late 1980s and early 1990s, from [[Belfast]], [[Northern Ireland]]. Fronted by [[Bap Kennedy]] (brother of [[singer-songwriter]] [[Brian Kennedy (singer)|Brian Kennedy]]), their style drew heavily on the influence of [[Van Morrison]] and other [[rhythm and blues]] acts, but incorporated traditional elements of Irish [[folk music]].{{citation needed|date=January ...
''Energy Orchard'' were a Belfast-based band of the late 1980s and early 1990s. Their style drew heavily on the influence of Van Morrison and otherrhythm and blues acts. Despite extensive touring, the breakthrough to mainstream success eluded them. The band having completed their recording contract in 1996 disbanded.
Shaftesbury Abbey
[[Image:Shaftesbury Abbey.jpg|thumb|The Great Seal of Shaftesbury Abbey]] '''Shaftesbury Abbey''' was an abbey that housed [[nun]]s in [[Shaftesbury]], [[Dorset]]. Founded in 888, the abbey was the wealthiest [[Benedictine]] [[nunnery]] in England, a major pilgrimage site, and the town's central focus. The abbey was [[Dissolution of the Monasteries|dissolved]] in 1539 during the [[English Reformation]] by the order of [[Thomas Cromwell]], minister to King [[Henry VIII]]. At the time it was the s...
Shaftesbury Abbey was a Benedictine nunnery in Dorset, England. Founded in 888, the abbey was the wealthiest in England at the time. Many miracles were recorded at the tomb of St Edward, including the healing of lepers and the blind. It was dissolved in 1539 during the English Reformation.
Little pied cormorant
{{taxobox | status = LC | status_system = IUCN3.1 | status_ref = <ref>{{IUCN|id=22696743 |title=''Phalacrocorax melanoleucos'' |assessors=[[BirdLife International]] |version=2013.2 |year=2012 |accessdate=26 November 2013}}</ref> | image = Microcarbo melanoleucos Austins Ferry.jpg | image_width = 250px | image_caption = ''M. melanoleucos'' | regnum = [[Animal]]ia | phylum = [[Chordate|Chordata]] | classis ...
The little pied cormorant is found in Australia, New Zealand, Malaysia, Indonesia and the south-western Pacific. It is usually black above and white below with a yellow bill and small crest. The species is known as the little shag or by the Māori name of kawaupaka in New Zealand.
Jim Finks
{{Infobox NFL player |caption= |number=7 |position=[[Quarterback]] |birth_date={{Birth date|1927|8|31}} |birth_place=[[St. Louis, Missouri]] |death_date={{Death date and age|1994|5|8|1927|8|31}} |death_place=[[Metairie, Louisiana]] |image=JimFinks1955Bowman.jpg |heightft=5 |heightin=11 |weight=180 |debutyear=1951 |debutteam=Pittsburgh Steelers |finalyear=1957 |finalteam=Calgary Stampeders |draftyear=1949 |draftround=12 |draftpick=116 |college=[[University of Tulsa|Tulsa]] |pastteams= * [[Pittsbu...
James Edward Finks was an American football and Canadian football player and coach. Finks played for several years as a defensive back and quarterback for the Pittsburgh Steelers. He went on to coach the Calgary Stampeders of the Canadian Football League. In 1964, Finks became the general manager of the Minnesota Vikings. The Vikings made four Super Bowl appearances and won 11 division championships.
Metropolitan Atlanta Rapid Transit Authority
{{Redirect|MARTA}} {{Use mdy dates|date=June 2013}} {{Infobox Public transit |name = Metropolitan Atlanta Rapid Transit Authority |image = Metropolitan Atlanta Rapid Transit Authority (logo).svg |imagesize = |image2= Marta atlanta skyline.jpg |imagesize2 = |caption2 = |owner = |locale = [[Atlanta]], [[Georgia (U.S. state)|Georgia]] |transit_type = [[Rapid transit]]<br>[[Tram|Streetcar]]<br>[[Bus rapid transit]]<br>[[Bus]] |lines = 4 (rail) <br /> 1 (streetcar) <br /> 92 (bus) |stations = 38 ...
The Metropolitan Atlanta Rapid Transit Authority (MARTA) is the principal rapid transit system in the Atlanta metropolitan area. Formed in 1971 as strictly a bus system, MARTA operates a network of [[bus route]]s linked to a rail system. The average total daily ridership for the system (bus and rail) was 438,900.
10 rows

Use the pipeline on a Pandas DataFrame

Alternatively, if you don't want to use Spark, you can apply the summarization pipeline directly to a pandas DataFrame.

import pandas as pd
import transformers

model_architecture = "sshleifer/distilbart-cnn-12-6"

summarizer = transformers.pipeline(
    task="summarization", 
    model=transformers.BartForConditionalGeneration.from_pretrained(model_architecture), 
    tokenizer=transformers.BartTokenizerFast.from_pretrained(model_architecture),
    max_length=1024,
    truncation=True
)

def summarize(text):
    summary = summarizer(text)[0]['summary_text']
    return summary

sample_pd = sample.toPandas()
sample_pd["summary"] = sample_pd["text"].apply(summarize)

display(sample_pd)

MLflow wrapping

Storing a pre-trained model as an MLflow model makes it even easier to deploy a model for batch or real-time inference. This also allows model versioning through the Model Registry, and simplifies model loading code for your inference workloads.

The first step is to create a model for your pipeline that encapsulates loading the model, initializing GPU usage, and the inference function.

import transformers
import mlflow


task = "summarization"
architecture = "philschmid/distilbart-cnn-12-6-samsum"
model = transformers.BartForConditionalGeneration.from_pretrained(architecture)
tokenizer = transformers.AutoTokenizer.from_pretrained(architecture)
summarizer = transformers.pipeline(
    task=task,
    tokenizer=tokenizer,
    model=model,
)

artifact_path = "summarizer"

inference_config = {"max_length": 1024, "truncation": True}

with mlflow.start_run() as run:
    model_info = mlflow.transformers.log_model(
        transformers_model=summarizer,
        artifact_path=artifact_path,
        registered_model_name="wikipedia-summarizer",
        input_example="Hi there!",
        inference_config=inference_config,
    )
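
The cell above relies on the built-in MLflow transformers flavor to handle loading and inference. For comparison, the sketch below shows the shape of a hand-written wrapper that encapsulates the three responsibilities named earlier: loading the model, choosing the device, and running inference. The class and parameter names are hypothetical. To log such a wrapper with MLflow it would need to subclass mlflow.pyfunc.PythonModel, which is left out here so that the sketch runs without MLflow installed. It also assumes that model_input is an iterable of strings, whereas MLflow typically passes a pandas DataFrame, so real code would extract a column first.

```python
class SummarizerWrapper:
    """Hypothetical wrapper mirroring the load_context/predict interface."""

    def __init__(self, pipeline_factory, batch_size=8):
        # pipeline_factory(device) must return a callable pipeline, for example
        # lambda device: transformers.pipeline("summarization", device=device)
        self.pipeline_factory = pipeline_factory
        self.batch_size = batch_size
        self.pipe = None

    @staticmethod
    def _pick_device():
        """Return 0 (first GPU) when CUDA is available, otherwise -1 (CPU)."""
        try:
            import torch
            return 0 if torch.cuda.is_available() else -1
        except ImportError:
            return -1

    def load_context(self, context):
        # Build the pipeline once per process on the chosen device.
        self.pipe = self.pipeline_factory(self._pick_device())

    def predict(self, context, model_input):
        # Load lazily in case load_context has not been called yet.
        if self.pipe is None:
            self.load_context(context)
        results = self.pipe(list(model_input), truncation=True,
                            batch_size=self.batch_size)
        return [result["summary_text"] for result in results]
```

Injecting the pipeline through a factory keeps model loading out of the constructor, so the object can be created on the driver while the model itself is only loaded where inference runs.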

MLflow scoring

MLflow provides an easy interface to load any logged or registered model into a Spark UDF. You can look up a model URI in the Model Registry or in the UI of the logged experiment run. The following shows how to use pyfunc.spark_udf to apply the inference transformation to the Spark DataFrame.

model_uri = model_info.model_uri

# Load model as a Spark UDF. Override result_type if the model does not return double values.
loaded_model = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri, result_type='string')

summaries = sample.select(sample.title, sample.text, loaded_model(sample.text).alias("summary"))
summaries.write.saveAsTable(f"{output_schema}.{output_table}", mode="overwrite")
2023/01/25 23:40:47 WARNING mlflow.pyfunc: Calling `spark_udf()` with `env_manager="local"` does not recreate the same environment that was used during training, which may lead to errors or inaccurate predictions. We recommend specifying `env_manager="conda"`, which automatically recreates the environment that was used to train the model and performs inference in the recreated environment.
2023/01/25 23:40:48 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
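
The code above takes the model URI from model_info, but a URI can also be written out by hand once you have looked up the model in the Model Registry or the run in the experiment UI. MLflow accepts the models:/ and runs:/ schemes for this. The helper names and the run ID below are placeholders for illustration; the model name and artifact path are the ones used when logging the model.

```python
def registry_model_uri(model_name, version_or_stage):
    """Build a Model Registry URI such as models:/wikipedia-summarizer/1."""
    return f"models:/{model_name}/{version_or_stage}"


def run_model_uri(run_id, artifact_path):
    """Build a URI for a model logged under a run: runs:/<run_id>/<artifact_path>."""
    return f"runs:/{run_id}/{artifact_path}"


# Version 1 is an assumption; use whichever version the registry shows.
uri_from_registry = registry_model_uri("wikipedia-summarizer", 1)

# "your_run_id" is a placeholder, not a real run ID.
uri_from_run = run_model_uri("your_run_id", "summarizer")

# Either URI could then be passed as model_uri:
# loaded_model = mlflow.pyfunc.spark_udf(spark, model_uri=uri_from_registry, result_type="string")
```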

Cleanup

Remove the output table this notebook writes results to.

spark.sql(f"DROP TABLE {output_schema}.{output_table}")
Out[11]: DataFrame[]