Getting Started with Gemini (2): Working with Local Files

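  A key feature of the Gemini models is that they are multimodal: they can process images, video, audio, and more. Before passing a file to Gemini, you need to tell it where to find the file, which can usually be stored either locally or online. The examples below show how to use local files.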

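Using the GCP Console

  On the Gemini page of the Console, there is a button for inserting media:

  If the media file is no larger than 7 MB, it can be uploaded directly from your local machine. Here we upload a local image and have it analyzed:

  Without any further processing, Gemini analyzes the uploaded file directly.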
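  Since the direct upload path depends on the file size, it can help to check a file before sending it. The following is a minimal sketch, assuming the 7 MB limit means 7 * 1024 * 1024 bytes; the helper name fits_inline_limit is hypothetical and not part of any Google library.

```python
import os

# Assumption: "7 MB" is interpreted as 7 * 1024 * 1024 bytes.
INLINE_LIMIT_BYTES = 7 * 1024 * 1024


def fits_inline_limit(path: str, limit_bytes: int = INLINE_LIMIT_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit_bytes`."""
    return os.path.getsize(path) <= limit_bytes
```

  A call such as fits_inline_limit('/home/gcpvm/1.webp') could then decide whether the file is small enough to be sent directly.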

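Using Code

A Quick Look at the Gemini Code

  The code from the previous example can be broken into a few simple parts:
  1. Library imports. The vertexai library and the Gemini model classes are imported to initialize the environment and call Gemini; the service_account module from the Google auth library is imported for authentication; base64 is also imported, although this example does not use it.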

import base64
import vertexai
from vertexai.generative_models import GenerativeModel, Part, SafetySetting
from google.oauth2 import service_account  # service_account module from the Google auth library

  2. Set up authentication by creating a credentials object from the service account key file.

cred = service_account.Credentials.from_service_account_file(
    '/home/gcpvm/ai-demo-440003-7b8cf6bf07d5.json'
)
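  Service account key files are JSON documents that normally carry a project_id field, so the project ID can be read from the same file rather than typed by hand. A minimal sketch under that assumption; read_project_id is a hypothetical helper, not part of the Google libraries.

```python
import json


def read_project_id(key_path: str) -> str:
    """Return the project_id stored in a service account key file."""
    with open(key_path, "r", encoding="utf-8") as f:
        key_data = json.load(f)
    # Raises KeyError if the file does not contain a project_id field.
    return key_data["project_id"]
```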

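  3. The code below configures Gemini's generation parameters, which are passed to the function as a global variable. Adjust them as needed; note that the maximum number of output tokens is currently 8192.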

generation_config = {
    "max_output_tokens": 8192,
    "temperature": 1,
    "top_p": 0.95,
}
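  Because the output token limit is easy to exceed when tuning values by hand, one option is to build the dictionary through a small function that clamps it. This is only a sketch: make_generation_config is a hypothetical helper, and the 8192 cap is the figure quoted in this article, which may differ for other models.

```python
# Limit quoted in this article; treat it as an assumption for other models.
MAX_OUTPUT_TOKENS_LIMIT = 8192


def make_generation_config(max_output_tokens=8192, temperature=1.0, top_p=0.95):
    """Build a generation_config dict, clamping the output token count."""
    return {
        "max_output_tokens": min(max_output_tokens, MAX_OUTPUT_TOKENS_LIMIT),
        "temperature": temperature,
        "top_p": top_p,
    }
```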

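  4. These are the safety filter settings. There are four categories: hate speech, dangerous content, sexually explicit content, and harassment. A filtering level can be set for each category separately; the default is OFF, and four levels are available. These settings are likewise passed to the function as a global variable.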

safety_settings = [
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
]
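  When all four categories share the same level, the list can also be generated in a loop. The sketch below assumes the enum member names used in this article (including OFF) exist in the installed SDK version; build_safety_settings and CATEGORY_NAMES are hypothetical names introduced for illustration.

```python
# The four category names used in this article.
CATEGORY_NAMES = [
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_HARASSMENT",
]


def build_safety_settings(threshold_name="OFF"):
    """Create one SafetySetting per category, all with the same threshold."""
    # Imported lazily so this module loads even without the Vertex AI SDK.
    from vertexai.generative_models import SafetySetting

    threshold = getattr(SafetySetting.HarmBlockThreshold, threshold_name)
    return [
        SafetySetting(
            category=getattr(SafetySetting.HarmCategory, name),
            threshold=threshold,
        )
        for name in CATEGORY_NAMES
    ]
```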

  5. The function below is the code that actually runs Gemini.

def generate():
    # Pass the credentials when initializing the environment
    vertexai.init(project="ai-demo-440003", location="us-central1", credentials=cred)
    model = GenerativeModel(
        "gemini-1.5-flash-002",
    )
    responses = model.generate_content(
        ["""Please tell me how to learn Python"""],
        generation_config=generation_config,
        safety_settings=safety_settings,
        stream=True,
    )

    for response in responses:
        print(response.text, end="")

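  Here:

  • vertexai.init initializes the environment. This example sets:
    • project: the project to use; this must be the project ID
    • location: the region in which Gemini runs
    • credentials: the credentials used for authentication
  • model selects the Gemini model version, for example flash or pro, 1.0 or 1.5, 001 or 002; choose according to your needs
  • model.generate_content runs the model. The most important part is the prompt inside the []; the remaining arguments pass in the corresponding settings, and the result is assigned to the variable responses
  • The for loop prints each response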
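  With stream=True the result arrives as a sequence of chunks, each exposing a text attribute, which is why the code loops over it. If the full answer is needed as a single string rather than printed piece by piece, the chunks can be joined. A minimal sketch; collect_text is a hypothetical helper, and the stand-in objects below only imitate real response chunks.

```python
from types import SimpleNamespace


def collect_text(responses) -> str:
    """Join the .text of every streamed chunk into one string."""
    return "".join(chunk.text for chunk in responses)


# Stand-in chunks for illustration; real chunks come from generate_content.
fake_stream = [SimpleNamespace(text="Hello, "), SimpleNamespace(text="world")]
print(collect_text(fake_stream))
```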
  Now that the role of each part is clear, the next step is to look at how to use local media files.

Using Local Media Files in Code

  Click Get Code to see the code that uses the local file:

import base64
import vertexai
from vertexai.generative_models import GenerativeModel, Part, SafetySetting


def generate():
    vertexai.init(project="ai-demo-440003", location="us-central1")
    model = GenerativeModel(
        "gemini-1.5-flash-002",
    )
    responses = model.generate_content(
        [image1, """Please describe this picture."""],
        generation_config=generation_config,
        safety_settings=safety_settings,
        stream=True,
    )

    for response in responses:
        print(response.text, end="")

# The base64 string of the image goes between the triple quotes (omitted here)
image1 = Part.from_data(
    mime_type="image/webp",
    data=base64.b64decode(""""""),
)

generation_config = {
    "max_output_tokens": 8192,
    "temperature": 1,
    "top_p": 0.95,
}

safety_settings = [
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
]

generate()

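  Now run this code (don't forget to add authentication) and it returns a result as expected.

  Comparing this code with the previous example reveals two main differences:

  1. There is one additional variable: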

    image1 = Part.from_data(
        mime_type="image/webp",
        data=base64.b64decode("""UklGRmacAABXRUJQVlA4IFqcAAAQtAOdASroA/
    ..."""),
    )

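      Here the local image has been base64-encoded. The code first decodes it with base64, then wraps it with the Part class and assigns it to the variable image1, which becomes part of the prompt.

  2. image1 is used in the prompt: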

    responses = model.generate_content(
        [image1, """Please describe this picture."""],
        generation_config=generation_config,
        safety_settings=safety_settings,
        stream=True,
    )

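      As this analysis shows, the most direct way to use a local media file is to base64-encode it first and then paste the encoded string into the code. However, this requires encoding the file offline and copying the resulting string into the code, which is tedious and error-prone.
      To simplify the process and tidy up the code, we can handle the base64 encoding in the code itself. The code above can be modified as follows: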

    import base64
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part, SafetySetting
    from google.oauth2 import service_account

    cred = service_account.Credentials.from_service_account_file(
        '/home/gcpvm/ai-demo-440003-7b8cf6bf07d5.json'
    )


    def generate():
        vertexai.init(project="ai-demo-440003", location="us-central1", credentials=cred)
        model = GenerativeModel(
            "gemini-1.5-flash-002",
        )
        responses = model.generate_content(
            [image1, """Please describe this picture."""],
            generation_config=generation_config,
            safety_settings=safety_settings,
            stream=True,
        )

        for response in responses:
            print(response.text, end="")

    # base64-encode the local file
    with open('/home/gcpvm/1.webp', 'rb') as image:
        imageEncode = base64.b64encode(image.read())

    # Decode back to raw bytes before handing the data to Part
    image1 = Part.from_data(
        mime_type="image/webp",
        data=base64.b64decode(imageEncode),
    )

    generation_config = {
        "max_output_tokens": 8192,
        "temperature": 1,
        "top_p": 0.95,
    }

    safety_settings = [
        SafetySetting(
            category=SafetySetting.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=SafetySetting.HarmBlockThreshold.OFF
        ),
        SafetySetting(
            category=SafetySetting.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=SafetySetting.HarmBlockThreshold.OFF
        ),
        SafetySetting(
            category=SafetySetting.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=SafetySetting.HarmBlockThreshold.OFF
        ),
        SafetySetting(
            category=SafetySetting.HarmCategory.HARM_CATEGORY_HARASSMENT,
            threshold=SafetySetting.HarmBlockThreshold.OFF
        ),
    ]

    generate()

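      In the rewritten code, the local file is encoded and decoded directly in the code, so no preprocessing step is needed and the code is simpler.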
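  It may help to see what the encode and decode pair does to the data. The standalone sketch below uses made-up bytes in place of a real image and shows that decoding an encoded value gives back exactly the original bytes, which suggests the bytes read from the file are what ultimately reach Part.from_data.

```python
import base64

# Made-up bytes standing in for the contents of an image file.
raw = b"\x00\x01binary image bytes\xff"

encoded = base64.b64encode(raw)      # ASCII-safe representation
decoded = base64.b64decode(encoded)  # back to the original bytes

assert decoded == raw
```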

Non-base64 Method

  For image files, Part can handle the file directly. The code above can be modified as follows:

import base64
import vertexai
from vertexai.generative_models import GenerativeModel, Part, SafetySetting, Image
from google.oauth2 import service_account

cred = service_account.Credentials.from_service_account_file(
    '/home/gcpvm/ai-demo-440003-7b8cf6bf07d5.json'
)


def generate():
    vertexai.init(project="ai-demo-440003", location="us-central1", credentials=cred)
    model = GenerativeModel(
        "gemini-1.5-flash-002",
    )
    responses = model.generate_content(
        [image1, """Please describe this picture."""],
        generation_config=generation_config,
        safety_settings=safety_settings,
        stream=True,
    )

    for response in responses:
        print(response.text, end="")


image1 = Part.from_image(
    Image.load_from_file('/home/gcpvm/1.webp')
)


generation_config = {
    "max_output_tokens": 8192,
    "temperature": 1,
    "top_p": 0.95,
}

safety_settings = [
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=SafetySetting.HarmBlockThreshold.OFF
    ),
]

generate()


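  Here the base64 encoding and decoding of the local image is dropped, and the image is loaded directly through Part, which simplifies things further: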

image1 = Part.from_image(
    Image.load_from_file('/home/gcpvm/1.webp')
)
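  Part.from_image covers images. For other local media, one possible approach is to read the raw bytes and pair them with a MIME type for Part.from_data, as in the earlier listings. The sketch below is an assumption-laden illustration: the MIME_TYPES table and the load_local_part_args helper are hypothetical, and the table lists only a few common extensions.

```python
import os

# Hypothetical extension-to-MIME-type table; extend it as needed.
MIME_TYPES = {
    ".webp": "image/webp",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mp4": "video/mp4",
    ".pdf": "application/pdf",
}


def load_local_part_args(path):
    """Return (data, mime_type) for a local file, based on its extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in MIME_TYPES:
        raise ValueError(f"Unsupported file extension: {ext!r}")
    with open(path, "rb") as f:
        return f.read(), MIME_TYPES[ext]
```

  The returned pair could then be passed on as Part.from_data(data=data, mime_type=mime_type).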

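Summary

  As a multimodal model, Gemini treats file processing as a core capability. Files under 7 MB can be read from the local machine and passed in the request directly, without first uploading them to online storage, which is particularly convenient in test environments.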